This function is designed to remove/blacklist markers based on genotyping/call rate.

Filter targets: SNPs

Statistics: mean genotyping/call rate (missingness information).

filter_genotyping(
  data,
  interactive.filter = TRUE,
  filter.genotyping = NULL,
  filename = NULL,
  parallel.core = parallel::detectCores() - 1,
  verbose = TRUE,
  ...
)

Arguments

data

(2 options) A file or object generated by radiator:

  • tidy data

  • Genomic Data Structure (GDS)

How to get GDS and tidy data? Look into tidy_genomic_data, read_vcf or tidy_vcf.

interactive.filter

(optional, logical) Should the filtering session be interactive? With the default, interactive.filter = TRUE, figures and tables are shown before filtering decisions are made.

filter.genotyping

(optional, character string or double) 2 options:

  • character string: filter.genotyping = "outliers" uses the upper outlier values of the box plot as the threshold.

  • double: filter.genotyping = 0.2 allows a proportion of up to 0.2 (20%) missing genotypes.

Default: filter.genotyping = NULL.
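
The two threshold styles can be illustrated with a small base R sketch. This is not radiator's internal code: the toy matrix, the object names, and the reading of "outliers" as the usual box plot fence (third quartile plus 1.5 times the interquartile range) are assumptions made for illustration only.

```r
# Toy genotype matrix (alternate allele counts): rows = markers,
# columns = samples, NA = missing genotype. All names are illustrative.
geno <- matrix(
  c(0, 1, 2, 0, 1, 0, 0, 1,      # snp1: no missing genotype
    0, NA, 1, 0, 0, 2, 1, 0,     # snp2: 1 of 8 missing
    NA, NA, 0, 1, NA, 0, NA, 1,  # snp3: 4 of 8 missing
    1, 0, 0, NA, 0, NA, 1, 0,    # snp4: 2 of 8 missing
    0, 0, 1, 1, 0, 0, 2, 0),     # snp5: no missing genotype
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("snp", 1:5), paste0("ind", 1:8))
)

# Proportion of missing genotypes per marker.
missing.rate <- rowMeans(is.na(geno))

# Option 1: fixed threshold, keep markers with at most 20% missing genotypes.
keep.fixed <- names(missing.rate)[missing.rate <= 0.2]

# Option 2: box plot rule, ASSUMED here to be Q3 + 1.5 * IQR.
upper.fence <- unname(
  stats::quantile(missing.rate, 0.75) + 1.5 * stats::IQR(missing.rate)
)
keep.outliers <- names(missing.rate)[missing.rate <= upper.fence]
```

A fixed value is easier to report and compare across datasets, while the box plot rule adapts to the missingness distribution of the dataset at hand.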

filename

(optional, character) Default: filename = NULL.

parallel.core

(optional) The number of cores used for parallel execution during import. Default: parallel.core = parallel::detectCores() - 1.
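
The default leaves one core free for the rest of the system. When setting the value by hand, a guard like the one below may be useful, since parallel::detectCores() can return NA on some platforms and 1 on single-core machines. The variable name n.cores is only illustrative.

```r
# "All cores but one", guarded so the result is never NA or below 1.
n.cores <- parallel::detectCores() - 1L
if (is.na(n.cores) || n.cores < 1L) n.cores <- 1L
```

The resulting value could then be supplied as parallel.core = n.cores.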

verbose

(optional, logical) When verbose = TRUE the function is a little more chatty during execution. Default: verbose = TRUE.

...

(optional) Advanced mode that allows passing further arguments to fine-tune the function. Also used for legacy arguments (see details or special section).

Value

With interactive.filter = FALSE, a list in the global environment, with 5 objects:

  1. $tidy.filtered.mac

  2. $whitelist.markers

  3. $blacklist.markers

  4. $mac.data

  5. $filters.parameters

With interactive.filter = TRUE, 4 additional objects are generated in the list:

  1. $distribution.mac.global

  2. $distribution.mac.local

  3. $mac.global.summary

  4. $mac.helper.table

Advanced mode

The dots-dots-dots argument (...) allows passing several arguments to fine-tune the function:

  1. filter.common.markers (optional, logical). Default: filter.common.markers = FALSE. Documented in filter_common_markers.

  2. filter.monomorphic (optional, logical) Should the monomorphic markers present in the dataset be filtered out? Default: filter.monomorphic = TRUE. Documented in filter_monomorphic.

  3. path.folder: to write output in a specific path (used internally in radiator). Default: path.folder = getwd(). If the supplied directory doesn't exist, it's created.

Interactive version

To help choose a threshold, use the interactive version.

2 steps in the interactive version:

Step 1. Visualization and helper table.

Step 2. Filtering markers based on mean genotyping/missing rate.
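
The logic of the two steps can be sketched in base R as follows. This is a rough illustration, not the tables or figures radiator produces: the per-marker missing rates, the column names, and the objects whitelist and blacklist are all made up for the example.

```r
# Hypothetical per-marker missing rates (names and values are made up).
missing.rate <- c(snp1 = 0.00, snp2 = 0.04, snp3 = 0.12, snp4 = 0.33, snp5 = 0.61)

# Step 1: helper table, number of markers kept at each candidate threshold.
thresholds <- seq(0, 1, by = 0.1)
helper.table <- data.frame(
  threshold = thresholds,
  markers.kept = vapply(
    thresholds,
    function(x) sum(missing.rate <= x),
    integer(1)
  )
)
helper.table$markers.removed <- length(missing.rate) - helper.table$markers.kept

# Step 2: apply the chosen threshold and split markers into two lists.
chosen <- 0.2
whitelist <- names(missing.rate)[missing.rate <= chosen]
blacklist <- setdiff(names(missing.rate), whitelist)
```

Scanning such a table before filtering shows how many markers each candidate threshold would cost, which is the kind of trade-off the interactive session is meant to expose.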

Author

Thierry Gosselin thierrygosselin@icloud.com

Examples

if (FALSE) {
# The minimum
}
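
A non-interactive call could look like the sketch below, which uses only the arguments documented above. The object my.tidy.data is hypothetical and stands for a tidy data set or GDS prepared beforehand with radiator; the block is wrapped in if (FALSE) so that nothing runs unless the guard is removed.

```r
if (FALSE) {
  # `my.tidy.data` is a hypothetical input, e.g. produced earlier with
  # tidy_genomic_data or read_vcf.
  filtered <- radiator::filter_genotyping(
    data = my.tidy.data,
    interactive.filter = FALSE,
    filter.genotyping = 0.2
  )
  # Inspect which objects the returned list contains.
  names(filtered)
}
```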