This function removes (blacklists) markers based on their genotyping/call rate.
Filter targets: SNPs
Statistics: mean genotyping/call rate (missingness information).
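The statistic behind this filter is per-marker missingness, and the genotyping (call) rate is simply its complement. The base R sketch below shows that relationship. It assumes a hypothetical genotype matrix with individuals in rows, markers in columns, and NA for a missing call. This is not radiator's internal data layout.

```r
# Hypothetical genotype matrix: rows = individuals, columns = markers,
# values = alternate-allele dosage (0, 1, 2); NA = missing genotype.
geno <- matrix(
  c(0,  1, NA,  2,
    1, NA, NA,  2,
    2,  1,  0, NA,
    0,  1, NA,  2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("IND", 1:4), paste0("SNP", 1:4))
)

# Per-marker missingness: proportion of individuals with a missing call.
missing.rate <- colMeans(is.na(geno))

# Genotyping (call) rate is the complement of missingness.
call.rate <- 1 - missing.rate

# missing.rate: SNP1 = 0, SNP2 = 0.25, SNP3 = 0.75, SNP4 = 0.25
```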
filter_genotyping(
  data,
  interactive.filter = TRUE,
  filter.genotyping = NULL,
  filename = NULL,
  parallel.core = parallel::detectCores() - 1,
  verbose = TRUE,
  ...
)
data: (2 options) A file or object generated by radiator:
tidy data
Genomic Data Structure (GDS)
How to get GDS and tidy data? Look into tidy_genomic_data, read_vcf or tidy_vcf.
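With tidy (long-format) data, the same per-marker statistic comes from grouping the rows by marker. The sketch below makes an assumption: one row per marker × individual, with hypothetical columns MARKERS, INDIVIDUALS and GT. These are illustrative names, not a guaranteed radiator schema.

```r
# Hypothetical long ("tidy") genotype table: one row per marker x individual.
# Column names are illustrative assumptions, not radiator's exact schema.
tidy <- data.frame(
  MARKERS     = rep(c("SNP1", "SNP2", "SNP3"), each = 3),
  INDIVIDUALS = rep(c("IND1", "IND2", "IND3"), times = 3),
  GT          = c("0/0", "0/1", "1/1",
                  "0/1", NA,    "0/0",
                  NA,    NA,    "1/1"),
  stringsAsFactors = FALSE
)

# Mean missingness per marker = proportion of NA genotypes in each group.
missing.by.marker <- tapply(is.na(tidy$GT), tidy$MARKERS, mean)
```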
interactive.filter: (optional, logical) Do you want the filtering session to be interactive? With the default, interactive.filter = TRUE, figures and tables are shown before making decisions for filtering.
filter.genotyping: (optional, string or double) 2 options:
character string filter.genotyping = "outliers" uses the upper outlier values in the box plot as the threshold.
double filter.genotyping = 0.2 allows up to 0.2 (20%) missing genotypes per marker.
Default: filter.genotyping = NULL.
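The two filter.genotyping modes can be sketched as a function that turns the argument into a numeric missingness threshold. The helper genotyping_threshold is hypothetical. The "outliers" rule used here (upper hinge + 1.5 × IQR, the usual box-plot outlier fence computed with boxplot.stats) is an assumption about what "higher outlier values in the box plot" means, not radiator's confirmed implementation. The NULL case is not handled.

```r
# Hypothetical helper: convert filter.genotyping into a missingness threshold.
# ASSUMPTION: "outliers" = upper hinge + 1.5 * (upper hinge - lower hinge),
# the standard box-plot outlier fence.
genotyping_threshold <- function(missing.rate, filter.genotyping) {
  if (is.character(filter.genotyping)) {
    stopifnot(identical(filter.genotyping, "outliers"))
    hinges <- boxplot.stats(missing.rate)$stats[c(2, 4)]
    return(hinges[2] + 1.5 * (hinges[2] - hinges[1]))
  }
  # Numeric mode: a proportion of missing genotypes between 0 and 1.
  stopifnot(is.numeric(filter.genotyping),
            filter.genotyping >= 0, filter.genotyping <= 1)
  filter.genotyping
}

missing.rate <- c(SNP1 = 0.00, SNP2 = 0.05, SNP3 = 0.10, SNP4 = 0.08, SNP5 = 0.18)

# Keep markers whose missingness does not exceed the threshold.
keep.numeric  <- missing.rate <= genotyping_threshold(missing.rate, 0.2)
keep.outliers <- missing.rate <= genotyping_threshold(missing.rate, "outliers")
```

Here the fence works out to 0.10 + 1.5 × 0.05 = 0.175. SNP5 is therefore dropped by the "outliers" mode but kept by the fixed 0.2 threshold.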
filename: (optional, character)
Default: filename = NULL.
parallel.core: (optional) The number of cores used for parallel execution during import.
Default: parallel.core = parallel::detectCores() - 1.
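The default parallel.core = parallel::detectCores() - 1 can misbehave in two situations. parallel::detectCores() returns NA when the core count is unknown, and on a single-core machine the raw default evaluates to 0. The hypothetical helper below guards both cases. It is one possible safe value to pass, not something radiator itself provides.

```r
# Hypothetical helper: a defensive value for parallel.core.
# parallel::detectCores() may return NA, and detectCores() - 1 is 0 on a
# single-core machine, so always keep at least one core.
safe_core_count <- function() {
  n <- parallel::detectCores()
  if (is.na(n)) return(1L)
  max(1L, as.integer(n) - 1L)
}

cores <- safe_core_count()
```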
verbose: (optional, logical) When verbose = TRUE, the function is a little more chatty during execution.
Default: verbose = TRUE.
...: (optional) Advanced mode that allows passing further arguments for fine-tuning the function. Also used for legacy arguments (see details or special section).
With interactive.filter = FALSE, a list in the global environment with the following objects:
$tidy.filtered.mac
$whitelist.markers
$blacklist.markers
$mac.data
$filters.parameters
With interactive.filter = TRUE, a list with 4 additional objects is generated:
$distribution.mac.global
$distribution.mac.local
$mac.global.summary
$mac.helper.table
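The whitelist and blacklist returned by the function partition the markers: each marker passes the threshold or it does not. The sketch below shows that split in base R. The object names mirror the documented list elements. The single MARKERS column and the threshold value are assumptions for illustration only.

```r
# Hypothetical sketch: split markers into a whitelist and a blacklist
# from their missingness. The MARKERS column name is an assumption.
missing.rate <- c(SNP1 = 0.02, SNP2 = 0.35, SNP3 = 0.10, SNP4 = 0.50)
threshold <- 0.2

keep <- missing.rate <= threshold
whitelist.markers <- data.frame(MARKERS = names(missing.rate)[keep],
                                stringsAsFactors = FALSE)
blacklist.markers <- data.frame(MARKERS = names(missing.rate)[!keep],
                                stringsAsFactors = FALSE)
```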
dots-dots-dots ... allows passing several arguments for fine-tuning the function:
filter.common.markers: (optional, logical). Default: filter.common.markers = FALSE. Documented in filter_common_markers.
filter.monomorphic: (logical, optional) Should the monomorphic markers present in the dataset be filtered out? Default: filter.monomorphic = TRUE. Documented in filter_monomorphic.
path.folder: to write output in a specific path (used internally in radiator). Default: path.folder = getwd(). If the supplied directory doesn't exist, it's created.
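An "advanced mode" built on R's dots usually reads optional arguments from list(...) and falls back to defaults. The hypothetical function read_advanced_args below sketches that pattern with the three documented arguments and their documented defaults, including creating path.folder when it is missing. It is an illustration of the idiom, not radiator's internal code.

```r
# Hypothetical sketch of dots-based fine-tuning arguments with defaults.
read_advanced_args <- function(...) {
  dots <- list(...)
  # Return the supplied value, or the default when the argument is absent.
  get_or <- function(name, default) {
    if (is.null(dots[[name]])) default else dots[[name]]
  }
  path.folder <- get_or("path.folder", getwd())
  # Mirrors the documented behaviour: create the folder if it doesn't exist.
  if (!dir.exists(path.folder)) dir.create(path.folder, recursive = TRUE)
  list(
    filter.common.markers = get_or("filter.common.markers", FALSE),
    filter.monomorphic    = get_or("filter.monomorphic", TRUE),
    path.folder           = path.folder
  )
}

out <- file.path(tempdir(), "radiator_sketch")
args <- read_advanced_args(filter.monomorphic = FALSE, path.folder = out)
```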
To help choose a threshold, use the interactive version.
The interactive version has 2 steps:
Step 1. Visualization and helper table.
Step 2. Filtering markers based on mean genotyping/missing rate.
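The two interactive steps can be mimicked in base R. Step 1 tabulates, for a grid of candidate thresholds, how many markers would be kept or blacklisted. Step 2 applies the chosen threshold. The column names of helper.table and the missingness values are hypothetical, and radiator's own helper table may report different columns.

```r
# Step 1 (sketch): helper table over candidate thresholds.
# Missingness values and column names are hypothetical.
missing.rate <- c(0.00, 0.05, 0.12, 0.18, 0.33, 0.47, 0.62)
thresholds <- seq(0.1, 0.5, by = 0.1)

helper.table <- data.frame(
  THRESHOLD   = thresholds,
  WHITELISTED = vapply(thresholds, function(t) sum(missing.rate <= t), integer(1)),
  BLACKLISTED = vapply(thresholds, function(t) sum(missing.rate > t), integer(1))
)

# Step 2 (sketch): apply the threshold chosen after inspecting the table.
kept <- missing.rate[missing.rate <= 0.2]
```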
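if (FALSE) { # \dontrun{
# The minimum
# data: a tidy data frame or GDS object generated by radiator
filtered <- filter_genotyping(data = data)
} # }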