R/filter_individuals.R
filter_individuals.Rd
Remove individuals with bad QC based on:
missingness (genotyping rate)
heterozygosity
coverage (total, median, iqr)
Filter targets: Individuals
Statistics: Missingness, heterozygosity and coverage
Used internally in radiator and might be of interest for users who want to blacklist individuals.
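As a rough illustration of the three statistic families listed above, the following base R sketch computes them per individual on a toy dataset. The genotype coding (0, 1, 2 = count of the alternate allele, NA = missing), the matrices `geno` and `depth`, and the helper `individual_stats()` are assumptions made for this example. radiator works from the GDS object, and its exact definitions may differ.

```r
# Toy data: 3 individuals (rows) x 4 markers (columns).
# Genotypes are coded as alternate-allele counts: 0, 1 (heterozygote), 2; NA = missing.
geno <- matrix(
  c( 0,  1, 2, NA,
     1,  1, 0,  0,
    NA, NA, 2,  1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("ind1", "ind2", "ind3"), paste0("m", 1:4))
)

# Hypothetical read depth per genotype, NA where the genotype is missing.
depth <- matrix(
  c(10, 12,  8, NA,
    20, 22, 18, 24,
    NA, NA,  5,  7),
  nrow = 3, byrow = TRUE,
  dimnames = dimnames(geno)
)

# Hypothetical helper: one row of summary statistics per individual.
individual_stats <- function(geno, depth) {
  data.frame(
    individual = rownames(geno),
    # Proportion of missing genotypes
    missing = rowMeans(is.na(geno)),
    # Proportion of heterozygous calls among non-missing genotypes
    heterozygosity = rowSums(geno == 1, na.rm = TRUE) / rowSums(!is.na(geno)),
    # Coverage summaries over non-missing genotypes
    coverage_total = rowSums(depth, na.rm = TRUE),
    coverage_median = apply(depth, 1, stats::median, na.rm = TRUE),
    coverage_iqr = apply(depth, 1, stats::IQR, na.rm = TRUE),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

id_stats <- individual_stats(geno, depth)
print(id_stats)
```

In this toy table, `ind3` has the highest missingness (half of its genotypes) and the lowest total coverage, which is the kind of individual the filters are meant to flag.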
filter_individuals(
data,
interactive.filter = TRUE,
filter.individuals.missing = NULL,
filter.individuals.heterozygosity = NULL,
filter.individuals.coverage.total = NULL,
filter.individuals.coverage.median = NULL,
filter.individuals.coverage.iqr = NULL,
parallel.core = parallel::detectCores() - 1,
verbose = TRUE,
...
)
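The default `parallel.core = parallel::detectCores() - 1` in the usage above leaves one core free. `parallel::detectCores()` can return `NA` when the core count cannot be determined, and on a single-core machine the subtraction gives 0, so a caller may prefer to compute the value defensively. A minimal sketch follows; `safe_cores()` is a hypothetical helper and not part of radiator.

```r
# Hypothetical helper: number of cores to use, never less than 1.
safe_cores <- function(n = parallel::detectCores()) {
  # Fall back to a single core when detection fails
  if (is.na(n)) {
    return(1L)
  }
  # Leave one core free, but keep at least one for the job
  max(1L, as.integer(n) - 1L)
}

n.cores <- safe_cores()
```

The result could then be supplied as `parallel.core = n.cores`.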
data
(2 options) A Genomic Data Structure (GDS) file or object generated by radiator.
interactive.filter
(optional, logical) Whether the filtering session should be interactive.
Distribution figures are shown before the function asks for filtering thresholds.
Default: interactive.filter = TRUE.
filter.individuals.missing
(optional, double) The proportion of missing genotypes above which individuals are
blacklisted and removed from the dataset.
Default: filter.individuals.missing = NULL.
filter.individuals.heterozygosity
(optional, vector of doubles) Two proportions, a lower and an upper bound.
Individuals with heterozygosity below the first or above the second are blacklisted and removed from the dataset.
Default: filter.individuals.heterozygosity = NULL.
filter.individuals.coverage.total
(optional, vector of doubles)
Targets the total coverage per sample.
Two values, a lower and an upper bound. Individuals with a total coverage below the first or
above the second are blacklisted and removed from the dataset.
Default: filter.individuals.coverage.total = NULL.
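filter.individuals.coverage.median
(optional, vector of integers)
Targets the median coverage per sample.
Two integers, a lower and an upper bound. Individuals with a median coverage below the first or above the second are blacklisted and removed from the dataset.
Default: filter.individuals.coverage.median = NULL.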
filter.individuals.coverage.iqr
(optional, vector of integers)
Targets the IQR (interquartile range) of coverage per sample.
Two integers, a lower and an upper bound. Individuals with a coverage IQR below the first or above the second are blacklisted and removed from the dataset.
Default: filter.individuals.coverage.iqr = NULL.
parallel.core
(optional) The number of cores used for parallel
execution during import.
Default: parallel.core = parallel::detectCores() - 1.
verbose
(optional, logical) When verbose = TRUE,
the function is a little more chatty during execution.
Default: verbose = TRUE.
...
(optional) Advanced mode that allows passing further arguments to fine-tune the function. Also used for legacy arguments (see details or special section).
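The threshold arguments follow two patterns: a single upper bound (missingness) and a lower/upper pair (heterozygosity and the coverage statistics). The base R sketch below illustrates that logic on a made-up table of per-individual statistics. The data frame `id_stats`, its column names and the helper `blacklist_ids()` are hypothetical and do not reflect radiator's internals. How radiator treats values exactly equal to a threshold is not stated on this page, so the strict comparisons used here are an assumption.

```r
# Made-up per-individual statistics (hypothetical column names).
id_stats <- data.frame(
  individual = c("ind1", "ind2", "ind3", "ind4"),
  missing = c(0.10, 0.62, 0.05, 0.20),
  heterozygosity = c(0.025, 0.021, 0.045, 0.028),
  coverage_total = c(2.1e6, 1.5e6, 3.0e6, 4.0e5),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the IDs that fail any supplied threshold.
# NULL means "do not filter on this statistic", mirroring the defaults above.
blacklist_ids <- function(x, missing = NULL, heterozygosity = NULL,
                          coverage.total = NULL) {
  bad <- rep(FALSE, nrow(x))
  # TRUE where a value lies outside the c(lower, upper) pair
  outside <- function(v, bounds) v < bounds[1] | v > bounds[2]

  # Single upper bound
  if (!is.null(missing)) bad <- bad | x$missing > missing
  # Lower/upper pairs
  if (!is.null(heterozygosity)) bad <- bad | outside(x$heterozygosity, heterozygosity)
  if (!is.null(coverage.total)) bad <- bad | outside(x$coverage_total, coverage.total)

  x$individual[bad]
}

blacklist <- blacklist_ids(
  id_stats,
  missing = 0.5,
  heterozygosity = c(0.02, 0.03),
  coverage.total = c(900000, 5000000)
)
filtered <- id_stats[!id_stats$individual %in% blacklist, ]
```

With these toy values, `ind2` fails on missingness, `ind3` on heterozygosity and `ind4` on total coverage, so only `ind1` remains in `filtered`.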
A list with the filtered input and blacklist of individuals.
if (FALSE) { # \dontrun{
  require(SeqArray)

  # Blacklist outlier individuals:
  id.qc <- radiator::filter_individuals(
    data = "my.radiator.gds.rad",
    filter.individuals.missing = "outliers",
    filter.individuals.heterozygosity = "outliers",
    filter.individuals.coverage.total = "outliers"
  )

  # Blacklist individuals using explicit thresholds:
  id.qc <- radiator::filter_individuals(
    data = "my.radiator.gds.rad",
    filter.individuals.missing = 0.5,
    filter.individuals.heterozygosity = c(0.02, 0.03),
    filter.individuals.coverage.total = c(900000, 5000000)
  )
} # }
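The first example passes "outliers" instead of numeric thresholds. How radiator defines an outlier is not described on this page. For orientation only, one common convention is Tukey's rule, which flags values lying more than 1.5 times the IQR beyond the quartiles. The base R sketch below shows that rule. The rule itself, the multiplier `k`, the toy values and the helper `tukey_bounds()` are all assumptions and should not be read as radiator's method.

```r
# Hypothetical helper: lower and upper Tukey fences for a numeric vector.
tukey_bounds <- function(x, k = 1.5) {
  q <- stats::quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  c(lower = q[1] - k * iqr, upper = q[2] + k * iqr)
}

# Toy per-individual heterozygosity values
het <- c(0.021, 0.024, 0.025, 0.026, 0.027, 0.029, 0.060)

bounds <- tukey_bounds(het)
is.outlier <- het < bounds["lower"] | het > bounds["upper"]
het[is.outlier]
```

Bounds obtained this way could also be passed as explicit numeric thresholds, as in the second example.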