Remove individuals with bad QC based on:

  • missingness (genotyping rate)

  • heterozygosity

  • coverage (total, median, IQR)

Filter targets: Individuals

Statistics: Missingness, heterozygosity and coverage

Used internally in radiator; it may also be of interest to users who want to blacklist individuals.

filter_individuals(
  data,
  interactive.filter = TRUE,
  filter.individuals.missing = NULL,
  filter.individuals.heterozygosity = NULL,
  filter.individuals.coverage.total = NULL,
  filter.individuals.coverage.median = NULL,
  filter.individuals.coverage.iqr = NULL,
  parallel.core = parallel::detectCores() - 1,
  verbose = TRUE,
  ...
)

Arguments

data

(2 options) A Genomic Data Structure (GDS) file or object generated by radiator.

How to get a GDS? See read_vcf or tidy_vcf.

interactive.filter

(optional, logical) Should the filtering session be interactive? When TRUE, figures of the distributions are shown before the function asks for filtering thresholds. Default: interactive.filter = TRUE.

filter.individuals.missing

(optional, double) A proportion of missing genotypes above which individuals are blacklisted and removed from the dataset. The examples also pass the string "outliers" instead of a number. Default: filter.individuals.missing = NULL.
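To make the threshold concrete, here is a minimal base R sketch of the idea, not radiator's internal code. The toy genotype matrix and every object name (geno, missing.prop, blacklist.missing) are hypothetical, and it assumes "above" means strictly greater than the threshold.

```r
# Toy genotype matrix: rows are individuals, columns are markers.
# NA marks a missing genotype. All names and values are hypothetical.
geno <- matrix(
  c( 0,  1, NA, 2,
    NA, NA, NA, 1,
     1,  0,  2, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("ind1", "ind2", "ind3"), paste0("snp", 1:4))
)

# Proportion of missing genotypes per individual.
missing.prop <- rowMeans(is.na(geno))

# Individuals strictly above the threshold are blacklisted
# (assumption: the comparison is strict).
threshold <- 0.5
blacklist.missing <- names(missing.prop)[missing.prop > threshold]

# Keep only the individuals that pass.
geno.filtered <- geno[!rownames(geno) %in% blacklist.missing, , drop = FALSE]
```

With these toy values only ind2 (3 of 4 genotypes missing) exceeds the 0.5 threshold.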

filter.individuals.heterozygosity

(optional, vector of 2 doubles) Lower and upper heterozygosity proportions, e.g. c(0.02, 0.03). Individuals below the first value or above the second are blacklisted and removed from the dataset. Default: filter.individuals.heterozygosity = NULL.
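A two-sided threshold of this kind can be sketched in base R as follows. This is only an illustration of the "below and above" logic, not radiator's implementation; the heterozygosity values and the object names (het, het.range, blacklist.het) are made up for the example.

```r
# Hypothetical per-individual heterozygosity proportions.
het <- c(ind1 = 0.025, ind2 = 0.010, ind3 = 0.045, ind4 = 0.029)

# Lower and upper bounds, in the same form as the argument: c(lower, upper).
het.range <- c(0.02, 0.03)

# An individual is blacklisted if it falls below the lower bound
# or above the upper bound.
outside <- het < het.range[1] | het > het.range[2]
blacklist.het <- names(het)[outside]
```

Here ind2 is too low and ind3 is too high, so both would be blacklisted.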

filter.individuals.coverage.total

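(optional, vector of 2 numbers) Targets the total coverage per sample. Lower and upper bounds below and above which individuals are blacklisted and removed from the dataset. The bounds are on the scale of total coverage, as in the example c(900000, 5000000), not proportions. Default: filter.individuals.coverage.total = NULL.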

filter.individuals.coverage.median

(optional, vector of 2 integers) Targets the median coverage per sample. Individuals with a median coverage below the first integer or above the second are blacklisted and removed from the dataset. Default: filter.individuals.coverage.median = NULL.

filter.individuals.coverage.iqr

(optional, vector of 2 integers) Targets the IQR (interquartile range) of coverage per sample. Individuals with an IQR below the first integer or above the second are blacklisted and removed from the dataset. Default: filter.individuals.coverage.iqr = NULL.
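The three coverage statistics can be illustrated with a small base R sketch. It assumes a toy matrix of per-marker read depth and computes the total, median and IQR per individual with base functions; radiator reads coverage from the GDS and may compute these differently. The helper outside_range() and all object names are hypothetical.

```r
# Toy read-depth matrix: rows are individuals, columns are markers.
depth <- matrix(
  c(10, 12, 11,  9,
     2,  3,  1,  2,
    30,  5, 60,  8),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("ind1", "ind2", "ind3"), paste0("snp", 1:4))
)

# Per-individual coverage statistics.
coverage <- data.frame(
  individual = rownames(depth),
  total  = rowSums(depth),
  median = apply(depth, 1, median),
  iqr    = apply(depth, 1, IQR)   # stats::IQR, default quantile type 7
)

# Hypothetical helper: TRUE when a value is below bounds[1] or above bounds[2].
outside_range <- function(x, bounds) x < bounds[1] | x > bounds[2]

# Apply lower/upper bounds to the median and the IQR.
blacklist.median <- coverage$individual[outside_range(coverage$median, c(5, 15))]
blacklist.iqr    <- coverage$individual[outside_range(coverage$iqr, c(0, 10))]
```

In this toy data ind2 has a low median coverage and ind3 has both a high median and a wide IQR, which is the kind of uneven coverage the IQR filter is meant to catch.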

parallel.core

(optional) The number of cores used for parallel execution during import. Default: parallel.core = parallel::detectCores() - 1.
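If you prefer to set the value explicitly, a defensive way to derive it is sketched below. parallel::detectCores() may return NA on some platforms, and subtracting 1 gives 0 on a single-core machine; whether radiator guards against these cases internally is not stated here, so the variable n.cores is only a suggestion.

```r
# Start from the documented default: all detected cores minus one.
n.cores <- parallel::detectCores() - 1

# Fall back to a single core if detection failed (NA) or the result is below 1.
if (is.na(n.cores) || n.cores < 1) {
  n.cores <- 1
}
```

The result can then be passed as parallel.core = n.cores.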

verbose

(optional, logical) When verbose = TRUE the function is a little more chatty during execution. Default: verbose = TRUE.

...

(optional) Advanced mode that allows passing further arguments to fine-tune the function. Also used for legacy arguments (see details or special section).

Value

A list with the filtered input and blacklist of individuals.

Author

Thierry Gosselin thierrygosselin@icloud.com

Examples

if (FALSE) {
require(SeqArray)

# blacklisting outlier individuals:
id.qc <- radiator::filter_individuals(
    data = "my.radiator.gds.rad",
    filter.individuals.missing = "outliers",
    filter.individuals.heterozygosity = "outliers",
    filter.individuals.coverage.total = "outliers")

# using values to blacklist individuals:
id.qc <- radiator::filter_individuals(
    data = "my.radiator.gds.rad",
    filter.individuals.missing = 0.5,
    filter.individuals.heterozygosity = c(0.02, 0.03),
    filter.individuals.coverage.total = c(900000, 5000000))

}