Filter dataset with blacklist of genotypes — filter_blacklist_genotypes

Filter dataset with blacklist of genotypes.

This function blacklists (erases or masks) genotypes.

It is used internally in radiator and may also be of interest to users.
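To make the idea of erasing genotypes concrete, here is a minimal base R sketch. It is not radiator's implementation: the GT column, the toy marker and individual names, and the use of NA to represent an erased genotype are all assumptions made for illustration.

```r
# Toy tidy dataset: one row per marker x individual combination.
# MARKERS and INDIVIDUALS follow the column names used on this page;
# the GT column and its coding are assumptions for this sketch.
tidy <- data.frame(
  MARKERS     = rep(c("M1", "M2"), each = 3),
  INDIVIDUALS = rep(c("ind1", "ind2", "ind3"), times = 2),
  GT          = c("001001", "001002", "002002", "001001", "001003", "002002"),
  stringsAsFactors = FALSE
)

# Blacklist: the marker x individual combinations to erase.
blacklist <- data.frame(
  MARKERS     = c("M1", "M2"),
  INDIVIDUALS = c("ind2", "ind2"),
  stringsAsFactors = FALSE
)

# Build one key per row and flag the rows that appear in the blacklist.
key_data <- paste(tidy$MARKERS, tidy$INDIVIDUALS, sep = "\t")
key_bl   <- paste(blacklist$MARKERS, blacklist$INDIVIDUALS, sep = "\t")

# Erase (here: set to NA) the blacklisted genotypes; all rows are kept.
masked <- tidy
masked$GT[key_data %in% key_bl] <- NA_character_
```

The dataset keeps its dimensions: only the genotype values of the blacklisted marker and individual pairs change, which is what distinguishes erasing genotypes from removing whole markers or whole individuals.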

filter_blacklist_genotypes(data, blacklist.genotypes, verbose = TRUE, ...)

Arguments

data

A file or object generated by radiator, in one of these formats:

tidy data
Genomic Data Structure (GDS)

How to get GDS and tidy data? See tidy_genomic_data, read_vcf or tidy_vcf.

blacklist.genotypes

(path or object) The blacklist is an object in your global environment or a file in the working directory (e.g. "blacklist.geno.tsv"). The dataframe contains at least these 2 columns: MARKERS, INDIVIDUALS. Additional columns are allowed: CHROM, LOCUS, POS.

Useful to erase genotypes that fail QC, e.g. genotypes with more than 2 alleles in a diploid (likely sequencing errors), or genotypes with poor genotype likelihood or coverage.

Columns are cleaned of separators that interfere with some packages or code, as detailed in clean_markers_names and clean_ind_names. Default: blacklist.genotypes = NULL.
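As an illustration of the expected blacklist layout, the following base R sketch derives a blacklist from a per-genotype QC table and writes it as a tab-separated file. The QC table, the READ_DEPTH column and the min_depth threshold are hypothetical and are not part of the function's interface; only the MARKERS and INDIVIDUALS column names come from the description above.

```r
# Hypothetical per-genotype QC table (READ_DEPTH is an assumed column name).
qc <- data.frame(
  MARKERS     = rep(c("M1", "M2"), each = 3),
  INDIVIDUALS = rep(c("ind1", "ind2", "ind3"), times = 2),
  READ_DEPTH  = c(25L, 3L, 18L, 2L, 30L, 22L),
  stringsAsFactors = FALSE
)

# Assumed threshold: genotypes with coverage below this value are blacklisted.
min_depth <- 5L

# Keep only the two required columns for the genotypes that fail the threshold.
blacklist <- qc[qc$READ_DEPTH < min_depth, c("MARKERS", "INDIVIDUALS")]
rownames(blacklist) <- NULL

# Write the blacklist as a tab-separated file with a header row.
# A temporary directory is used here; in practice the file would be
# placed in the working directory.
blacklist_file <- file.path(tempdir(), "blacklist.geno.tsv")
write.table(blacklist, file = blacklist_file, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Read the file back and confirm the required columns are present.
check <- read.delim(blacklist_file, stringsAsFactors = FALSE)
stopifnot(all(c("MARKERS", "INDIVIDUALS") %in% names(check)))
```

Either the in-memory blacklist object or the path to the written file could then be supplied as blacklist.genotypes.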

verbose

(optional, logical) When verbose = TRUE the function is a little more chatty during execution. Default: verbose = TRUE.

...

(optional) Advanced mode that allows passing further arguments to fine-tune the function. Also used for legacy arguments (see details or special section).

Life cycle

This function's arguments are subject to change. Currently the function uses erase.genotypes, but the dots-dots-dots ... argument allows passing erase.genotypes and masked.genotypes. These arguments do exactly the same thing, and only one can be used.

Author

Thierry Gosselin thierrygosselin@icloud.com

Examples

if (FALSE) {
# Erase the genotypes listed in a blacklist file located in the working directory
data <- radiator::filter_blacklist_genotypes(
    data = data, blacklist.genotypes = "blacklist.geno.tsv"
    )
}