R/filter_blacklist_genotypes.R
filter_blacklist_genotypes.Rd
Filter dataset with blacklist of genotypes.
This function blacklists (erases or masks) individual genotypes in a dataset.
It is used internally in radiator and might also be of interest to users.
filter_blacklist_genotypes(data, blacklist.genotypes, verbose = TRUE, ...)
(4 options) A file or object generated by radiator:
tidy data
Genomic Data Structure (GDS)
How to get GDS and tidy data? Look into tidy_genomic_data, read_vcf or tidy_vcf.
(path or object) The blacklist is an object in your global environment or a file in the working directory (e.g. "blacklist.geno.tsv").
The data frame contains at least these 2 columns: MARKERS, INDIVIDUALS.
Additional columns are allowed: CHROM, LOCUS, POS.
Useful to erase genotypes that fail QC, e.g. genotypes with more than 2 alleles in a diploid (likely sequencing errors), or genotypes with poor genotype likelihood or coverage.
Columns are cleaned of separators that interfere with some packages or code, as detailed in clean_markers_names and clean_ind_names.
Default: blacklist.genotypes = NULL.
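As an illustration of the blacklist layout described above, the following minimal sketch builds a data frame with the two required columns and writes it as a tab-separated file. The marker and individual IDs are made up, and the file is written to a temporary directory rather than the working directory; only base R is used, so nothing here depends on radiator internals.

```r
# Hypothetical blacklist: the IDs below are invented for illustration only.
blacklist <- data.frame(
  MARKERS = c("marker_0001", "marker_0001", "marker_0042"),
  INDIVIDUALS = c("ind_01", "ind_07", "ind_01"),
  stringsAsFactors = FALSE
)

# Write a tab-separated file without quotes or row names.
blacklist_path <- file.path(tempdir(), "blacklist.geno.tsv")
write.table(
  blacklist,
  file = blacklist_path,
  sep = "\t",
  quote = FALSE,
  row.names = FALSE
)

# Read the file back and confirm the two required columns are present.
check <- read.delim(blacklist_path, stringsAsFactors = FALSE)
required <- c("MARKERS", "INDIVIDUALS")
stopifnot(all(required %in% colnames(check)))
```

Either the `blacklist` object or the path to the file could then be supplied as blacklist.genotypes.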
(optional, logical) When verbose = TRUE the function is a little more chatty during execution.
Default: verbose = TRUE.
(optional) Advanced mode that allows passing further arguments to fine-tune the function. Also used for legacy arguments (see details or special section).
The arguments of this function are subject to change. Currently the function uses erase.genotypes, but the dots-dots-dots ... argument can be used to pass either erase.genotypes or masked.genotypes. These two arguments do exactly the same thing, and only one of them can be used at a time.
if (FALSE) {
  # Use the full argument name from the usage line: blacklist.genotypes
  data <- radiator::filter_blacklist_genotypes(
    data = data, blacklist.genotypes = "blacklist.geno.tsv"
  )
}
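
To make the idea of masking by MARKERS and INDIVIDUALS concrete, here is a conceptual sketch in base R on a toy tidy data frame. It is not radiator's implementation: the GT column name, the genotype coding, and the choice to represent a masked genotype as NA are all assumptions made for the example.

```r
# Toy tidy data: one row per marker x individual.
# The GT column and its coding are assumed for illustration.
tidy <- expand.grid(
  MARKERS = c("marker_0001", "marker_0002"),
  INDIVIDUALS = c("ind_01", "ind_02", "ind_03"),
  stringsAsFactors = FALSE
)
tidy$GT <- c("001001", "001002", "002002", "001001", "001002", "002002")

# Hypothetical blacklist with the two required columns.
blacklist <- data.frame(
  MARKERS = c("marker_0001", "marker_0002"),
  INDIVIDUALS = c("ind_02", "ind_03"),
  stringsAsFactors = FALSE
)

# Build a composite key per row, then flag rows that appear in the blacklist.
make_key <- function(x) paste(x$MARKERS, x$INDIVIDUALS, sep = "\t")
masked <- make_key(tidy) %in% make_key(blacklist)

# Mask the flagged genotypes; all other rows are left untouched.
tidy$GT[masked] <- NA_character_
```

Only the marker and individual combinations listed in the blacklist are affected, so other genotypes for the same marker or the same individual are kept.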