This filter removes outlier markers with too many SNPs per locus/read. The data must include SNP and locus information (e.g. from a VCF file). A higher than "normal" number of SNPs is usually the result of assembly artifacts or poorly chosen assembly parameters. This filter is population-agnostic, but still requires a strata file if a VCF file is used as input.
Filter targets: Markers
Statistics: The number of SNPs per locus.
filter_snp_number(
data,
strata = NULL,
interactive.filter = TRUE,
filter.snp.number = NULL,
filename = NULL,
parallel.core = parallel::detectCores() - 1,
verbose = TRUE,
...
)
data: (4 options) A file or object generated by radiator:
tidy data
Genomic Data Structure (GDS)
How to get GDS and tidy data? Look into tidy_genomic_data, read_vcf or tidy_vcf.
strata: (path or object) The strata file or object. Additional documentation is available in read_strata. Use that function to whitelist/blacklist populations/individuals. The option to set pop.levels/pop.labels is also available.
interactive.filter: (optional, logical) Should the filtering session be interactive? Distribution figures are shown before you are asked for filtering thresholds.
Default: interactive.filter = TRUE.
filter.snp.number: (integer) This is best decided after viewing the figures. If the argument is set to 2, loci with 3 or more SNPs will be blacklisted.
Default: filter.snp.number = NULL.
filename: (optional) Name of the filtered tidy data frame file written to the working directory (ending with .tsv).
Default: filename = NULL.
parallel.core: (optional) The number of cores used for parallel execution during import.
Default: parallel.core = parallel::detectCores() - 1.
verbose: (optional, logical) When verbose = TRUE the function is a little more chatty during execution.
Default: verbose = TRUE.
...: (optional) Advanced mode that allows passing further arguments to fine-tune the function. Also used for legacy arguments (see details or special section).
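The threshold semantics described for filter.snp.number can be illustrated with a minimal base R sketch. It does not call radiator: the toy data frame, its column names (LOCUS, POS) and the variable names are assumptions made for illustration only, and radiator's internal implementation may differ.

```r
# Toy marker table: one row per SNP (column names are hypothetical).
markers <- data.frame(
  LOCUS = c("L1", "L1", "L2", "L3", "L3", "L3", "L4", "L4", "L4", "L4"),
  POS   = c(12, 40, 7, 3, 18, 55, 9, 21, 33, 60),
  stringsAsFactors = FALSE
)

# Hypothetical threshold, playing the role of filter.snp.number.
threshold <- 2

# Count the number of SNPs found on each locus.
snp.per.locus <- table(markers$LOCUS)

# Loci carrying more SNPs than the threshold are blacklisted,
# i.e. with a threshold of 2, loci with 3 or more SNPs are removed.
blacklisted.loci <- names(snp.per.locus)[as.vector(snp.per.locus) > threshold]

# Split the markers into blacklisted and whitelisted (kept) rows.
blacklist <- markers[markers$LOCUS %in% blacklisted.loci, ]
whitelist <- markers[!markers$LOCUS %in% blacklisted.loci, ]
```

In this toy data set, loci L1 and L2 carry at most 2 SNPs and are kept, while L3 and L4 are blacklisted.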
A list in the global environment with 6 objects:
$snp.number.markers
$number.snp.reads.plot
$whitelist.markers
$tidy.filtered.snp.number
$blacklist.markers
$filters.parameters
Each object can be extracted from the list into its own object, as shown in the example below.
Interactive version
The interactive version has 2 steps to visualize and filter the data based on the number of SNPs per read/locus:
Step 1. SNP number per read/locus visualization
Step 2. Choose the filtering thresholds
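As a rough idea of what Step 1 summarises, the distribution of SNP number per locus can be tabulated in base R. This is only a sketch on made-up data: the locus vector and the variable names are hypothetical, and the figures produced by radiator itself may be built differently.

```r
# Hypothetical locus identifier for each SNP (one element per SNP).
locus <- c("L1", "L1", "L2", "L3", "L3", "L3", "L4", "L5", "L5")

# Number of SNPs found on each locus.
snp.per.locus <- table(locus)

# Distribution: how many loci carry 1, 2, 3, ... SNPs.
snp.number.distribution <- table(as.vector(snp.per.locus))
print(snp.number.distribution)
```

Passing snp.number.distribution to a plotting function such as barplot() gives a quick view of where the tail of the distribution starts, which is the kind of information Step 2 relies on to choose a threshold.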
if (FALSE) { # \dontrun{
turtle.outlier.snp.number <- radiator::filter_snp_number(
data = "turtle.vcf",
strata = "turtle.strata.tsv",
filter.snp.number = 4,
filename = "tidy.data.turtle.tsv"
)
tidy.data <- turtle.outlier.snp.number$tidy.filtered.snp.number
# From the same list, isolate the blacklisted markers:
blacklist <- turtle.outlier.snp.number$blacklist.markers
} # }