R/filter_snp_position_read.R
filter_snp_position_read.Rd
This filter removes markers/SNPs based on their position on the read. The data requires snp, locus and col information (e.g. from a VCF file).
The impact of assembly artifacts can be tested in downstream analysis with the whitelist and blacklist generated by this function.
Filter targets: Markers
Statistics: The position of the SNPs on the read.
filter_snp_position_read(
  data,
  strata = NULL,
  interactive.filter = TRUE,
  filter.snp.position.read = NULL,
  filename = NULL,
  parallel.core = parallel::detectCores() - 1,
  verbose = TRUE,
  ...
)
(4 options) A file or object generated by radiator:
tidy data
Genomic Data Structure (GDS)
How to get GDS and tidy data?
Look into tidy_genomic_data, read_vcf or tidy_vcf.
(path or object) The strata file or object.
Additional documentation is available in read_strata.
Use that function to whitelist/blacklist populations/individuals.
Option to set pop.levels/pop.labels is also available.
(optional, logical) Should the filtering session be interactive?
When it is, figures of the distribution are shown before the function asks for filtering thresholds.
Default: interactive.filter = TRUE.
(character) Options are: "outliers", "q75", "iqr", c(min value,max value).
For a safe and conservative value, use "outliers"; this will remove SNPs with outlier positions on the reads.
Default: filter.snp.position.read = NULL.
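To make the threshold options more concrete, here is a minimal base R sketch of how each option could translate into a range of accepted positions. The mapping is an assumption made for illustration (Tukey's 1.5 x IQR fences for "outliers", the third quartile as an upper bound for "q75", the interquartile range for "iqr"); the function may compute these statistics differently. The vector `snp.position` and the helper `keep_in_range` are hypothetical.

```r
# Hypothetical SNP positions on the read (in bp); not real data.
snp.position <- c(5, 12, 20, 33, 41, 48, 55, 62, 70, 79, 88, 140)

# First and third quartiles, and the interquartile range.
q <- stats::quantile(snp.position, probs = c(0.25, 0.75), names = FALSE)
iqr <- q[2] - q[1]

# ASSUMED meaning of each option, expressed as c(min, max).
# This is an illustration, not the package's confirmed behaviour.
thresholds <- list(
  outliers = c(q[1] - 1.5 * iqr, q[2] + 1.5 * iqr), # Tukey fences
  q75      = c(min(snp.position), q[2]),            # up to the 3rd quartile
  iqr      = c(q[1], q[2]),                         # between the quartiles
  custom   = c(10, 90)                              # c(min value, max value)
)

# Hypothetical helper: keep positions inside an inclusive range.
keep_in_range <- function(x, range) {
  x[x >= range[1] & x <= range[2]]
}

# Positions retained under each option.
kept <- lapply(thresholds, function(r) keep_in_range(snp.position, r))

# Number of SNPs kept per option.
print(vapply(kept, length, integer(1)))
```

Under these assumptions "outliers" is the most permissive of the three named options and "iqr" the most restrictive, which matches the description of "outliers" as the conservative choice.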
(optional) Name of the filtered tidy data frame file written to the working directory (ending with .tsv).
Default: filename = NULL.
(optional) The number of cores used for parallel execution during import.
Default: parallel.core = parallel::detectCores() - 1.
(optional, logical) When verbose = TRUE the function is a little more chatty during execution.
Default: verbose = TRUE.
(optional) Advanced mode that allows passing further arguments to fine-tune the function. Also used for legacy arguments (see details or special section).
A list in the global environment with 6 objects:
$snp.number.markers
$number.snp.reads.plot
$whitelist.markers
$tidy.filtered.snp.number
$blacklist.markers
$filters.parameters
Each object can be extracted from the list into its own separate object by following the example below.
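A minimal sketch of pulling individual elements out of the returned list, assuming the result is an ordinary named R list. The list `res` below is a mock that stands in for the function's return value: the element names come from the list above, while the column names and contents are hypothetical placeholders.

```r
# Mock result standing in for the list returned by the function.
# Element names follow the documentation; contents are placeholders.
res <- list(
  whitelist.markers = data.frame(MARKERS = c("locus1_snp25", "locus2_snp40")),
  blacklist.markers = data.frame(MARKERS = "locus3_snp140"),
  filters.parameters = data.frame(
    FILTERS = "filter.snp.position.read",
    VALUES = "outliers"
  )
)

# Extract elements into separate objects with $ or [[ ]].
whitelist <- res$whitelist.markers
blacklist <- res[["blacklist.markers"]]

# List the names available in the result.
print(names(res))
```

Both `$` and `[[ ]]` return the element itself rather than a one-element list, so the extracted objects can be passed directly to downstream functions.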
Interactive version
There are 2 steps in the interactive version to visualize and filter the data based on the number of SNPs on the read/locus:
Step 1. SNP number per read/locus visualization
Step 2. Choose the filtering thresholds
if (FALSE) { # \dontrun{
turtle <- radiator::filter_snp_position_read(
  data = "turtle.vcf",
  strata = "turtle.strata.tsv",
  filter.snp.position.read = "outliers",
  filename = "tidy.data.turtle.tsv"
)
} # }