R/filter_dart_reproducibility.R
filter_dart_reproducibility.Rd
This filter removes markers whose reproducibility falls below a chosen threshold. It is based on the reproducibility column found in DArT files.
Filter targets: Markers
Statistics: Reproducibility (established by DArT)
filter_dart_reproducibility(
data,
interactive.filter = TRUE,
filter.reproducibility = NULL,
parallel.core = parallel::detectCores() - 1,
verbose = TRUE,
...
)
(2 options) A file or object generated by radiator:
tidy data
Genomic Data Structure (GDS)
How to get GDS and tidy data? Look into tidy_genomic_data, read_vcf or tidy_vcf.
(optional, logical) Should the filtering session be interactive?
Figures of the distribution are shown before the function asks for filtering thresholds.
Default: interactive.filter = TRUE.
(double, character) The threshold is best decided after viewing the figures.
Thresholds higher than 0.95 are common.
The value can also be a character string: filter.reproducibility = "outliers".
This option removes outlier markers based on the lower outlier statistic.
Default: filter.reproducibility = NULL.
(optional) The number of cores used for parallel execution during import.
Default: parallel.core = parallel::detectCores() - 1.
(optional, logical) When verbose = TRUE, the function is a little more chatty during execution.
Default: verbose = TRUE.
(optional) Advanced mode that allows passing further arguments to fine-tune the function. Also used for legacy arguments (see details or special section).
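To make the "outliers" option more concrete, here is a minimal base R sketch of one common definition of a lower outlier, Tukey's lower fence (Q1 - 1.5 * IQR). This is an assumption for illustration only: the documentation above does not state which statistic radiator uses internally, and the data frame and its column names (MARKERS, REP_AVG) are hypothetical.

```r
# Hypothetical reproducibility scores for 8 markers (not real DArT data).
markers <- data.frame(
  MARKERS = paste0("snp_", 1:8),
  REP_AVG = c(1.00, 0.99, 0.98, 1.00, 0.97, 0.99, 0.80, 1.00),
  stringsAsFactors = FALSE
)

# Tukey's lower fence: Q1 - 1.5 * IQR.
# Assumed definition of "lower outlier"; radiator's own rule may differ.
q1 <- stats::quantile(markers$REP_AVG, probs = 0.25, names = FALSE)
lower.fence <- q1 - 1.5 * stats::IQR(markers$REP_AVG)

# Markers at or above the fence are kept, the rest are discarded.
whitelist <- markers[markers$REP_AVG >= lower.fence, ]
blacklist <- markers[markers$REP_AVG < lower.fence, ]
```

With these made-up values, only the marker with a score of 0.80 falls below the fence.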
A list in the global environment with 6 objects:
$whitelist.markers
$blacklist.markers
$filters.parameters
Each object can be extracted from the list into a separate object by following the example below.
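A minimal sketch of pulling components out of such a list with the `$` operator. The list here is a mock built by hand so the snippet runs without radiator: the component names follow those listed above, but the contents and column names are placeholders, not real radiator output.

```r
# Mock of the returned list (placeholder contents, hypothetical columns).
res <- list(
  whitelist.markers = data.frame(
    MARKERS = c("snp_1", "snp_2"), stringsAsFactors = FALSE
  ),
  blacklist.markers = data.frame(
    MARKERS = "snp_3", stringsAsFactors = FALSE
  ),
  filters.parameters = data.frame(
    FILTERS = "reproducibility", VALUES = "0.97", stringsAsFactors = FALSE
  )
)

# Isolate individual components as standalone objects.
whitelist.markers <- res$whitelist.markers
blacklist.markers <- res$blacklist.markers
filters.parameters <- res$filters.parameters
```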
Interactive version
There are 2 steps in the interactive version to visualize and filter the data based on the reproducibility value:
Step 1. Visualization using a box plot
Step 2. Choose the filtering threshold
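The two steps above can be imitated outside radiator with base R. This is only a sketch on simulated values: the vector `rep.values` and the threshold of 0.95 are hypothetical, and the plotting call is a generic box plot, not the figure radiator produces.

```r
set.seed(42)
# Simulated reproducibility values: 95 good markers and 5 poor ones.
rep.values <- c(
  stats::runif(95, min = 0.96, max = 1.00),
  stats::runif(5, min = 0.70, max = 0.90)
)

# Step 1. Visualization using a box plot.
# The plot is drawn only in an interactive session; the summary
# statistics are returned in both cases.
bp <- graphics::boxplot(rep.values, horizontal = TRUE, plot = interactive())

# Step 2. Choose the filtering threshold after looking at the figure.
threshold <- 0.95 # hypothetical choice
keep <- rep.values >= threshold
filtered.values <- rep.values[keep]
```

Here `bp$stats` holds the five box plot statistics and `bp$out` the points drawn as outliers, which can help when picking the threshold.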
if (FALSE) { # \dontrun{
# Import the DArT file together with its strata file
spotted.cod <- radiator::read_dart(
data = "Combined_1514and1614_SNP_80Callrate.csv",
strata = "strata.dart.spotted.cod.tsv"
)
# Keep markers with reproducibility of at least 0.97
spotted.cod.filtered <- radiator::filter_dart_reproducibility(
data = spotted.cod,
filter.reproducibility = 0.97
)
} # }