This filter removes markers whose reproducibility falls below a chosen threshold. It is based on the reproducibility column found in DArT files.

Filter targets: Markers

Statistics: Reproducibility (established by DArT)

filter_dart_reproducibility(
  data,
  interactive.filter = TRUE,
  filter.reproducibility = NULL,
  parallel.core = parallel::detectCores() - 1,
  verbose = TRUE,
  ...
)

Arguments

data

(4 options) A file or object generated by radiator, including:

  • tidy data

  • Genomic Data Structure (GDS)

How to get GDS and tidy data? See tidy_genomic_data, read_vcf or tidy_vcf.

interactive.filter

(optional, logical) Should the filtering session be interactive? Figures of the distribution are shown before the function asks for filtering thresholds. Default: interactive.filter = TRUE.

filter.reproducibility

(double, character) This is best decided after viewing the figures. Values higher than 0.95 are common. The value can also be a character string: filter.reproducibility = "outliers". This option removes outlier markers based on the lower outlier statistic. Default: filter.reproducibility = NULL.
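
To make the two modes of filter.reproducibility concrete, here is a minimal base R sketch on a made-up vector of reproducibility values. It is not radiator's implementation: the marker names and values are invented, the numeric comparison is assumed to be inclusive, and the lower outlier cutoff is assumed to be the usual Tukey fence (Q1 - 1.5 * IQR), which may differ from the statistic radiator actually computes.

```r
# Hypothetical reproducibility values for six markers (invented data).
repro <- c(m1 = 1.00, m2 = 0.99, m3 = 0.98, m4 = 0.99, m5 = 1.00, m6 = 0.80)

# Numeric mode: keep markers at or above the chosen value
# (inclusive comparison is an assumption, not confirmed radiator behavior).
threshold <- 0.97
kept.numeric <- names(repro)[repro >= threshold]

# "outliers" mode, ASSUMING a Tukey lower fence: Q1 - 1.5 * IQR.
q <- stats::quantile(repro, probs = c(0.25, 0.75), names = FALSE)
lower.fence <- q[1] - 1.5 * (q[2] - q[1])
kept.outliers <- names(repro)[repro >= lower.fence]

print(kept.numeric)
print(kept.outliers)
```

In this toy data both modes drop only the marker with the clearly low value (m6); on real data the two modes can give different whitelists.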

parallel.core

(optional) The number of cores used for parallel execution during import. Default: parallel.core = parallel::detectCores() - 1.

verbose

(optional, logical) When verbose = TRUE the function is a little more chatty during execution. Default: verbose = TRUE.

...

(optional) Advanced mode that allows passing further arguments for fine-tuning the function. Also used for legacy arguments (see details or special section).

Value

A list in the global environment with 6 objects, including:

  1. $whitelist.markers

  2. $blacklist.markers

  3. $filters.parameters

Each object can be isolated in a separate object outside the list by following the example below.
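
A minimal sketch of that extraction, using a mock list in place of the function's actual return value. The element names come from the list above; the object name res, the column names and the contents are invented for illustration only.

```r
# Mock result standing in for the list returned by the filter.
# Column names and values are hypothetical, not radiator's real output.
res <- list(
  whitelist.markers = data.frame(MARKERS = c("m1", "m2")),
  blacklist.markers = data.frame(MARKERS = "m3"),
  filters.parameters = data.frame(FILTERS = "reproducibility", VALUES = 0.97)
)

# Pull each element out of the list into its own object with `$`.
whitelist <- res$whitelist.markers
blacklist <- res$blacklist.markers
params <- res$filters.parameters

print(names(res))
```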

Details

Interactive version

There are 2 steps in the interactive version to visualize and filter the data based on the reproducibility value:

Step 1. Visualization using a box plot

Step 2. Choose the filtering threshold
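
The two steps can be imitated outside radiator with base R. This sketch uses invented reproducibility values and calls graphics::boxplot() with plot = FALSE, so it only computes the box plot statistics; set plot = TRUE in an interactive session to draw the figure. The threshold is hard-coded here, whereas the interactive filter asks for it.

```r
# Invented reproducibility values, one per marker.
repro <- c(1.00, 0.99, 0.98, 0.99, 1.00, 0.80, 0.97, 1.00)

# Step 1: box plot statistics
# (rows: lower whisker, lower hinge, median, upper hinge, upper whisker).
bp <- graphics::boxplot(repro, plot = FALSE)
print(bp$stats)  # values used to draw the box and whiskers
print(bp$out)    # points lying beyond the whiskers

# Step 2: choose a threshold after looking at the distribution.
threshold <- 0.97  # hard-coded stand-in for the interactive prompt
n.removed <- sum(repro < threshold)
print(n.removed)
```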

Examples

if (FALSE) {
# Import the DArT file together with its strata file
spotted.cod <- radiator::read_dart(
    data = "Combined_1514and1614_SNP_80Callrate.csv",
    strata = "strata.dart.spotted.cod.tsv"
)
# Filter markers using a reproducibility threshold of 0.97
spotted.cod.filtered <- radiator::filter_dart_reproducibility(
    data = spotted.cod,
    filter.reproducibility = 0.97
)
}