This filter removes markers/SNPs based on their position on the read. The data requires snp, locus and col information (e.g. from a VCF file).

The impact of assembly artifacts can be tested in downstream analysis with the whitelist and blacklist generated by this function.

Filter targets: Markers

Statistics: The position of the SNPs on the read.

filter_snp_position_read(
  data,
  strata = NULL,
  interactive.filter = TRUE,
  filter.snp.position.read = NULL,
  filename = NULL,
  parallel.core = parallel::detectCores() - 1,
  verbose = TRUE,
  ...
)

Arguments

data

(2 options) A file or object generated by radiator:

  • tidy data

  • Genomic Data Structure (GDS)

How to get GDS and tidy data? Look into tidy_genomic_data, read_vcf or tidy_vcf.

strata

(path or object) The strata file or object. Additional documentation is available in read_strata. Use that function to whitelist/blacklist populations/individuals. Option to set pop.levels/pop.labels is also available.

interactive.filter

(optional, logical) Do you want the filtering session to be interactive? Figures of the distribution are shown before asking for filtering thresholds. Default: interactive.filter = TRUE.

filter.snp.position.read

(character) Options are: "outliers", "q75", "iqr", c(min value, max value). For a safe and conservative value, use "outliers", which removes SNPs with an outlier position on the reads. Default: filter.snp.position.read = NULL.
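
To make the keyword options more concrete, here is a minimal base R sketch of how thresholds of this kind could be derived from a vector of SNP positions. The rules used are assumptions for illustration only (Tukey fences for "outliers", the interquartile range for "iqr", the 75th percentile as an upper bound for "q75"); this page does not spell out the exact definitions radiator applies. The helper position_bounds() and the toy col vector are hypothetical and not part of radiator.

```r
# Hypothetical helper, NOT part of radiator. The rules below are assumed
# definitions, used only to illustrate what each option could mean.
position_bounds <- function(col, option) {
  q <- stats::quantile(col, probs = c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  if (is.numeric(option)) {
    # c(min value, max value): use the bounds as given
    return(c(min = option[1], max = option[2]))
  }
  switch(
    option,
    outliers = c(min = q[1] - 1.5 * iqr, max = q[2] + 1.5 * iqr), # Tukey fences
    iqr = c(min = q[1], max = q[2]),                              # keep Q1 to Q3
    q75 = c(min = min(col), max = q[2]),                          # keep up to Q3
    stop("Unknown option: ", option)
  )
}

# Toy SNP positions on the read (made-up values)
col <- c(5, 12, 20, 33, 41, 47, 52, 60, 88, 140)

# Keep only the positions that fall inside the chosen bounds
bounds <- position_bounds(col, "outliers")
keep <- col >= bounds["min"] & col <= bounds["max"]
col[keep]
```

Under these assumed rules, "iqr" is the strictest option and "outliers" the most permissive, which matches the description of "outliers" as the conservative choice.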

filename

(optional) Name of the filtered tidy data frame file written to the working directory (ending with .tsv). Default: filename = NULL.

parallel.core

(optional) The number of cores used for parallel execution during import. Default: parallel.core = parallel::detectCores() - 1.

verbose

(optional, logical) When verbose = TRUE the function is a little more chatty during execution. Default: verbose = TRUE.

...

(optional) Advanced mode that allows passing further arguments to fine-tune the function. Also used for legacy arguments (see details or special section).

Value

A list in the global environment with 6 objects:

  1. $snp.number.markers

  2. $number.snp.reads.plot

  3. $whitelist.markers

  4. $tidy.filtered.snp.number

  5. $blacklist.markers

  6. $filters.parameters

The objects can be isolated as separate objects outside the list by following the example below.
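
As a sketch of that isolation step, elements of the returned list can be pulled out with `$`. The list below is a hand-built stand-in that reuses two of the element names listed above, so the snippet runs without radiator or any input file; the MARKERS column name, the genotypes data frame and all values are assumptions made for illustration.

```r
# Stand-in for the list returned by filter_snp_position_read().
# Element names follow the Value section; the contents are toy data.
res <- list(
  whitelist.markers = data.frame(MARKERS = c("m1", "m2", "m4")),
  blacklist.markers = data.frame(MARKERS = "m3")
)

# Isolate the objects outside the list
whitelist <- res$whitelist.markers
blacklist <- res$blacklist.markers

# Example downstream use: drop blacklisted markers from another data set
genotypes <- data.frame(
  MARKERS = c("m1", "m2", "m3", "m4"),
  GT = c("0/1", "1/1", "0/0", "0/1")
)
genotypes.filtered <- genotypes[!genotypes$MARKERS %in% blacklist$MARKERS, ]
```

Running the same downstream analysis once with and once without the blacklisted markers is one way to gauge how much assembly artifacts affect the results.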

Details

Interactive version

There are 2 steps in the interactive version to visualize and filter the data based on the number of SNPs on the read/locus:

Step 1. SNP number per read/locus visualization

Step 2. Choose the filtering thresholds

Examples

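if (FALSE) {
# Non-interactive use is not set here: with the default
# interactive.filter = TRUE, figures are shown before thresholds are chosen.
turtle <- radiator::filter_snp_position_read(
  data = "turtle.vcf",
  strata = "turtle.strata.tsv",
  filter.snp.position.read = "outliers", # remove SNPs with outlier positions
  filename = "tidy.data.turtle.tsv"      # filtered tidy data written to disk
)
}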