This function reads the FASTQ file of an individual and generates a figure of read coverage groups.
read_depth_plot( fq.file, min.coverage.fig = 7L, parallel.core = parallel::detectCores() - 1 )
| Argument | Description | Default |
|---|---|---|
| fq.file | (character, path). The path to the individual FASTQ file to check. | none |
| min.coverage.fig | (integer). Minimum coverage used to draw the colors on the figure. | 7L |
| parallel.core | (integer). Enables parallel execution with the given number of threads. | parallel::detectCores() - 1 |
The function returns the read depth groups plot.
Four read coverage groups are shown:
distinct reads with low coverage (in red): these reads are likely sequencing errors or uninformative polymorphisms (shared only by a few samples).
distinct reads with a target coverage (in green):
Usually represent around 80
It’s a safe coverage range to start exploring your data (open for discussion).
Lower threshold (default = 7): you can’t escape it; it’s your tolerance for calling a heterozygote a true heterozygote. You want a minimum coverage for both the reference and the alternative allele. Yes, you can use population information to lower this threshold, or use some fancy Bayesian algorithm.
Higher threshold: a lot more open for discussion. Here it is the lower limit of another group (the orange one, described below) minus 1.
distinct reads with high coverage > 1 read depth (in yellow): those are legitimate alleles with high coverage.
distinct and unique reads with high coverage (in orange): when assembled into loci, these repetitive elements are usually paralogs, retrotransposons, transposable elements, etc.
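The grouping above boils down to counting how many times each distinct read sequence occurs and binning those depths. The base R sketch below illustrates that idea; it is not the package's implementation. `read_depth_groups()` and its `high.coverage` argument are hypothetical: `high.coverage` stands in for the lower limit of the orange group, whose value is not given here. The sketch assumes an uncompressed FASTQ with exactly 4 lines per record, and it does not try to separate the yellow and orange groups.

```r
# Hypothetical helper, NOT part of the package API: count how many times each
# distinct sequence occurs in a FASTQ file and bin those read depths.
# Assumes an uncompressed FASTQ with exactly 4 lines per record.
read_depth_groups <- function(fq.file, min.coverage = 7L, high.coverage = 100L) {
  stopifnot(min.coverage >= 2L, high.coverage > min.coverage)
  lines <- readLines(fq.file)
  # The sequence is the 2nd line of every 4-line record
  seqs <- lines[seq(2L, length(lines), by = 4L)]
  # Read depth of each distinct sequence
  counts <- table(seqs)
  depth <- as.integer(counts)
  # depth 1 .. min-1 -> "low" (red); min .. high-1 -> "target" (green);
  # >= high -> "high" (yellow and orange lumped together in this sketch)
  group <- cut(
    depth,
    breaks = c(0, min.coverage - 1, high.coverage - 1, Inf),
    labels = c("low", "target", "high")
  )
  data.frame(sequence = names(counts), depth = depth, group = group,
             stringsAsFactors = FALSE)
}
```

On the result, `table(res$group)` gives the number of distinct reads per group and `tapply(res$depth, res$group, sum)` the number of reads they account for.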
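To see why a minimum coverage matters for calling heterozygotes, here is a small illustrative calculation. It is not how the default of 7 was chosen, which is not stated here. It assumes both alleles of a true heterozygote are sampled with probability 0.5 and ignores sequencing error. `p_both_alleles()` is a hypothetical helper that returns the probability of observing at least `min.reads` reads of each allele at a given total depth.

```r
# Illustrative sketch only (hypothetical helper): probability that a true
# heterozygote shows at least `min.reads` reads of EACH allele at a given
# total read depth. Simplifying assumptions: each read carries either allele
# with probability 0.5, and there are no sequencing errors.
p_both_alleles <- function(depth, min.reads = 1L) {
  # K = reads carrying allele 1 ~ Binomial(depth, 0.5);
  # both alleles are seen often enough when min.reads <= K <= depth - min.reads
  p <- stats::pbinom(depth - min.reads, size = depth, prob = 0.5) -
    stats::pbinom(min.reads - 1L, size = depth, prob = 0.5)
  # The interval is empty (difference <= 0) when depth < 2 * min.reads
  pmax(0, p)
}
```

Under these assumptions, `p_both_alleles(3)` is 0.75 and `p_both_alleles(7)` is about 0.984. So at depth 7 roughly 1.6% of true heterozygotes would still show only one allele, and requiring two reads per allele (`min.reads = 2L`) lowers the probability at depth 7 to 0.875.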