This function reads the fastq file of an individual and generates a figure of read coverage groups.
read_depth_plot(
fq.file,
min.coverage.fig = 7L,
output = "08_stacks_results/02_read_depth_plot",
parallel.core = parallel::detectCores() - 1
)
fq.file: (character, path) The path to the individual fastq file to check.
Default: fq.file = "my-sample.fq.gz".
min.coverage.fig: (integer) Minimum coverage used to draw the color on the figure.
Default: min.coverage.fig = 7L.
output: (character, path) Where the figure will be saved.
Default: output = "08_stacks_results/02_read_depth_plot".
parallel.core: (integer) Enable parallel execution with the number of threads.
Default: parallel.core = parallel::detectCores() - 1.
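As a rough illustration of how the arguments fit together, the sketch below builds the argument list and calls the function only when it can. The package name `stackr` is an assumption (it is not stated on this page), the file path is a placeholder, and `plot.args` and `depth.plot` are hypothetical names.

```r
# Placeholder path to one individual's fastq file.
fq <- "my-sample.fq.gz"

# Arguments mirror the usage block above.
plot.args <- list(
  fq.file = fq,
  min.coverage.fig = 7L,
  output = "08_stacks_results/02_read_depth_plot",
  parallel.core = parallel::detectCores() - 1
)

# Assumption: read_depth_plot() is exported by a package named "stackr".
# The call is skipped when the package or the fastq file is missing.
if (requireNamespace("stackr", quietly = TRUE) && file.exists(fq)) {
  depth.plot <- do.call(stackr::read_depth_plot, plot.args)
}
```

Guarding the call keeps the snippet harmless on a machine where the file or the package is not available.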
The function returns the read depth groups plot.
4 read coverage groups are shown:
distinct reads with low coverage (in red): these reads are likely sequencing errors or uninformative polymorphisms (shared only by a few samples).
distinct reads with the target coverage (in green):
Usually represent around 80
It’s a safe coverage range to start exploring your data (open for discussion).
Lower threshold (default = 7): you can’t escape it; it’s your tolerance for calling a heterozygote a true heterozygote. You want a minimum coverage for both the reference and the alternative allele. Yes, you can use population information to lower this threshold, or use some fancy Bayesian algorithm.
Higher threshold: a lot more open for discussion. Here it’s the lower limit of another group (the orange one, described below), minus 1 bp.
distinct reads with high coverage > 1 read depth (in yellow): those are legitimate alleles with high coverage.
distinct and unique reads with high coverage (in orange): when assembled into loci, these repetitive elements are usually paralogs, retrotransposons, transposable elements, etc.
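The grouping described above can be sketched in base R. This is a minimal illustration on made-up reads, not the function's actual implementation: `high.threshold` is a hypothetical value picked for the toy data, and the two high-coverage groups (yellow and orange) are collapsed into a single "high" bin because the rule separating them is not spelled out here.

```r
# Toy reads standing in for the sequence lines of one fastq file (made-up data).
reads <- c(
  rep("ACGT", 9), rep("ACGA", 8),  # two alleles at a comfortable depth
  rep("TTGC", 2), "GGCA",          # rare reads, likely sequencing errors
  rep("CCCC", 40)                  # very deep read, possibly repetitive
)

# Depth of each distinct read = number of times the exact sequence occurs.
read.depth <- as.data.frame(
  table(READ = reads),
  responseName = "DEPTH",
  stringsAsFactors = FALSE
)

min.coverage <- 7L     # lower threshold (the documented default)
high.threshold <- 20L  # hypothetical upper threshold for this toy data

# Bin each distinct read: below the lower threshold, inside the target range,
# or above the upper threshold.
read.depth$GROUP <- cut(
  read.depth$DEPTH,
  breaks = c(0, min.coverage - 1, high.threshold, Inf),
  labels = c("low", "target", "high")
)

# Number of distinct reads per coverage group.
group.counts <- table(read.depth$GROUP)
print(read.depth)
print(group.counts)
```

With real data the same tally would be run on the sequence lines of the fastq file, and the thresholds would be tuned to the observed depth distribution.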
Ilut, D., Nydam, M., Hare, M. (2014). Defining Loci in Restriction-Based Reduced Representation Genomic Data from Non model Species: Sources of Bias and Diagnostics for Optimal Clustering. BioMed Research International, 2014. https://dx.doi.org/10.1155/2014/675158