This function reads the FASTQ file of an individual and cleans it by removing:
unique reads with high coverage (likely paralogs or transposable elements)
distinct reads with low coverage
clean_fq(
fq.files,
paired.end = FALSE,
min.coverage.threshold = 2L,
max.coverage.threshold = "high.coverage.unique.reads",
remove.unique.reads = TRUE,
write.blacklist = TRUE,
write.blacklist.fasta = TRUE,
compress = FALSE,
output = "08_stacks_results/03_cleaned_fq",
parallel.core = parallel::detectCores() - 1
)
fq.files: (character, path) The path to the individual fastq file to check.
Default: fq.files = "my-sample.fq.gz"
paired.end: (logical) Are the files paired-end?
Default: paired.end = FALSE
min.coverage.threshold: (integer) Minimum coverage threshold.
The function removes distinct reads with coverage <= this threshold.
To turn off, use min.coverage.threshold = NULL or 0L.
Default: min.coverage.threshold = 2L
max.coverage.threshold: (integer, character) Maximum coverage threshold.
The function removes distinct reads with coverage >= this threshold.
To turn off, use max.coverage.threshold = NULL.
With the default, the function uses the starting depth at which high-coverage unique reads are observed.
Default: max.coverage.threshold = "high.coverage.unique.reads"
remove.unique.reads: (logical) Remove distinct unique reads with high coverage, which are likely paralogs or transposable elements.
Default: remove.unique.reads = TRUE
write.blacklist: (logical) Write the blacklisted reads to a file.
Default: write.blacklist = TRUE
write.blacklist.fasta: (logical) Write the blacklisted reads to a FASTA file.
Default: write.blacklist.fasta = TRUE
compress: (logical) Compress the output files. If you have the disk space, don't compress: writing the files is much faster without compression.
Default: compress = FALSE
output: (character, path) Directory where the cleaned fq files are written.
Default: output = "08_stacks_results/03_cleaned_fq"
parallel.core: (integer) Enable parallel execution with this number of threads.
Default: parallel.core = parallel::detectCores() - 1
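For illustration, the two coverage filters can be disabled independently of one another. A hypothetical call that keeps low-coverage distinct reads and removes the upper coverage cutoff, using only arguments documented above:

```r
# Hypothetical example: disable both coverage filters.
clean_fq(
  fq.files = "my-sample.fq.gz",
  min.coverage.threshold = 0L,    # keep distinct reads with low coverage
  max.coverage.threshold = NULL   # no upper coverage cutoff
)
```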
The function returns a cleaned fq file, with -C appended to the sample name in the filename.
coming soon, just try it in the meantime...
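In the meantime, a minimal sketch of a call with the default settings, assuming a single-end gzipped FASTQ file (all argument values are taken from the defaults shown above):

```r
# Hypothetical example: clean one single-end sample with default thresholds.
clean_fq(
  fq.files = "my-sample.fq.gz",                           # sample to clean
  paired.end = FALSE,
  min.coverage.threshold = 2L,                            # drop distinct reads with coverage <= 2
  max.coverage.threshold = "high.coverage.unique.reads",  # auto-detect upper cutoff
  remove.unique.reads = TRUE,                             # drop likely paralogs/TEs
  write.blacklist = TRUE,
  write.blacklist.fasta = TRUE,
  compress = FALSE,                                       # faster writes if disk space allows
  output = "08_stacks_results/03_cleaned_fq",
  parallel.core = parallel::detectCores() - 1
)
# The cleaned file, with -C appended to the sample name,
# is written to the output directory.
```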