Rarefaction of fastq files by sub-sampling the reads before de novo assembly or alignment. This normalization (sample-size correction) step makes it possible to check whether some statistics increase with the number of reads (e.g. the number of heterozygous markers). It is a simple way to separate artifacts caused by varying read numbers across samples from true biological signal.
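To make the sub-sampling idea concrete, here is a minimal base-R sketch, not stackr's implementation: it draws `sample.reads` reads at random, without replacement, from a plain-text FASTQ file (assumed to use the standard 4 lines per read) and writes them to a new file. The helper name `subsample_fastq()` is hypothetical, and it loads the whole file into memory, which a real tool would avoid by streaming (Bioconductor's ShortRead offers a `FastqSampler` class for that).

```r
# Hypothetical helper (not part of stackr): random sub-sampling of FASTQ reads.
# Assumes an uncompressed FASTQ file with exactly 4 lines per read.
subsample_fastq <- function(in.file, out.file, sample.reads, seed = NULL) {
  lines <- readLines(in.file)
  n.reads <- length(lines) %/% 4L
  if (sample.reads > n.reads) {
    stop("sample.reads (", sample.reads, ") exceeds the ", n.reads,
         " reads available in ", in.file)
  }
  if (!is.null(seed)) set.seed(seed)  # reproducible draw
  # Pick read indices without replacement, keeping the original read order
  keep <- sort(sample.int(n.reads, sample.reads))
  # Expand each selected read index into its 4 FASTQ line indices
  idx <- as.vector(rbind(4L * keep - 3L, 4L * keep - 2L,
                         4L * keep - 1L, 4L * keep))
  writeLines(lines[idx], out.file)
  invisible(keep)
}
```

Calling it several times with different seeds on the same input gives independent replicates of equal depth, which is what lets read-number effects be compared across samples.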
```r
normalize_reads(
  project.info = NULL,
  fq.files,
  sample.reads = 1e+06,
  number.replicates = 3,
  random.seed = NULL,
  parallel.core = parallel::detectCores() - 1
)
```
| Argument | Description |
|---|---|
| project.info | (character, path, optional) When using the stackr pipeline, a project info file is created. This file provides all the info and stats generated by stacks and stackr. The project info file will be updated with the new samples, and `_normalize` will be appended to its filename. Default: `project.info = NULL`. |
| fq.files | (character, path) Path of the folder containing the samples to normalize. |
| sample.reads | (integer) The number of reads to pick randomly. Default: `sample.reads = 1e+06`. |
| number.replicates | (integer) The number of replicate samples to generate from each original sample. With the default, if 20 samples are in the folder, 60 new samples will be generated. Default: `number.replicates = 3`. |
| random.seed | (integer, optional) For reproducibility, set an integer that will be used inside functions that require randomness. With the default, a random number is generated and printed in the appropriate output. Default: `random.seed = NULL`. |
| parallel.core | (optional) The number of cores used for parallel processing. Samples are normalized one after the other, and the replicates of each sample are generated in parallel. Default: `parallel.core = parallel::detectCores() - 1`. |
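The interplay of `number.replicates`, `random.seed` and `parallel.core` can be sketched as follows. This is a hypothetical design, not stackr's code: one seed is derived per replicate from `random.seed` (seed, seed + 1, ...), so each replicate is reproducible no matter which worker runs it, and the replicates of a single sample are dispatched with `parallel::mclapply()`. The names `make_replicate_seeds()`, `run_replicates()` and the `replicate_fun` callback are illustrative assumptions. With 20 samples and 3 replicates each, a caller looping over samples would run this 20 times and obtain 60 replicates.

```r
# Hypothetical sketch: one reproducible seed per replicate.
make_replicate_seeds <- function(number.replicates, random.seed = NULL) {
  if (is.null(random.seed)) {
    # No seed supplied: draw one and report it so the run can be repeated
    random.seed <- sample.int(1e6, 1L)
    message("random.seed: ", random.seed)
  }
  as.integer(random.seed) + seq_len(number.replicates) - 1L
}

# Generate the replicates of ONE sample in parallel; the caller would loop over
# samples sequentially. `replicate_fun(replicate, seed)` is a hypothetical
# callback, e.g. a function that sub-samples reads using that seed.
run_replicates <- function(number.replicates, random.seed, parallel.core,
                           replicate_fun) {
  seeds <- make_replicate_seeds(number.replicates, random.seed)
  # mclapply() forks worker processes, which Windows does not support
  cores <- if (.Platform$OS.type == "windows") 1L else as.integer(parallel.core)
  parallel::mclapply(seq_len(number.replicates), function(i) {
    replicate_fun(replicate = i, seed = seeds[[i]])
  }, mc.cores = max(1L, cores))  # detectCores() - 1 can be 0 on one core
}
```

Because every replicate calls `set.seed()` with its own seed, the results are identical whether one core or several are used.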
The function produces new fastq files, one per replicate, with "-1", "-2", ... appended to the original sample name.
If a project info file was provided, the info for the new replicate samples is integrated
into the file. The modified project info file will have `_normalize` appended
to the original filename.
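The naming rules above can be illustrated with two small hypothetical helpers. Where exactly stackr inserts the suffixes is an assumption here: this sketch places "-1", "-2", ... before a `.fq`/`.fastq` (optionally `.gz`) extension, and `_normalize` before the project info file's last extension.

```r
# Hypothetical helpers, not stackr functions; suffix placement is assumed.
replicate_file_names <- function(fq.file, number.replicates) {
  # Split "sample.fq.gz" into stem "sample" and extension ".fq.gz"
  stem <- sub("\\.(fq|fastq)(\\.gz)?$", "", fq.file)
  ext <- substring(fq.file, nchar(stem) + 1L)
  paste0(stem, "-", seq_len(number.replicates), ext)
}

normalized_info_name <- function(project.info) {
  ext <- tools::file_ext(project.info)              # e.g. "tsv"
  stem <- tools::file_path_sans_ext(project.info)   # drops the last extension only
  if (nzchar(ext)) paste0(stem, "_normalize.", ext) else paste0(stem, "_normalize")
}
```

For example, `replicate_file_names("sample.fq.gz", 3)` gives `"sample-1.fq.gz"`, `"sample-2.fq.gz"` and `"sample-3.fq.gz"`, and `normalized_info_name("project.info.corals.tsv")` gives `"project.info.corals_normalize.tsv"`.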
```r
library(stackr)

# This function requires the Bioconductor package ShortRead
# (biocLite() is deprecated; BiocManager is the current installer):
# install.packages("BiocManager")
# BiocManager::install("ShortRead")

# Restrict ShortRead to a single OpenMP thread and restore the previous setting on exit
nthreads <- .Call(ShortRead:::.set_omp_threads, 1L)
on.exit(.Call(ShortRead:::.set_omp_threads, nthreads))

# Using the defaults:
stackr::normalize_reads(fq.files = "~/corals")

# Customizing the function:
stackr::normalize_reads(
  project.info = "project.info.corals.tsv",
  fq.files = "~/corals",
  sample.reads = 2000000,
  number.replicates = 5,
  random.seed = 3,
  parallel.core = 5
)

# You then need to run the stackr pipeline: run_ustacks, run_sstacks,
# run_tsv2bam, run_gstacks, run_populations
# (or the equivalent steps when using a reference genome).
```