Run STACKS process_radtags module

Run STACKS process_radtags module inside R. This function was tested on single-end sequences only. If you want to contribute with other types of RAD data, let me know.

run_process_radtags(
  project.info,
  path.seq.lanes = "03_sequencing_lanes",
  paired.files = FALSE,
  interleaved = FALSE,
  input.type = "guess",
  output.path = "04_process_radtags",
  output.type = "guess",
  pe.1 = NULL,
  pe.2 = NULL,
  clean.data = TRUE,
  discard.reads = TRUE,
  rescue.barcodes = TRUE,
  truncate = 90,
  capture.discard.reads = FALSE,
  barcode.inline.null = TRUE,
  barcode.index.null = FALSE,
  barcode.null.index = FALSE,
  barcode.inline.inline = FALSE,
  barcode.index.index = FALSE,
  barcode.inline.index = FALSE,
  barcode.index.inline = FALSE,
  enzyme = NULL,
  renz.1 = NULL,
  renz.2 = NULL,
  bestrad = FALSE,
  adapter.1 = NULL,
  adapter.2 = NULL,
  adapter.mm = NULL,
  retain.header = TRUE,
  merge = FALSE,
  filter.illumina = TRUE,
  disable.rad.check = FALSE,
  encoding = "phred33",
  window.size = 0.15,
  score.limit = 10,
  len.limit = NULL,
  barcode.dist.1 = 1,
  barcode.dist.2 = NULL,
  threads = min(parallel::detectCores(), 2),
  parallel.core = floor(parallel::detectCores()/2)
)

Arguments

project.info: (character, path) Path to the project info file. The file is tab separated and as 3 columns named: LANES, BRACODES and INDIVIDUALS. See details for more info.
path.seq.lanes: (character, path) Path to sequencing lanes if processing single-end sequences. Same as f in STACKS command line. Default: path.seq.lanes = "03_sequencing_lanes".
paired.files: (logical) files contained within the directory are paired. Default: paired.files = FALSE.
interleaved: (logical) specify that the paired-end reads are interleaved in single files. Default: interleaved = FALSE.
input.type: (character) Input file type, either "fastq", "gzfastq", "bam", or bustard. Default: input.type = "guess".
output.path: (character, path) Path to output the processed files. Default: output.path = "04_process_radtags".
output.type: (character) Output file type: "fastq", "gzfastq", "fasta", or gzfasta. Default: output.type = "guess" (match input type).
pe.1: (character, path) First input file in a set of paired-end sequences. In stacks it's the argument 1. Default: pe.1 = "FORWARD". Corresponding to the FORWARD column in the project info file.
pe.2: (character, path) Second input file in a set of paired-end sequences. In stacks it's the argument 2. Default: pe.2 = "REVERSE". Corresponding to the REVERSE column in the project info file.
clean.data: (logical) Clean data, remove any read with an uncalled base. Default: clean.data = TRUE.
discard.reads: (logical) Discard reads with low quality scores. Default: discard.reads = TRUE.
rescue.barcodes: (logical) Rescue barcodes and RAD-Tags. Default: rescue.barcodes = TRUE.
truncate: (integer) Truncate final read length to this value. Default: truncate = 90.
capture.discard.reads: (logical) Capture discarded reads to a file. Default: capture.discard.reads = FALSE.
barcode.inline.null: (logical) Barcode is inline with sequence, occurs only on single-end read. Default: barcode.inline.null = TRUE.
barcode.index.null: Barcode is provided in FASTQ header (Illumina i5 or i7 read). Default: barcode.index.null = FALSE.
barcode.null.index: Barcode is provded in FASTQ header (Illumina i7 read if both i5 and i7 read are provided). Default: barcode.null.index = FALSE.
barcode.inline.inline: Barcode is inline with sequence, occurs on single and paired-end read. Default: barcode.inline.inline = FALSE.
barcode.index.index: Barcode is provded in FASTQ header (Illumina i5 and i7 reads). Default: barcode.index.index = FALSE.
barcode.inline.index: Barcode is inline with sequence on single-end read and occurs in FASTQ header (from either i5 or i7 read). Default: barcode.inline.index = FALSE.
barcode.index.inline: Barcode occurs in FASTQ header (Illumina i5 or i7 read) and is inline with single-end sequence (for single-end data) on paired-end read (for paired-end data). Default: barcode.index.inline = FALSE.
enzyme: (character) Provide the restriction enzyme used (cut site occurs on single-end read). If double-digest use: enzyme = NULL. Currently supported enzymes include: "aciI", "ageI", "aluI", "apaLI", "apeKI", "apoI", "aseI", "bamHI", "bbvCI", "bfaI", "bfuCI", "bgIII", "bsaHI", "bspDI", "bstYI", "claI", "csp6I", "ddeI", "dpnII", "eaeI", "ecoRI", "ecoRV", "ecoT22I", "haeIII", "hinP1I", "hindIII", "hpaII", "kpnI", "mluCI", "mseI", "mspI", "ncoI", "ndeI", "nheI", "nlaIII", "nspI", "notI", "nsiI", "pstI", "rsaI", "sacI", "sau3AI", "sbfI", "sexAI", "sgrAI", "speI", "sphI", "taqI", "xbaI" or "xhoI".
renz.1: (character) When a double digest was used, provide the first restriction enzyme used.
renz.2: (character) When a double digest was used, provide the second restriction enzyme used (cut site occurs on the paired-end read).
bestrad: (logical) library was generated using BestRAD, check for restriction enzyme on either read and potentially tranpose reads. Default: bestrad = FALSE.
adapter.1: (character) Provide adaptor sequence that may occur on the single-end read for filtering.
adapter.2: (character) Provide adaptor sequence that may occur on the paired-read for filtering.
adapter.mm: (integer) Number of mismatches allowed in the adapter sequence.
retain.header: (logical) Retain unmodified FASTQ headers in the output. Default: retain.header = TRUE.
merge: (logical) If no barcodes are specified, merge all input files into a single output file. Default: merge = FALSE.
filter.illumina: (logical) Discard reads that have been marked by Illumina's chastity/purity filter as failing. With Ion Torrent sequencing, you have to use filter.illumina = FALSE. Default: filter.illumina = TRUE.
disable.rad.check: (logical) Disable checking if the RAD site is intact. Default: disable.rad.check = FALSE.
encoding: (character) Specify how quality scores are encoded, "phred33" for Illumina 1.8+/Sanger or "phred64" for Illumina 1.3-1.5. Default: encoding = "phred33".
window.size: (double) Set the size of the sliding window as a fraction of the read length, between 0 and 1. Default: window.size = 0.15.
score.limit: (integer) Set the score limit. If the average score within the sliding window drops below this value, the read is discarded. Default: score.limit = 10.
len.limit: (integer) Specify a minimum sequence length (useful if your data has already been trimmed). Default: len.limit = NULL.
barcode.dist.1: (integer) The number of allowed mismatches when rescuing single-end barcodes. Default: barcode.dist.1 = 1.
barcode.dist.2: (integer) The number of allowed mismatches when rescuing paired-end barcodes. Default: barcode.dist.2 = NULL (will default to barcode.dist.1).
threads: (optional) The number of threads to run (stacks, max 24). Default: threads = min(parallel::detectCores(), 2).
parallel.core: (optional) The number of core for parallel processing. Much faster to have higher number here than with threads. Default: floor(parallel::detectCores() / 2).

Value

For stacks specific output see process_radtags.

Details

Step 1. Individual naming

Please take a moment to think about the way you want to name your individuals... Here is my recipe:

Include metadata: SPECIES, POPULATIONS or SAMPLING SITES, MATURITY, YEAR of sampling, numerical ID with 3 to 4 digits.
3 letters in ALL CAPS: capital letters are easier to read and will reduce confusion for other people in deciphering your handwritting or the font you've used. The 3 letters will keep it short.
only 1 type of separator: the dash (-): why? it's the easiest separator to avoid confusion. Please avoid the underscore (_) as it will sometimes be impossible to tell if it's a whitespace or an underscore in printd lists or even in some codes.

example:

Following this convention: SPECIES-POP-MATURITY-YEAR-ID

My ids: STU-QUE-ADU-2017-0001, STU-ONT-JUV-2016-0002

Species: Sturgeon

Sampling site: Québec and Ontario

Year of sampling: 2016 and 2017

MATURITY: adult and juvenile

ID: 2 samples 0001 and 0002

Step 2. project info file:

Create a tab separated file (e.g. in MS Excel) with 3 columns named: LANES, BRACODES and INDIVIDUALS. For each individuals you've just created, give the barcodes and lanes name.

REPLICATES ?

You have replicates? Awesome. stackr makes it easy to keep track of replicates. Use the same name for individual replicates. They will have different barcodes, and can potentially be on different lanes. No problem. stackr will combine fastq file at the end, keeping original replicates intact. However, stackr will be appending integers (e.g. STU-QUE-ADU-2017-0001-1, STU-QUE-ADU-2017-0001-2) at the end of the names you've chosen). Combined replicates will have -R at the end (e.g STU-QUE-ADU-2017-0001-R for the combination of the 2 replicates.)

References

Catchen JM, Amores A, Hohenlohe PA et al. (2011) Stacks: Building and Genotyping Loci De Novo From Short-Read Sequences. G3, 1, 171-182.

Catchen JM, Hohenlohe PA, Bassham S, Amores A, Cresko WA (2013) Stacks: an analysis tool set for population genomics. Molecular Ecology, 22, 3124-3140.

Examples

if (FALSE) { # \dontrun{
# library(stackr)
# If you haven't already build the folders to store all the files:
# stackr::build_stackr_workflow_dir()
#
# run a double digest process_radtags within R:
process.radtags.tuna <- stackr::run_process_radtags(
project.info = "02_project_info/project.info.tuna.tsv",
path.seq.lanes = "03_sequencing_lanes",
renz.1 = "pstI",
renz.2 = "mspI",
adapter.1 = "CGAGATCGGAAGAGCGGG",
adapter.mm = 2
)
# remaining arguments are defaults, so carefully look at them in the doc.
} # }

Arguments

Value

Details

References

See also

Examples