Run STACKS process_radtags module inside R. This function was tested on single-end sequences only. If you want to contribute with other types of RAD data, let me know.
run_process_radtags( project.info, path.seq.lanes = "03_sequencing_lanes", P = FALSE, I = FALSE, i = "guess", o = "04_process_radtags", y = "guess", pe.1 = NULL, pe.2 = NULL, c = TRUE, q = TRUE, r = TRUE, t = 90, D = FALSE, E = "phred33", w = 0.15, s = 10, barcode.inline.null = TRUE, barcode.index.null = FALSE, barcode.null.index = FALSE, barcode.inline.inline = FALSE, barcode.index.index = FALSE, barcode.inline.index = FALSE, barcode.index.inline = FALSE, enzyme = NULL, renz.1 = NULL, renz.2 = NULL, bestrad = FALSE, adapter.1 = NULL, adapter.2 = NULL, adapter.mm = NULL, retain.header = TRUE, merge = FALSE, filter.illumina = TRUE, disable.rad.check = FALSE, len.limit = NULL, barcode.dist.1 = 1, barcode.dist.2 = NULL, parallel.core = parallel::detectCores() - 1 )
project.info | (character, path) Path to the project info file.
The file is tab separated and as 3 columns named: |
---|---|
path.seq.lanes | (character, path) Path to sequencing lanes if
processing single-end sequences.
Same as |
P | (logical) files contained within the directory are paired.
Default: |
I | (logical) specify that the paired-end reads are interleaved in
single files.
Default: |
i | (character) Input file type, either |
o | (character, path) Path to output the processed files.
Default: |
y | (character) Output file type: |
pe.1 | (character, path) First input file in a set of
paired-end sequences.
In stacks it's the argument |
pe.2 | (character, path) Second input file in a set of
paired-end sequences.
In stacks it's the argument |
c | (logical) Clean data, remove any read with an uncalled base.
Default: |
q | (logical) Discard reads with low quality scores.
Default: |
r | (logical) Rescue barcodes and RAD-Tags.
Default: |
t | (integer) Truncate final read length to this value.
Default: |
D | (logical) Capture discarded reads to a file.
Default: |
E | (character) Specify how quality scores are encoded,
|
w | (double) Set the size of the sliding window as a fraction of the
read length, between 0 and 1.
Default: |
s | (integer) Set the score limit.
If the average score within the sliding window drops below this value,
the read is discarded (default 10).
Default: |
barcode.inline.null | (logical) Barcode is inline with sequence,
occurs only on single-end read.
Default: |
barcode.index.null | Barcode is provided in FASTQ header
(Illumina i5 or i7 read).
Default: |
barcode.null.index | Barcode is provded in FASTQ header
(Illumina i7 read if both i5 and i7 read are provided).
Default: |
barcode.inline.inline | Barcode is inline with sequence,
occurs on single and paired-end read.
Default: |
barcode.index.index | Barcode is provded in FASTQ header
(Illumina i5 and i7 reads).
Default: |
barcode.inline.index | Barcode is inline with sequence on single-end
read and occurs in FASTQ header (from either i5 or i7 read).
Default: |
barcode.index.inline | Barcode occurs in FASTQ header
(Illumina i5 or i7 read) and is inline with single-end sequence
(for single-end data) on paired-end read (for paired-end data).
Default: |
enzyme | (character) Provide the restriction enzyme used
(cut site occurs on single-end read).
If double-digest use: |
renz.1 | (character) When a double digest was used, provide the first restriction enzyme used. |
renz.2 | (character) When a double digest was used, provide the second restriction enzyme used (cut site occurs on the paired-end read). |
bestrad | (logical) library was generated using BestRAD,
check for restriction enzyme on either read and potentially tranpose reads.
Default: |
adapter.1 | (character) Provide adaptor sequence that may occur on the single-end read for filtering. |
adapter.2 | (character) Provide adaptor sequence that may occur on the paired-read for filtering. |
adapter.mm | (integer) Number of mismatches allowed in the adapter sequence. |
retain.header | (logical) Retain unmodified FASTQ headers in the output.
Default: |
merge | (logical) If no barcodes are specified,
merge all input files into a single output file.
Default: |
filter.illumina | (logical) Discard reads that have been marked by
Illumina's chastity/purity filter as failing. With Ion Torrent sequencing, you
have to use |
disable.rad.check | (logical) Disable checking if the RAD site is intact.
Default: |
len.limit | (integer) Specify a minimum sequence length
(useful if your data has already been trimmed).
Default: |
barcode.dist.1 | (integer) The number of allowed mismatches when
rescuing single-end barcodes.
Default: |
barcode.dist.2 | (integer) The number of allowed mismatches when
rescuing paired-end barcodes.
Default: |
parallel.core | (optional) The number of core for parallel
programming.
Default: |
For stacks specific output see process_radtags.
Step 1. Individual naming
Please take a moment to think about the way you want to name your individuals... Here is my recipe:
Include metadata: SPECIES, POPULATIONS or SAMPLING SITES, MATURITY, YEAR of sampling, numerical ID with 3 to 4 digits.
3 letters in ALL CAPS: capital letters are easier to read and will reduce confusion for other people in deciphering your handwritting or the font you've used. The 3 letters will keep it short.
only 1 type of separator: the dash (-): why? it's the easiest separator to avoid confusion. Please avoid the underscore (_) as it will sometimes be impossible to tell if it's a whitespace or an underscore in printd lists or even in some codes.
example:
Following this convention: SPECIES-POP-MATURITY-YEAR-ID
My ids: STU-QUE-ADU-2017-0001, STU-ONT-JUV-2016-0002
Species: Sturgeon
Sampling site: Québec and Ontario
Year of sampling: 2016 and 2017
MATURITY: adult and juvenile
ID: 2 samples 0001 and 0002
Step 2. project info file:
Create a tab separated file (e.g. in MS Excel) with 3 columns named:
LANES
, BRACODES
and INDIVIDUALS
.
For each individuals you've just created,
give the barcodes and lanes name.
REPLICATES ?
You have replicates? Awesome. stackr makes it easy to keep track of replicates. Use the same name for individual replicates. They will have different barcodes, and can potentially be on different lanes. No problem. stackr will combine fastq file at the end, keeping original replicates intact. However, stackr will be appending integers (e.g. STU-QUE-ADU-2017-0001-1, STU-QUE-ADU-2017-0001-2) at the end of the names you've chosen). Combined replicates will have -R at the end (e.g STU-QUE-ADU-2017-0001-R for the combination of the 2 replicates.)
Catchen JM, Amores A, Hohenlohe PA et al. (2011) Stacks: Building and Genotyping Loci De Novo From Short-Read Sequences. G3, 1, 171-182.
Catchen JM, Hohenlohe PA, Bassham S, Amores A, Cresko WA (2013) Stacks: an analysis tool set for population genomics. Molecular Ecology, 22, 3124-3140.
if (FALSE) { # library(stackr) # If you haven't already build the folders to store all the files: # stackr::build_stackr_workflow_dir() # # run a double digest process_radtags within R: process.radtags.tuna <- stackr::run_process_radtags( project.info = "02_project_info/project.info.tuna.tsv", path.seq.lanes = "03_sequencing_lanes", renz.1 = "pstI", renz.2 = "mspI", adapter.1 = "CGAGATCGGAAGAGCGGG", adapter.mm = 2) # remaining arguments are defaults, so carefully look at them in the doc. }