Run the STACKS ustacks module inside R! The folder 04_process_radtags should contain the fastq files of all individuals. These files are created by the process_radtags module.

run_ustacks(
  mismatch.testing = FALSE,
  sample.list = NULL,
  project.info = NULL,
  f = "04_process_radtags",
  o = "06_ustacks_2_gstacks",
  m = 3,
  M = 2,
  N = M + 2,
  t = "guess",
  R = FALSE,
  H = TRUE,
  parallel.core = parallel::detectCores() - 1,
  h = FALSE,
  d = TRUE,
  keep.high.cov = FALSE,
  high.cov.thres = 3,
  max.locus.stacks = 3,
  k.len = NULL,
  max.gaps = 2,
  min.aln.len = 0.8,
  disable.gapped = FALSE,
  model.type = "snp",
  alpha = 0.05,
  bound.low = 0,
  bound.high = 0.2,
  bc.err.freq = NULL
)

Arguments

mismatch.testing

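(logical) Run ustacks in mismatch testing mode. When TRUE, only one sample is allowed in the folder specified in f (see sample.list below). Default: mismatch.testing = FALSE.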

sample.list

By default, all the samples in f are used. f is usually the stacks process_radtags output folder and defaults to f = "04_process_radtags" (see the f argument below). When using mismatch.testing = TRUE, only one sample is allowed inside the folder specified in f (e.g. choose one with an average number of reads or an average file size in MB). Default: sample.list = NULL. Power outage? No problem, see details below.
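One way to choose a representative sample for mismatch testing is to pick the fastq file closest to the median file size. The sketch below is only an illustration in base R: `pick_median_sample` is a hypothetical helper, not a stackr function, and the dummy files stand in for real process_radtags output.

```r
# Hypothetical helper (not part of stackr): return the fastq file whose size
# is closest to the median size of all fastq files in a folder.
pick_median_sample <- function(folder) {
  fq.files <- list.files(
    path = folder,
    pattern = "\\.(fq|fastq)(\\.gz)?$",
    full.names = TRUE
  )
  if (length(fq.files) == 0L) stop("No fastq files found in: ", folder)
  sizes <- file.size(fq.files)
  fq.files[which.min(abs(sizes - stats::median(sizes)))]
}

# Demonstration on a temporary folder holding three dummy files of different sizes
radtags.demo <- file.path(tempdir(), "04_process_radtags_demo")
dir.create(radtags.demo, showWarnings = FALSE)
writeLines(strrep("A", 10), file.path(radtags.demo, "ind1.fq"))
writeLines(strrep("A", 100), file.path(radtags.demo, "ind2.fq"))
writeLines(strrep("A", 1000), file.path(radtags.demo, "ind3.fq"))

chosen <- pick_median_sample(radtags.demo)
basename(chosen)
```

The chosen file could then be copied alone into the folder given to f before running with mismatch.testing = TRUE.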

project.info

When using the stackr pipeline, a project info file is created. This file will be modified inside this function. The file can be given as a path to a file in the working directory or as an object in the global environment. If no project.info file is provided, the function first looks in the working directory for file(s) with "project.info" in their name. If several files are found, the latest one created is used. Default: project.info = NULL. Power outage? No problem, see details below.
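The lookup described above can be approximated in base R. This is a sketch, not stackr's internal code: `latest_project_info` is a hypothetical helper, and it uses the modification time as a stand-in for "latest one created".

```r
# Hypothetical helper (not stackr's implementation): find the most recent file
# with "project.info" in its name inside a folder.
latest_project_info <- function(path = getwd()) {
  candidates <- list.files(
    path = path,
    pattern = "project\\.info",
    full.names = TRUE
  )
  if (length(candidates) == 0L) return(NULL)
  # Assumption: modification time is used as a proxy for creation time
  candidates[which.max(file.info(candidates)$mtime)]
}

# Demonstration with two dummy files in a temporary folder
info.demo <- file.path(tempdir(), "project_info_demo")
dir.create(info.demo, showWarnings = FALSE)
old.file <- file.path(info.demo, "project.info.old.tsv")
new.file <- file.path(info.demo, "project.info.new.tsv")
file.create(old.file, new.file)
Sys.setFileTime(old.file, Sys.time() - 3600) # pretend this one is an hour older
Sys.setFileTime(new.file, Sys.time())

latest <- latest_project_info(info.demo)
basename(latest)
```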

f

Input file path. Usually, the stacks process_radtags output folder. Default: f = "04_process_radtags".

o

Output path to write results. Default: o = "06_ustacks_2_gstacks".

m

Minimum depth of coverage required to create a stack. Default: m = 3.

M

Maximum distance (in nucleotides) allowed between stacks. Default: M = 2.

N

Maximum distance allowed to align secondary reads to primary stacks. Default: N = M + 2.
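Because the default is written as N = M + 2 in the function signature, R's lazy evaluation of default arguments means N follows whatever value of M is supplied, unless N is set explicitly. The toy function below (`show_defaults`, a hypothetical name, not run_ustacks itself) illustrates that behaviour.

```r
# Toy function with the same default pattern as the usage above
show_defaults <- function(M = 2, N = M + 2) {
  c(M = M, N = N)
}

default.values <- show_defaults()             # M = 2, N = 4
larger.M <- show_defaults(M = 4)              # M = 4, N = 6 (N follows M)
explicit.N <- show_defaults(M = 4, N = 4)     # M = 4, N = 4 (N overridden)
```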

t

Input file type. Supported types: fasta, fastq, gzfasta, gzfastq, fq.gz, fastq.gz. Default: t = "guess".

R

Retain unused reads. Default: R = FALSE.

H

Disable calling haplotypes from secondary reads. Default: H = TRUE.

parallel.core

Enable parallel execution with num_threads threads. Default: parallel.core = parallel::detectCores() - 1.
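The default leaves one core free. On machines where `parallel::detectCores()` returns NA or 1, the default expression would give NA or 0, so a guarded value may be safer to pass explicitly. A minimal sketch (`safe.cores` is just an illustrative variable name):

```r
# Detect the number of cores; detectCores() can return NA on some platforms
n.cores <- parallel::detectCores()

# Keep one core free, but never go below 1
safe.cores <- if (is.na(n.cores)) 1L else max(1L, n.cores - 1L)
safe.cores
```

The result can then be supplied as parallel.core = safe.cores.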

h

Display this help message. Default: h = FALSE.

d

Enable the Deleveraging algorithm, used for resolving over-merged tags. Default: d = TRUE.

keep.high.cov

Disable the algorithm that removes highly-repetitive stacks and nearby errors. Default: keep.high.cov = FALSE.

high.cov.thres

(double) Highly-repetitive stacks threshold, in standard deviation units. Default: high.cov.thres = 3.0.

max.locus.stacks

Maximum number of stacks at a single de novo locus. Default: max.locus.stacks = 3.

k.len

Specify k-mer size for matching between alleles and loci. Default: k.len = NULL.

max.gaps

Number of gaps allowed between stacks before merging. Default: max.gaps = 2.

min.aln.len

Minimum length of aligned sequence in a gapped alignment. Default: min.aln.len = 0.8.

disable.gapped

(logical) Do not perform gapped alignments between stacks (default: gapped alignments enabled). Default: disable.gapped = FALSE.

model.type

Either 'snp' (default), 'bounded', or 'fixed'. Default: model.type = "snp".

alpha

For the SNP or Bounded SNP model, Chi square significance level required to call a heterozygote or homozygote, either 0.1, 0.05. Default: alpha = 0.05.

bound.low

For the bounded SNP model, lower bound for epsilon, the error rate, between 0 and 1.0. Default: bound.low = 0.

bound.high

For the bounded SNP model, upper bound for epsilon, the error rate, between 0 and 1.0. Default: bound.high = 0.2.

bc.err.freq

For the fixed model, specify the barcode error frequency, between 0 and 1.0. Default: bc.err.freq = NULL.

Value

ustacks returns 4 files per sample: .snps.tsv.gz, .tags.tsv.gz, .alleles.tsv.gz, .models.tsv.gz. In the global environment, the function returns a project info file (updated if one was provided) and a summary of ustacks for each sample.
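To check an output folder after a run, the expected file names can be built from the sample names and the four suffixes listed above. This sketch assumes each file is named as the sample name followed by the suffix; `expected_ustacks_files` is a hypothetical helper, not a stackr function.

```r
# Hypothetical helper: build the expected ustacks output paths for a set of samples.
# Assumption: files are named <sample><suffix> inside the output folder.
expected_ustacks_files <- function(samples, output.dir = "06_ustacks_2_gstacks") {
  suffixes <- c(".snps.tsv.gz", ".tags.tsv.gz", ".alleles.tsv.gz", ".models.tsv.gz")
  combos <- expand.grid(suffix = suffixes, sample = samples, stringsAsFactors = FALSE)
  file.path(output.dir, paste0(combos$sample, combos$suffix))
}

expected <- expected_ustacks_files(c("ind1", "ind2"))
expected

# Which of the expected files are missing from disk?
missing.files <- expected[!file.exists(expected)]
```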

Details

-i: the unique integer ID that identifies the sample (SQL ID) is taken from the project info file. If no project info file is provided, the ID is created sequentially from the sample files. This ID is written in the project info file (if no file is found or given, a new file is created in the working directory).
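Sequential ID creation from the sample files could look like the following. This is only a sketch of the idea: `assign_sql_ids` and the column names INDIVIDUALS and SQL_ID are assumptions made for illustration, not the actual layout of the project info file.

```r
# Hypothetical helper: give each sample file a sequential integer ID.
assign_sql_ids <- function(sample.files) {
  # Strip the folder and the fastq/fasta extension to get the sample name
  sample.names <- sub("\\.(fq|fastq|fa|fasta)(\\.gz)?$", "", basename(sample.files))
  data.frame(
    INDIVIDUALS = sample.names,          # assumed column name
    SQL_ID = seq_along(sample.names),    # assumed column name
    stringsAsFactors = FALSE
  )
}

ids <- assign_sql_ids(c(
  "04_process_radtags/ind1.fq.gz",
  "04_process_radtags/ind2.fq.gz"
))
ids
```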

Power outage? No problem:

Restart the function as it was. After the restart, the project info file created automatically during the previous run will be used. This ensures: i) that the unique SQL IDs are not duplicated and ii) that ustacks can start at the sample it was assembling before the outage.
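To see by hand which samples still need assembling after an interruption, one can compare the sample names against the output folder. Note that this is a different technique from the one the function uses (it relies on the project info file): the sketch below simply checks for a .tags.tsv.gz file per sample, and `samples_to_assemble` is a hypothetical helper.

```r
# Hypothetical helper: samples with no .tags.tsv.gz file yet in the output folder.
# Assumption: output files are named <sample>.tags.tsv.gz.
samples_to_assemble <- function(samples, output.dir) {
  done <- file.exists(file.path(output.dir, paste0(samples, ".tags.tsv.gz")))
  samples[!done]
}

# Demonstration: only ind1 has been processed so far
resume.demo <- file.path(tempdir(), "ustacks_resume_demo")
dir.create(resume.demo, showWarnings = FALSE)
file.create(file.path(resume.demo, "ind1.tags.tsv.gz"))

remaining <- samples_to_assemble(c("ind1", "ind2", "ind3"), resume.demo)
remaining
```

A sample that was interrupted midway may have left partial files behind, so a check like this is only a first look.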

References

Catchen JM, Amores A, Hohenlohe PA et al. (2011) Stacks: Building and Genotyping Loci De Novo From Short-Read Sequences. G3, 1, 171-182.

Catchen JM, Hohenlohe PA, Bassham S, Amores A, Cresko WA (2013) Stacks: an analysis tool set for population genomics. Molecular Ecology, 22, 3124-3140.

See also

Author

Thierry Gosselin thierrygosselin@icloud.com

Examples
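A minimal sketch of a call using only the arguments documented above. It assumes that stackr and Stacks are installed and that the folder 04_process_radtags exists in the working directory; the call is skipped otherwise. The values shown are the documented defaults, except parallel.core, which is set to 1 here for illustration.

```r
# Arguments taken from the usage section (documented defaults, N = M + 2)
ustacks.args <- list(
  f = "04_process_radtags",
  o = "06_ustacks_2_gstacks",
  m = 3,
  M = 2,
  N = 4,
  model.type = "snp",
  alpha = 0.05,
  parallel.core = 1
)

# Only run when the package and the input folder are available
# (Stacks itself must also be installed for the call to succeed)
can.run <- requireNamespace("stackr", quietly = TRUE) && dir.exists(ustacks.args$f)
if (can.run) {
  ustacks.res <- do.call(stackr::run_ustacks, ustacks.args)
}
```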