Run the STACKS cstacks module inside R! The function automatically runs a summary of the log file at the end (summary_cstacks). In the event of a power outage or a computer or cluster crash, just re-run the function: it will start over from the last catalog generated.
run_cstacks(
P = "06_ustacks_2_gstacks",
o = "06_ustacks_2_gstacks",
M = "02_project_info/population.map.catalog.tsv",
catalog.path = NULL,
n = 1,
parallel.core = parallel::detectCores() - 1,
max.gaps = 2,
min.aln.len = 0.8,
disable.gapped = FALSE,
k.len = NULL,
report.mmatches = FALSE,
split.catalog = 20
)
P: Path to the directory containing the STACKS files.
Default: P = "06_ustacks_2_gstacks".
Inside the folder 06_ustacks_2_gstacks, you should have 4 files for each sample. The sample name is the prefix of the files ending with:
.alleles.tsv.gz, .models.tsv.gz, .snps.tsv.gz, .tags.tsv.gz.
Those files are created in the ustacks module.
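Before running cstacks, it can help to confirm that every sample in P has its 4 ustacks files. The sketch below is a hypothetical helper (check_ustacks_files is not part of stackr). It assumes sample names are whatever precedes the 4 suffixes listed above, and that any name ending in "catalog" belongs to catalog files rather than to a sample.

```r
# Hypothetical helper (not part of stackr): report samples in a ustacks
# output folder that are missing one or more of the 4 expected files.
check_ustacks_files <- function(path) {
  suffixes <- c(".alleles.tsv.gz", ".models.tsv.gz", ".snps.tsv.gz", ".tags.tsv.gz")
  pattern <- "\\.(alleles|models|snps|tags)\\.tsv\\.gz$"
  files <- grep(pattern, list.files(path), value = TRUE)
  # Sample name = file name with the suffix removed
  samples <- unique(sub(pattern, "", files))
  # Assumption: names ending in "catalog" are catalog files, not samples
  samples <- samples[!grepl("catalog$", samples)]
  missing <- lapply(samples, function(s) {
    suffixes[!file.exists(file.path(path, paste0(s, suffixes)))]
  })
  names(missing) <- samples
  # Keep only the samples with at least one missing file
  missing[lengths(missing) > 0]
}
```

An empty list means every detected sample has its 4 files.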
o: Output path to write the catalog.
Default: o = "06_ustacks_2_gstacks"
M: Path to a population map file (required when P is used).
Default: M = "02_project_info/population.map.catalog.tsv".
catalog.path: This is for the "Catalog editing" part of cstacks, where you can provide the path to an existing catalog; cstacks will add data to this existing catalog.
With the default, catalog.path = NULL, the function automatically scans the input folder for the presence of a catalog, i.e. catalog files inside the P folder alongside the sample files. If none is found, a new catalog is created.
If your catalog is not in the input folder, supply a path here, e.g. catalog.path = ~/catalog_folder.
If a catalog is detected, the samples in the sample.list argument will be added to this catalog. The catalog is made of 3 files:
catalog.alleles.tsv.gz, catalog.snps.tsv.gz, catalog.tags.tsv.gz
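The catalog detection described above can be sketched as a simple file check. This is a hypothetical helper (find_catalog is not stackr's code), and it assumes that a supplied catalog.path takes precedence over P and that a catalog counts as present only when all 3 files exist.

```r
# Hypothetical helper (not part of stackr): return the folder holding a
# complete catalog, or NULL when a new catalog would have to be built.
find_catalog <- function(P, catalog.path = NULL) {
  catalog.files <- c("catalog.alleles.tsv.gz", "catalog.snps.tsv.gz", "catalog.tags.tsv.gz")
  # Assumption: an explicit catalog.path is checked instead of P
  folder <- if (is.null(catalog.path)) P else path.expand(catalog.path)
  if (all(file.exists(file.path(folder, catalog.files)))) folder else NULL
}
```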
n: Number of mismatches allowed between sample loci when building the catalog.
Default: n = 1
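To make "number of mismatches" concrete, the sketch below counts nucleotide differences between two equal-length sequences and accepts a match when that count is at most n. This is a simplified illustration only (count_mismatches and loci_match are hypothetical); cstacks does not necessarily compare loci this way internally.

```r
# Simplified illustration (not cstacks's algorithm): count positions that
# differ between two equal-length sequences.
count_mismatches <- function(a, b) {
  stopifnot(nchar(a) == nchar(b))
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Two loci are treated as matching when they differ by at most n positions
loci_match <- function(a, b, n = 1) count_mismatches(a, b) <= n
```

With n = 1, "ACGTAC" and "ACGTTC" (1 difference) match, while "ACGTAC" and "TCGTTC" (2 differences) do not.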
parallel.core: Enable parallel execution with the given number of threads.
Default: parallel.core = parallel::detectCores() - 1
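The default can misbehave on some machines: parallel::detectCores() returns NA when the core count is unknown, and on a single-core machine detectCores() - 1 equals 0. A minimal, hypothetical guard (safe_parallel_core is not part of stackr) could be:

```r
# Hypothetical helper: always return at least 1 thread.
safe_parallel_core <- function() {
  n <- parallel::detectCores()
  # detectCores() returns NA when the number of cores cannot be determined
  if (is.na(n)) return(1L)
  max(1L, as.integer(n) - 1L)
}
```

Its result could then be passed as parallel.core = safe_parallel_core().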
max.gaps: The number of gaps allowed between stacks before merging.
Default: max.gaps = 2
min.aln.len: The minimum length of aligned sequence in a gapped alignment.
Default: min.aln.len = 0.8
disable.gapped: Disable gapped alignments between stacks.
Default: disable.gapped = FALSE (use gapped alignments).
k.len: Specify the k-mer size for matching between catalog loci (automatically calculated by default).
Advice: don't modify.
Default: k.len = NULL
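For readers unfamiliar with the term, a k-mer of size k.len is any run of k.len consecutive bases in a sequence. The sketch below (kmers is a hypothetical helper) only illustrates the concept; it is not how cstacks builds or matches its k-mers.

```r
# Illustration of the k-mer concept: all overlapping substrings of length k.
kmers <- function(seq, k) {
  n <- nchar(seq)
  if (k > n) return(character(0))
  # substring() is vectorised over start and stop positions
  substring(seq, 1:(n - k + 1), k:n)
}
```

For example, kmers("ACGTA", 3) gives "ACG", "CGT" and "GTA".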
report.mmatches: Report query loci that match more than one catalog locus.
Advice: don't modify.
Default: report.mmatches = FALSE
split.catalog: (integer) The number of samples in each chunk of the catalog population map. This gives you a backup catalog every split.catalog samples. There is a trade-off between the integer used here, the time needed to initialize an existing catalog, and restarting from zero if everything crashes.
Default: split.catalog = 20. Very useful on a personal computer or a university computer cluster.
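The chunking behind split.catalog can be sketched as splitting the population map into consecutive groups of at most split.catalog samples, with a catalog backup written after each group. This is a hypothetical sketch (split_popmap and the column names INDIVIDUALS and POP_ID are assumptions), not stackr's implementation.

```r
# Hypothetical sketch: split a population map (one row per sample) into
# consecutive chunks of at most split.catalog rows.
split_popmap <- function(popmap, split.catalog = 20) {
  chunk.id <- ceiling(seq_len(nrow(popmap)) / split.catalog)
  split(popmap, chunk.id)
}

# Example: 45 hypothetical samples give chunks of 20, 20 and 5 samples
popmap <- data.frame(INDIVIDUALS = sprintf("ind%02d", 1:45), POP_ID = "pop1",
                     stringsAsFactors = FALSE)
chunks <- split_popmap(popmap, split.catalog = 20)
```

A smaller split.catalog means more frequent backups but more time spent re-initializing the growing catalog between chunks.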
cstacks returns the catalog files (catalog.alleles.tsv.gz, catalog.snps.tsv.gz, catalog.tags.tsv.gz), written to the output path o.
Computer or server problem during cstacks? Look in the log file to see which individuals remain to be included. Create a new list of individuals to include and use the catalog.path argument to point to the catalog created before the problem.
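Building the new list of individuals can be sketched as removing the samples already in the catalog from the population map. In this hypothetical sketch (remaining_popmap is not part of stackr), done is a character vector of sample names you read from the log file yourself, and the population map is assumed to be a tab-separated file with no header whose first column holds the sample names.

```r
# Hypothetical sketch: keep the individuals not yet added to the catalog
# and optionally write them to a new population map file.
remaining_popmap <- function(popmap, done, file = NULL) {
  remaining <- popmap[!popmap[[1]] %in% done, , drop = FALSE]
  if (!is.null(file)) {
    # Tab-separated, no header, no quotes (assumed population map format)
    write.table(remaining, file, sep = "\t", quote = FALSE,
                row.names = FALSE, col.names = FALSE)
  }
  remaining
}
```

The new file could then be supplied as M, with catalog.path pointing to the catalog saved before the crash.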
Catchen JM, Amores A, Hohenlohe PA et al. (2011) Stacks: Building and Genotyping Loci De Novo From Short-Read Sequences. G3, 1, 171-182.
Catchen JM, Hohenlohe PA, Bassham S, Amores A, Cresko WA (2013) Stacks: an analysis tool set for population genomics. Molecular Ecology, 22, 3124-3140.