Run STACKS cstacks module

Runs the STACKS cstacks module from within R. At the end, the function automatically summarizes the log file (summary_cstacks). After a power outage or a computer or cluster crash, just re-run the function: it resumes from the last catalog generated.

run_cstacks(
  P = "06_ustacks_2_gstacks",
  o = "06_ustacks_2_gstacks",
  M = "02_project_info/population.map.catalog.tsv",
  catalog.path = NULL,
  n = 1,
  parallel.core = parallel::detectCores() - 1,
  max.gaps = 2,
  min.aln.len = 0.8,
  disable.gapped = FALSE,
  k.len = NULL,
  report.mmatches = FALSE,
  split.catalog = 20
)

Arguments

P

Path to the directory containing the STACKS files. Default: P = "06_ustacks_2_gstacks". Inside the folder 06_ustacks_2_gstacks, you should have:

4 files for each sample. The sample name is the prefix of the files, which end with: .alleles.tsv.gz, .models.tsv.gz, .snps.tsv.gz, .tags.tsv.gz. These files are created by the ustacks module.
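Before building a catalog, it can help to confirm that every sample really has its four ustacks files. The sketch below is one way to do that in base R; check_ustacks_files() is a hypothetical helper, not part of the package, and the demo folder and sample names are made up for illustration.

```r
# Suffixes of the four per-sample files produced by ustacks (from the text above).
ustacks_suffixes <- c(".alleles.tsv.gz", ".models.tsv.gz", ".snps.tsv.gz", ".tags.tsv.gz")

# Hypothetical helper: return the samples that lack at least one of the four files.
check_ustacks_files <- function(path) {
  pattern <- "\\.(alleles|models|snps|tags)\\.tsv\\.gz$"
  files <- list.files(path, pattern = pattern)
  # Catalog files may sit next to the sample files; they are not samples.
  files <- files[!startsWith(files, "catalog.")]
  samples <- unique(sub(pattern, "", files))
  complete <- vapply(
    samples,
    function(s) all(paste0(s, ustacks_suffixes) %in% files),
    logical(1)
  )
  samples[!complete]
}

# Demo in a temporary folder: sample_A is complete, sample_B lacks its tags file.
demo_dir <- tempfile("ustacks_demo_")
dir.create(demo_dir)
invisible(file.create(file.path(demo_dir, paste0("sample_A", ustacks_suffixes))))
invisible(file.create(file.path(demo_dir, paste0("sample_B", ustacks_suffixes[1:3]))))

missing_files <- check_ustacks_files(demo_dir)
print(missing_files)
```

In a real project, the call would presumably be check_ustacks_files("06_ustacks_2_gstacks").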

o

Output path to write catalog. Default: o = "06_ustacks_2_gstacks"

M

Path to a population map file (required when P is used). Default: M = "02_project_info/population.map.catalog.tsv".
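As an illustration, a catalog population map could be written from R as follows. This is a minimal sketch that assumes the usual STACKS population map layout (two tab-separated columns, sample name then population, no header); the sample and population names are invented, and the file is written to a temporary folder rather than to 02_project_info.

```r
# Hypothetical samples and populations, for illustration only.
popmap <- data.frame(
  sample = c("sample_A", "sample_B", "sample_C"),
  pop = c("pop1", "pop1", "pop2"),
  stringsAsFactors = FALSE
)

# Assumed layout: tab-separated, no header, no quotes, no row names.
popmap_file <- file.path(tempdir(), "population.map.catalog.tsv")
write.table(
  popmap,
  file = popmap_file,
  sep = "\t",
  quote = FALSE,
  row.names = FALSE,
  col.names = FALSE
)

popmap_lines <- readLines(popmap_file)
print(popmap_lines)
```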

catalog.path

Path to an existing catalog. This is for the "Catalog editing" part of cstacks, where cstacks adds data to an existing catalog. With the default catalog.path = NULL, or with a supplied path, the function automatically scans the input folder for a catalog. If none is found, a new catalog is created. If a catalog is detected in the input folder (the catalog files sit in the P folder alongside the sample files), the samples in the sample.list argument are added to this catalog. If your catalog is not in the input folder, supply a path here, e.g. catalog.path = "~/catalog_folder". The catalog is made of 3 files: catalog.alleles.tsv.gz, catalog.snps.tsv.gz, catalog.tags.tsv.gz
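To see in advance whether a folder already holds a complete catalog, the three file names listed above can be checked directly. The sketch below is only an approximation of such a check in base R; has_catalog() is a hypothetical helper and does not claim to reproduce the function's own detection logic.

```r
# The three catalog files named in the argument description.
catalog_files <- c("catalog.alleles.tsv.gz", "catalog.snps.tsv.gz", "catalog.tags.tsv.gz")

# Hypothetical helper: TRUE only if all three catalog files are present in `path`.
has_catalog <- function(path) {
  all(file.exists(file.path(path, catalog_files)))
}

# Demo in temporary folders: one empty, one with a (dummy) catalog.
empty_dir <- tempfile("no_catalog_")
dir.create(empty_dir)
catalog_dir <- tempfile("with_catalog_")
dir.create(catalog_dir)
invisible(file.create(file.path(catalog_dir, catalog_files)))

found_empty <- has_catalog(empty_dir)
found_catalog <- has_catalog(catalog_dir)
print(c(empty = found_empty, with_catalog = found_catalog))
```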

n

Number of mismatches allowed between sample loci when building the catalog. Default: n = 1

parallel.core

Number of threads to use for parallel execution. Default: parallel.core = parallel::detectCores() - 1
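Note that parallel::detectCores() can return NA on some systems and 1 on a single-core machine, in which case the default expression would evaluate to NA or 0. A defensive way to compute the value yourself, shown here as a suggestion rather than as what the function does internally, is:

```r
# Use all cores but one, and never fewer than one, even if detectCores() returns NA.
n_cores <- max(1L, parallel::detectCores() - 1L, na.rm = TRUE)
print(n_cores)
```

The result can then be passed as parallel.core = n_cores.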

max.gaps

The number of gaps allowed between stacks before merging. Default: max.gaps = 2

min.aln.len

The minimum length of aligned sequence in a gapped alignment. Default: min.aln.len = 0.8

disable.gapped

Disable gapped alignments between stacks. Default: disable.gapped = FALSE (use gapped alignments).

k.len

Specify the k-mer size for matching between catalog loci (automatically calculated by default). Advice: don't modify. Default: k.len = NULL

report.mmatches

Report query loci that match more than one catalog locus. Advice: don't modify. Default: report.mmatches = FALSE

split.catalog

(integer) The number of samples per chunk when splitting the catalog population map. This gives you a backup catalog every split.catalog samples. There is a trade-off between the integer used here, the time needed to initialize an existing catalog, and restarting from zero if everything crashes. Very useful on a personal computer or a university computer cluster. Default: split.catalog = 20.
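To make the chunking concrete, here is a sketch of how a list of samples breaks into groups of split.catalog samples. It reflects one reading of the description above, not the function's actual implementation, and the 45 sample names are invented.

```r
# Hypothetical list of 45 samples to add to the catalog.
samples <- sprintf("sample_%03d", 1:45)
split.catalog <- 20

# Consecutive chunks of at most `split.catalog` samples; under this reading,
# a backup catalog would exist after each chunk.
chunks <- split(samples, ceiling(seq_along(samples) / split.catalog))
chunk_sizes <- lengths(chunks)
print(chunk_sizes)
```

With 45 samples and split.catalog = 20, this gives three chunks of 20, 20 and 5 samples, so a crash during the last chunk would cost at most the work on those 5 samples.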

Value

cstacks writes the catalog to the output folder as 3 files: catalog.alleles.tsv.gz, catalog.snps.tsv.gz, catalog.tags.tsv.gz

Details

Computer or server problem during cstacks? Look in the log file to see which individuals remain to be included. Create a new list of individuals to include and use the catalog.path argument to point to the catalog created before the problem.
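The recovery steps above could look roughly like the sketch below. It assumes the population map has the usual two-column STACKS layout and that the list of samples already in the catalog was read off the log file by hand; all sample names, the file name of the new map, and the catalog location are hypothetical. The call itself is guarded so that it only runs when run_cstacks() is available and the input folder exists.

```r
# Hypothetical full list of samples intended for the catalog.
all_samples <- data.frame(
  sample = c("sample_A", "sample_B", "sample_C"),
  pop = c("pop1", "pop1", "pop2"),
  stringsAsFactors = FALSE
)

# Hypothetical: samples already in the catalog, as read manually from the log file.
done <- c("sample_A", "sample_B")

# Keep only the individuals that remain to be included.
remaining <- all_samples[!all_samples$sample %in% done, ]

# Write a new population map containing only the remaining individuals.
remaining_map <- file.path(tempdir(), "population.map.catalog.remaining.tsv")
write.table(remaining, remaining_map, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Re-run, pointing catalog.path to the catalog created before the problem
# (assumed here to be in the default input folder).
if (exists("run_cstacks", mode = "function") && dir.exists("06_ustacks_2_gstacks")) {
  run_cstacks(
    P = "06_ustacks_2_gstacks",
    o = "06_ustacks_2_gstacks",
    M = remaining_map,
    catalog.path = "06_ustacks_2_gstacks"
  )
}
```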

References

Catchen JM, Amores A, Hohenlohe PA et al. (2011) Stacks: Building and Genotyping Loci De Novo From Short-Read Sequences. G3, 1, 171-182.

Catchen JM, Hohenlohe PA, Bassham S, Amores A, Cresko WA (2013) Stacks: an analysis tool set for population genomics. Molecular Ecology, 22, 3124-3140.
