run_gstacks( denovo = TRUE, P = "06_ustacks_2_gstacks", M = "02_project_info/population.map.tsv2bam.tsv", I = NULL, B = NULL, O = NULL, parallel.core = parallel::detectCores() - 1, model = "marukilow", var.alpha = 0.01, gt.alpha = 0.05, rm.pcr.duplicates = FALSE, rm.unpaired.reads = FALSE, ignore.pe.reads = TRUE, unpaired = TRUE, kmer.length = 31, max.debruijn.reads = 1000, min.kmer.cov = 2, write.alignments = FALSE, min.mapq = 10, max.clipped = 0.2, max.insert.len = 1000, details = FALSE, phasing.cooccurrences.thr.range = c(1, 2), phasing.dont.prune.hets = FALSE, h = FALSE )
denovo | (logical) de novo vs reference-based mode.
Default: |
---|---|
P | (path, character) De novo mode.
Path to the directory containing STACKS files.
Default:
. |
M | (path, character) Path to a population map giving the list of samples. |
I | (character, path) Reference-based mode.
Input directory containing BAM files.
Default: |
B | (character, path) Reference-based mode. Path to input BAM files.
The input BAM file(s) must be sorted by coordinate.
With |
O | (character, path) Path to output directory. |
parallel.core | (integer) Enable parallel execution with the number of threads.
Default: |
model | (character) The model to use to call variants and genotypes;
one of |
var.alpha | (double) Alpha threshold for discovering SNPs.
Default: |
gt.alpha | (double) Alpha threshold for calling genotypes.
Default: |
rm.pcr.duplicates | (logical) remove all but one set of read pairs of
the same sample that have the same insert length
(implies |
rm.unpaired.reads | (logical) Discard unpaired reads.
Default: |
ignore.pe.reads | (logical) With default the function will
ignore paired-end reads even if present in the input.
Default: |
unpaired | (logical) Ignore read pairing (only for paired-end GBS;
treat READ2's as if they were READ1's).
Default: |
kmer.length | (integer) De novo mode.
kmer length for the de Bruijn graph. For expert.
Default: |
max.debruijn.reads | (integer) Maximum number of reads to use in the
de Bruijn graph. For expert.
Default: |
min.kmer.cov | (integer) De novo mode.
Minimum coverage to consider a kmer. For expert.
Default: |
write.alignments | (logical) Save read alignments (heavy BAM files).
Default: |
min.mapq | (double) Reference-based mode.
Minimum PHRED-scaled mapping quality to consider a read.
Default: |
max.clipped | (double) Reference-based mode.
Maximum soft-clipping level, in fraction of read length.
Default: |
max.insert.len | (integer) Reference-based mode.
Maximum allowed sequencing insert length
Default: |
details | (logical) With default the function will write a more detailed
output.
Default: |
phasing.cooccurrences.thr.range | (integer) range of edge coverage thresholds to
iterate over when building the graph of allele cooccurrences for
SNP phasing.
Default: |
phasing.dont.prune.hets | (logical) Don't try to ignore dubious heterozygote
genotypes during phasing. By default, during phasing, dubious het are ignored.
Default: |
h | Display this help messsage.
Default: |
tsv2bam
returns a set of .matches.bam
files.
The function run_gstacks
returns a list with the number of individuals, the batch ID number,
a summary data frame and a plot containing:
INDIVIDUALS: the sample id
ALL_LOCUS: the total number of locus for the individual (shown in subplot A)
LOCUS: the number of locus with a one-to-one relationship (shown in subplot B) with the catalog
MATCH_PERCENT: the percentage of locus with a one-to-one relationship with the catalog (shown in subplot C)
Addtionally, the function returns a batch_X.catalog.bam file that was generated by merging all the individual BAM files in parallel.
Catchen JM, Amores A, Hohenlohe PA et al. (2011) Stacks: Building and Genotyping Loci De Novo From Short-Read Sequences. G3, 1, 171-182.
Catchen JM, Hohenlohe PA, Bassham S, Amores A, Cresko WA (2013) Stacks: an analysis tool set for population genomics. Molecular Ecology, 22, 3124-3140.
Maruki T, Lynch M (2017) Genotype Calling from Population-Genomic Sequencing Data. G3, 7, 1393-1404.
if (FALSE) { # The simplest form of the function with De novo data: bam.sum <- stackr::run_gstacks() # that's it ! # This will use, by default, the same population map used in run_tsv2bam }