Runs the STACKS tsv2bam module. Additionally, the function generates a summary of the tsv2bam results and merges, in parallel, the individual sample BAM files into a single catalog BAM file using SAMtools or Sambamba. tsv2bam converts the data (single-end or paired-end) from being organized by sample into being organized by locus, which enables downstream improvements (e.g. Bayesian SNP calling).
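To illustrate what "organized by locus" means, here is a toy sketch in base R. It assumes a simplified table of read records with hypothetical columns (sample, locus, seq). STACKS itself works on its own per-sample .tsv and BAM files, not on data frames, so this only shows the regrouping idea.

```r
# Toy read records (hypothetical), listed sample by sample,
# as they would appear across the per-sample files
reads <- data.frame(
  sample = c("s1", "s1", "s2", "s2", "s3"),
  locus  = c(1L, 2L, 1L, 2L, 1L),
  seq    = c("ACGT", "TTGA", "ACGA", "TTGA", "ACGT"),
  stringsAsFactors = FALSE
)

# Re-index the same records by catalog locus: one list element per locus,
# holding the reads of every sample for that locus
by_locus <- split(reads[, c("sample", "seq")], reads$locus)
names(by_locus)
```

Once reads are grouped this way, all samples sharing a locus can be examined together, which is what makes per-locus genotyping steps possible.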
run_tsv2bam( P = "06_ustacks_2_gstacks", M = "02_project_info/population.map.tsv2bam.tsv", R = NULL, parallel.core = parallel::detectCores() - 1, cmd.path = "/usr/local/bin/samtools", h = FALSE )
Argument | Description |
---|---|
P | (path, character) Path to the directory containing STACKS files. Default: P = "06_ustacks_2_gstacks". |
M | (character, path) Path to a population map file. Default: M = "02_project_info/population.map.tsv2bam.tsv". |
R | (path, character) Directory containing the paired-end read files (in fastq/fasta/bam format, optionally gzipped). Default: R = NULL. |
parallel.core | (integer) Enable parallel execution with the given number of threads. Default: parallel.core = parallel::detectCores() - 1. |
cmd.path | (character, path) Provide the FULL path to the SAMtools program. See details on how to install SAMtools. Default: cmd.path = "/usr/local/bin/samtools". |
h | Display this help message. Default: h = FALSE. |
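A minimal sketch of how the P, M, R and parallel.core arguments could translate into a tsv2bam command line. It assumes the STACKS v2 tsv2bam flags -P (STACKS directory), -M (population map), -R (paired-end reads directory) and -t (threads); build_tsv2bam_args is a hypothetical helper, not part of stackr, and stackr's own implementation may differ.

```r
# Hypothetical helper (not part of stackr): build the tsv2bam argument vector.
# Assumes the STACKS v2 flags -P, -M, -R and -t; parallel.core defaults to 1
# here (instead of parallel::detectCores() - 1) to keep the sketch deterministic.
build_tsv2bam_args <- function(P = "06_ustacks_2_gstacks",
                               M = "02_project_info/population.map.tsv2bam.tsv",
                               R = NULL,
                               parallel.core = 1L) {
  args <- c("-P", P, "-M", M)
  # The paired-end reads directory is optional: only add -R when supplied
  if (!is.null(R)) args <- c(args, "-R", R)
  c(args, "-t", as.character(parallel.core))
}

args <- build_tsv2bam_args(parallel.core = 4L)
# system2("tsv2bam", args) would launch STACKS, assuming tsv2bam is on the PATH
```

Keeping the argument vector separate from the system2() call makes it easy to inspect or log the exact command before running it.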
tsv2bam returns a set of .matches.bam files.
The function run_tsv2bam returns a list with the number of individuals, the batch ID number, a summary data frame and a plot, containing the following information:
INDIVIDUALS: the sample ID
ALL_LOCUS: the total number of loci for the individual (shown in subplot A)
LOCUS: the number of loci with a one-to-one relationship with the catalog (shown in subplot B)
MATCH_PERCENT: the percentage of loci with a one-to-one relationship with the catalog (shown in subplot C)
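As a worked example, MATCH_PERCENT can be read as LOCUS divided by ALL_LOCUS, times 100. The sketch below assumes that definition and uses made-up counts for two hypothetical samples; it is not output from run_tsv2bam.

```r
# Made-up counts for two hypothetical samples (not real data)
summary_df <- data.frame(
  INDIVIDUALS = c("sample_1", "sample_2"),
  ALL_LOCUS   = c(20000L, 15000L),
  LOCUS       = c(18000L, 12000L)
)

# Assumed definition: share of an individual's loci that match the
# catalog one-to-one, expressed as a percentage
summary_df$MATCH_PERCENT <- round(100 * summary_df$LOCUS / summary_df$ALL_LOCUS, 2)
summary_df
```

Here sample_1 matches 90% of its loci and sample_2 matches 80%; a sample with a markedly lower percentage than the others may deserve a closer look before genotyping.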
Additionally, the function returns a catalog.bam file, generated by merging all the individual BAM files in parallel.
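A minimal sketch of the SAMtools merge step, assuming the per-sample .matches.bam files are combined with samtools merge, using -f to overwrite an existing output and -@ for extra compression threads. build_merge_args is a hypothetical helper and the file names are illustrative; the Sambamba route (sambamba merge) is not shown.

```r
# Hypothetical helper (not part of stackr): assemble the arguments of
#   samtools merge -f -@ <threads> <output> <input1.bam> <input2.bam> ...
build_merge_args <- function(bam_files, output = "catalog.bam", threads = 1L) {
  stopifnot(length(bam_files) > 0)
  # Options come before the output file, which comes before the inputs
  c("merge", "-f", "-@", as.character(threads), output, bam_files)
}

bams <- file.path("06_ustacks_2_gstacks",
                  c("sample_1.matches.bam", "sample_2.matches.bam"))
merge_args <- build_merge_args(bams, threads = 3L)
# system2("/usr/local/bin/samtools", merge_args) would run the merge,
# using the same SAMtools path as the cmd.path default
```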
Install SAMtools: link to detailed instructions on how to install SAMtools.
Catchen JM, Amores A, Hohenlohe PA et al. (2011) Stacks: Building and Genotyping Loci De Novo From Short-Read Sequences. G3, 1, 171-182.
Catchen JM, Hohenlohe PA, Bassham S, Amores A, Cresko WA (2013) Stacks: an analysis tool set for population genomics. Molecular Ecology, 22, 3124-3140.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup (2009) The Sequence Alignment/Map (SAM) format and SAMtools. Bioinformatics, 25, 2078-2079.
Li H (2011) A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics, 27, 2987-2993.
Tarasov A, Vilella AJ, Cuppen E, Nijman IJ, Prins P (2015) Sambamba: fast processing of NGS alignment formats. Bioinformatics.
if (FALSE) {
  # The simplest form of the function:
  bam.sum <- stackr::run_tsv2bam()
  # that's it!
}