Runs the STACKS tsv2bam module. In addition, this function generates a summary of the tsv2bam results and merges, in parallel, the per-sample BAM files into a single catalog BAM file using SAMtools or Sambamba. tsv2bam converts the data (single-end or paired-end) from being organized by sample to being organized by locus. This enables downstream improvements (e.g. Bayesian SNP calling).

run_tsv2bam(
  P = "06_ustacks_2_gstacks",
  M = "02_project_info/population.map.tsv2bam.tsv",
  R = NULL,
  parallel.core = parallel::detectCores() - 1,
  cmd.path = "/usr/local/bin/samtools",
  h = FALSE
)

Arguments

P

(path, character) Path to the directory containing STACKS files. Default: P = "06_ustacks_2_gstacks". Inside the folder, you should have:

  • the catalog files: starting with batch_ and ending with .alleles.tsv.gz, .snps.tsv.gz, .tags.tsv.gz;

  • 3 files for each sample: the sample name is the prefix of the files ending with .alleles.tsv.gz, .snps.tsv.gz, .tags.tsv.gz. Those files are created in the ustacks, sstacks and cxstacks modules.
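Before launching the module, it can help to confirm that every sample has its three files. The sketch below is only an illustration: check_sample_files is a hypothetical helper (not part of stackr), and the sample names and temporary directory are made up for the demonstration.

```r
# Hypothetical helper (not part of stackr): list the expected per-sample
# STACKS files that are missing from a directory.
check_sample_files <- function(path, samples) {
  suffixes <- c(".alleles.tsv.gz", ".snps.tsv.gz", ".tags.tsv.gz")
  # Every sample/suffix combination, e.g. "sample_A.tags.tsv.gz"
  expected <- as.vector(outer(samples, suffixes, paste0))
  expected[!file.exists(file.path(path, expected))]
}

# Demonstration on a throwaway directory: sample_A is complete,
# sample_B only has its .tags.tsv.gz file.
demo.dir <- file.path(tempdir(), "demo_stacks_files")
dir.create(demo.dir, showWarnings = FALSE)
suffixes <- c(".alleles.tsv.gz", ".snps.tsv.gz", ".tags.tsv.gz")
invisible(file.create(file.path(demo.dir, paste0("sample_A", suffixes))))
invisible(file.create(file.path(demo.dir, "sample_B.tags.tsv.gz")))

missing.files <- check_sample_files(demo.dir, c("sample_A", "sample_B"))
print(missing.files)
```

An empty result means all expected files are present; the catalog files would need a separate check.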

M

(character, path) Path to a population map file. Note that the -s option is not used inside stackr. Default: M = "02_project_info/population.map.tsv2bam.tsv".
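As a quick sanity check, the population map can be read into R before it is passed to M. This sketch assumes the usual STACKS layout (tab-separated, no header, sample name then population); the column names INDIVIDUALS and POP and the example samples are illustrative choices, not something the function requires.

```r
# Write a tiny example population map: tab-separated, no header
map.file <- tempfile(fileext = ".tsv")
writeLines(c("sample_A\tpop1", "sample_B\tpop1", "sample_C\tpop2"), map.file)

# Read it back; column names are assumptions made for this example
pop.map <- utils::read.delim(
  map.file,
  header = FALSE,
  sep = "\t",
  col.names = c("INDIVIDUALS", "POP"),
  stringsAsFactors = FALSE
)

# Duplicated sample names usually indicate a mistake in the map
dup <- pop.map$INDIVIDUALS[duplicated(pop.map$INDIVIDUALS)]

# Number of samples per population
n.per.pop <- table(pop.map$POP)
print(n.per.pop)
```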

R

(path, character) Directory containing the paired-end read files (in fastq/fasta/bam (gz) format). Default: R = NULL.

parallel.core

(integer) Number of threads to use for parallel execution. Default: parallel.core = parallel::detectCores() - 1.
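Note that parallel::detectCores() may return NA on some systems, and subtracting 1 on a single-core machine gives 0. A defensive way to pick the value could look like the sketch below; safe_cores is a hypothetical helper, not part of stackr.

```r
# Hypothetical helper: leave one core free, but never return less than 1
safe_cores <- function(n = parallel::detectCores()) {
  if (is.na(n)) {
    return(1L)  # core count unknown: fall back to a single thread
  }
  max(1L, as.integer(n) - 1L)
}

n.threads <- safe_cores()
print(n.threads)
```

The result can then be supplied as parallel.core = n.threads.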

cmd.path

(character, path) Provide the FULL path to the SAMtools program. See details on how to install SAMtools. Default: cmd.path = "/usr/local/bin/samtools".
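If SAMtools is not installed at the default location, its full path can be looked up from R. The following is a minimal sketch; find_samtools is a hypothetical helper and simply falls back to whatever "samtools" executable is found on the PATH.

```r
# Hypothetical helper: return a usable full path to samtools, or NA
find_samtools <- function(cmd.path = "/usr/local/bin/samtools") {
  if (file.exists(cmd.path)) {
    return(cmd.path)
  }
  # Sys.which() typically returns "" when the executable is not found
  on.path <- unname(Sys.which("samtools"))
  if (!is.na(on.path) && nzchar(on.path)) {
    return(on.path)
  }
  NA_character_
}

samtools.path <- find_samtools()
print(samtools.path)
```

When the result is not NA, it can be passed as cmd.path = samtools.path.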

h

Display this help message. Default: h = FALSE

Value

tsv2bam returns a set of .matches.bam files.
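To make the merging step concrete, here is a sketch of how per-sample .matches.bam files could be combined with samtools merge. It is not necessarily how run_tsv2bam builds its command: build_merge_args is a hypothetical helper, the file names are invented, and the actual call is left commented out so nothing is executed.

```r
# Hypothetical helper: arguments for `samtools merge -@ <threads> <out> <in...>`
build_merge_args <- function(bam.files, out = "catalog.bam", threads = 1L) {
  stopifnot(length(bam.files) >= 2L)
  c("merge", "-@", as.character(threads), out, bam.files)
}

# Invented file names, for illustration only
bam.files <- c("sample_A.matches.bam", "sample_B.matches.bam")
merge.args <- build_merge_args(bam.files, threads = 4L)
print(paste("samtools", paste(merge.args, collapse = " ")))

# Not run: requires SAMtools and real BAM files
# system2("/usr/local/bin/samtools", args = merge.args)
```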

The function run_tsv2bam returns a list with the number of individuals, the batch ID number, a summary data frame and a plot containing:

  1. INDIVIDUALS: the sample id

  2. ALL_LOCUS: the total number of loci for the individual (shown in subplot A)

  3. LOCUS: the number of loci with a one-to-one relationship with the catalog (shown in subplot B)

  4. MATCH_PERCENT: the percentage of loci with a one-to-one relationship with the catalog (shown in subplot C)

Additionally, the function returns a catalog.bam file, generated by merging all the individual BAM files in parallel.
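The relationship between the summary columns can be illustrated with made-up numbers. This sketch assumes MATCH_PERCENT is simply LOCUS divided by ALL_LOCUS, expressed as a percentage; the rounding to two decimals is an arbitrary choice for the example, and the values are not real results.

```r
# Toy summary with invented counts
summary.df <- data.frame(
  INDIVIDUALS = c("sample_A", "sample_B"),
  ALL_LOCUS = c(1000L, 800L),
  LOCUS = c(900L, 600L),
  stringsAsFactors = FALSE
)

# Assumed definition: share of loci with a one-to-one match to the catalog
summary.df$MATCH_PERCENT <- round(
  100 * summary.df$LOCUS / summary.df$ALL_LOCUS,
  2
)
print(summary.df)
```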

Details

Install SAMtools: link to detailed instructions on how to install SAMtools.

References

Catchen JM, Amores A, Hohenlohe PA et al. (2011) Stacks: Building and Genotyping Loci De Novo From Short-Read Sequences. G3, 1, 171-182.

Catchen JM, Hohenlohe PA, Bassham S, Amores A, Cresko WA (2013) Stacks: an analysis tool set for population genomics. Molecular Ecology, 22, 3124-3140.

Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup (2009) The Sequence Alignment/Map (SAM) format and SAMtools. Bioinformatics, 25, 2078-2079.

Li H (2011) A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics, 27, 2987-2993.

Tarasov A, Vilella AJ, Cuppen E, Nijman IJ, Prins P (2015) Sambamba: fast processing of NGS alignment formats. Bioinformatics.

See also

Examples

if (FALSE) {
# The simplest form of the function:
bam.sum <- stackr::run_tsv2bam()
# that's it !
}