This function reads the output folder of the STACKS ustacks-cstacks-sstacks pipeline and summarizes the sstacks .matches output files to get the mean and median log likelihood of catalog loci.

summary_catalog_log_lik(
  matches.folder,
  parallel.core = parallel::detectCores() - 1,
  verbose = FALSE
)

Arguments

matches.folder

(character) Path to the folder where the .matches.tsv.gz files are stored. These files are generated by sstacks.

parallel.core

(optional) The number of cores used for parallel execution. Default: parallel::detectCores() - 1.

verbose

(optional) Make the function a little more chatty during execution. Default: verbose = FALSE.
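
The default for parallel.core subtracts one from the detected core count. As a minimal sketch (not part of this function), the value can be guarded before it is passed in, since parallel::detectCores() may return NA on some platforms and the subtraction gives 0 on a single-core machine. The variable names n_detected and n_cores are illustrative assumptions.

```r
# Illustrative only: choose a safe value for parallel.core.
# detectCores() can return NA when the core count cannot be determined.
n_detected <- parallel::detectCores()

# Fall back to 1 core if detection fails; never go below 1.
n_cores <- if (is.na(n_detected)) 1L else max(1L, n_detected - 1L)

n_cores
```

The resulting n_cores could then be supplied as parallel.core = n_cores.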

Value

The function returns a summary (data frame) containing:

  1. INDIVIDUALS: the sample id

  2. LOCUS_NUMBER: the number of loci

  3. BLACKLIST_USTACKS: the number of loci blacklisted by ustacks

  4. FOR_CATALOG: the number of loci available to generate the catalog

  5. BLACKLIST_ARTIFACT: the number of artifact genotypes (> 2 alleles, see details)

  6. FILTERED: the number of loci after artifacts are removed

  7. HOMOZYGOSITY: the number of homozygous genotypes

  8. HETEROZYGOSITY: the number of heterozygous genotypes

  9. MEAN_NUMBER_SNP_LOCUS: the mean number of SNPs per locus (excluding artifact loci)

  10. MAX_NUMBER_SNP_LOCUS: the maximum number of SNPs per locus observed for this individual (excluding artifact loci)

  11. NUMBER_LOCUS_4SNP: the number of loci with 4 or more SNPs per locus (excluding artifact loci)

References

Catchen JM, Amores A, Hohenlohe PA et al. (2011) Stacks: Building and Genotyping Loci De Novo From Short-Read Sequences. G3, 1, 171-182.

Catchen JM, Hohenlohe PA, Bassham S, Amores A, Cresko WA (2013) Stacks: an analysis tool set for population genomics. Molecular Ecology, 22, 3124-3140.

See also

Examples
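
The idea described at the top of this page (collect the .matches.tsv.gz files in a folder, then take the mean and median log likelihood per catalog locus) can be sketched in base R on toy data. This is not the function's actual implementation. The file name, the column names CATALOG_ID and LOG_LIKELIHOOD, and all values are assumptions made for illustration; real sstacks files have their own column layout.

```r
# Toy stand-in for an sstacks output folder (all names and values are made up).
folder <- file.path(tempdir(), "matches_demo")
dir.create(folder, showWarnings = FALSE)

toy <- data.frame(
  CATALOG_ID = c(1L, 1L, 2L, 2L, 2L),
  LOG_LIKELIHOOD = c(-10, -20, -5, -15, -40)
)

# Write the toy table as a gzip-compressed, tab-separated file.
con <- gzfile(file.path(folder, "sample_01.matches.tsv.gz"), open = "wt")
write.table(toy, con, sep = "\t", quote = FALSE, row.names = FALSE)
close(con)

# Find every .matches.tsv.gz file in the folder and stack the tables.
files <- list.files(folder, pattern = "\\.matches\\.tsv\\.gz$", full.names = TRUE)
matches <- do.call(rbind, lapply(files, read.delim))

# Mean and median log likelihood for each catalog locus.
mean_ll <- tapply(matches$LOG_LIKELIHOOD, matches$CATALOG_ID, mean)
median_ll <- tapply(matches$LOG_LIKELIHOOD, matches$CATALOG_ID, median)

loglik_summary <- data.frame(
  CATALOG_ID = as.integer(names(mean_ll)),
  MEAN_LOG_LIK = as.vector(mean_ll),
  MEDIAN_LOG_LIK = as.vector(median_ll)
)

print(loglik_summary)
```

With real data, the function itself would be called on the sstacks output folder, for example summary_catalog_log_lik(matches.folder = folder), where folder is a path of your choosing.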