This function reads the output folder of the STACKS
ustacks-cstacks-sstacks pipeline
and summarizes the sstacks output files (.matches)
to get the mean and median log likelihood of catalog loci.
summary_catalog_log_lik(
matches.folder,
parallel.core = parallel::detectCores() - 1,
verbose = FALSE
)
matches.folder: (character, path) Folder where the .matches.tsv.gz
files are stored.
The files are generated by sstacks.
parallel.core: (optional) The number of cores used for parallel
execution.
Default: parallel.core = parallel::detectCores() - 1.
verbose: (optional, logical) Make the function a little more chatty during
execution.
Default: verbose = FALSE.
The function returns a summary (data frame) containing:
INDIVIDUALS: the sample id
LOCUS_NUMBER: the number of loci
BLACKLIST_USTACKS: the number of loci blacklisted by ustacks
FOR_CATALOG: the number of loci available to generate the catalog
BLACKLIST_ARTIFACT: the number of artifact genotypes (> 2 alleles, see details)
FILTERED: the number of loci after artifacts are removed
HOMOZYGOSITY: the number of homozygous genotypes
HETEROZYGOSITY: the number of heterozygous genotypes
MEAN_NUMBER_SNP_LOCUS: the mean number of SNPs per locus (excluding artifact loci)
MAX_NUMBER_SNP_LOCUS: the maximum number of SNPs per locus observed for this individual (excluding artifact loci)
NUMBER_LOCUS_4SNP: the number of loci with 4 or more SNPs per locus (excluding artifact loci)
Catchen JM, Amores A, Hohenlohe PA et al. (2011) Stacks: Building and Genotyping Loci De Novo From Short-Read Sequences. G3, 1, 171-182.
Catchen JM, Hohenlohe PA, Bassham S, Amores A, Cresko WA (2013) Stacks: an analysis tool set for population genomics. Molecular Ecology, 22, 3124-3140.
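
As a rough illustration of the kind of summary described at the top of this page, the base R sketch below computes the mean and median log likelihood per catalog locus from a small in-memory table. It does not reproduce the function's internals. The column names (INDIVIDUALS, CATALOG_ID, LOG_LIKELIHOOD), the toy values, the folder path in the commented call, and the choice to group by catalog locus are all assumptions made for the example, not the actual layout of the .matches files.

```r
# Toy stand-in for rows read from several .matches files.
# Column names and values are hypothetical, chosen only for illustration.
matches <- data.frame(
  INDIVIDUALS = rep(c("ind_1", "ind_2", "ind_3"), each = 3),
  CATALOG_ID = rep(1:3, times = 3),
  LOG_LIKELIHOOD = c(-1, -3, -4, -2, -3, -10, -6, -9, -10),
  stringsAsFactors = FALSE
)

# Mean and median log likelihood for each catalog locus (assumed grouping).
mean_log_lik <- tapply(matches$LOG_LIKELIHOOD, matches$CATALOG_ID, mean)
median_log_lik <- tapply(matches$LOG_LIKELIHOOD, matches$CATALOG_ID, median)

log_lik_summary <- data.frame(
  CATALOG_ID = as.integer(names(mean_log_lik)),
  MEAN_LOG_LIKELIHOOD = as.vector(mean_log_lik),
  MEDIAN_LOG_LIKELIHOOD = as.vector(median_log_lik)
)
print(log_lik_summary)

# With real data, the call would follow the usage shown on this page
# (the folder path here is a placeholder):
# res <- summary_catalog_log_lik(matches.folder = "path/to/sstacks_output")
```

A mean that sits well below the median for a locus suggests that a few samples have very low log likelihoods at that locus.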