NEWS.md
* `summary_` functions to help decide thresholds; work with most stacks versions.
* `run_radproc`: new function that runs RADProc and generates ustacks and cstacks file types.
* `summary_reads`: new function that highlights GC content and INDELs. It can also produce the read depth plot, in parallel, for all samples found in the directory.
* `summary_cstacks` and `summary_sstacks`: functions to help decide thresholds.
* `normalize_samples`: new function to check the impact of biased read numbers per individual. The function normalizes the number of reads and generates replicate samples.
* `stackr` now follows Stacks version 2.0Beta1.
* `run_populations_v2` will replace `run_populations` in 2 updates.
* `run_tsv2bam`: new function that runs the Stacks tsv2bam module. Additionally, it generates a summary of Stacks tsv2bam and merges, in parallel, BAM sample files into a unique BAM catalog file using SAMtools or Sambamba.
* `run_gstacks`: runs the Stacks gstacks module.
* `stackr` now focuses on running the stacks pipeline within R (see `radiator`).
* `pbmcapply` package is now used for parallel processing with a progress bar.
* `tidy_genomic_data`: bug fix that originated with the new version of PEGAS.
* `summary_haplotypes`: updated code and output tables.
* `pi`: a new function to compute Nei's Pi nucleotide diversity from a wide range of input files. The haplotype version is found in `summary_haplotypes` (a toy illustration of the measure follows this group of entries).
* `merge_vcf` and `split_vcf`.
* `run_ustacks`: allows running the Stacks ustacks module inside R, with the option to run mismatch threshold testing…
* `tidy_genomic_data`: bug fix introduced with a previous commit when fixing `LOCUS` and `COL` with stacks version > 1.44. Thanks to Eric Archer for highlighting the bug.
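To make the measure concrete, here is a small, self-contained illustration of Nei's nucleotide diversity (Nei & Li 1979): the average proportion of nucleotide differences across all pairs of haplotype sequences. This is only a toy calculation, not the `pi` function's implementation or interface.

```r
# Toy illustration of Nei's pi: average pairwise nucleotide differences per site.
haplotypes <- c("ACGTACGT",
                "ACGTACGA",
                "ACGAACGA")

seq_mat <- do.call(rbind, strsplit(haplotypes, split = ""))
n <- nrow(seq_mat)   # number of haplotypes
L <- ncol(seq_mat)   # sequence length

pairs <- combn(n, 2)
pairwise_diff <- apply(pairs, 2, function(p) sum(seq_mat[p[1], ] != seq_mat[p[2], ]))

nei_pi <- mean(pairwise_diff) / L
nei_pi  # average proportion of differing sites over all haplotype pairs
```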
* `summary_haplotypes`: this function gets a new argument, `keep.consensus`, to include or exclude the consensus markers present in the stacks haplotypes file (e.g. batch_1.haplotypes.tsv) when calculating pi. This argument circumvents the impact of using a whitelist of markers, which potentially removed those markers in previous versions. Also changed in this function: the summary table includes a POLYMORPHISM column that no longer includes the artifact marker counts (markers with more than 2 alleles). That information is kept in a separate column (as before).
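A usage sketch only: `keep.consensus` is the new argument described above and the file name comes from the entry, but the `data` argument name is an assumption on my part; check `?summary_haplotypes` for the real interface.

```r
# Sketch: the `data` argument name is assumed; keep.consensus is the new argument.
library(stackr)

hap.summary <- summary_haplotypes(
  data = "batch_1.haplotypes.tsv",  # stacks haplotypes file (path assumed)
  keep.consensus = TRUE             # include consensus markers when computing pi
)
```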
* `detect_duplicate_genomes`: huge speed bump for the pairwise genome similarity method; instead of hours, it now takes minutes.
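The pairwise genome similarity idea can be illustrated in a few lines of base R: for every pair of individuals, count the proportion of shared markers with identical genotypes. This toy version is not `detect_duplicate_genomes` itself, just the quantity it speeds up.

```r
# Toy illustration: proportion of identical genotypes for every pair of individuals.
geno <- rbind(
  IND_01 = c("AA", "AG", "GG", "CC", NA),
  IND_02 = c("AA", "AG", "GG", "CC", "TT"),   # near-duplicate of IND_01
  IND_03 = c("AG", "GG", "AA", "CT", "TT")
)

pairs <- combn(rownames(geno), 2)
similarity <- apply(pairs, 2, function(p) {
  shared <- !is.na(geno[p[1], ]) & !is.na(geno[p[2], ])   # genotyped in both
  mean(geno[p[1], shared] == geno[p[2], shared])
})
names(similarity) <- paste(pairs[1, ], pairs[2, ], sep = " vs ")
similarity  # values close to 1 flag potential duplicate samples
```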
* `stackr_imputations_module`: better integration of VCF with haplotypes so that nucleotide information is kept during imputations.
* `filter_fis`: bug fix when no heterozygotes were found. Thanks to Manuel Lamothe.
* `stackr_imputations_module`: work on faster on-the-fly random forest and extreme gradient tree boosting algorithms.
* Major work on `tidy_genomic_data`:
    * `platypus` vcf files are correctly imported.
* `summary_haplotypes` function: now with a progress bar…
* `summary_haplotypes`: bug fix, the function was not properly summarizing info when no assembly artifacts were found.
* Updates to the `tidy_genomic_data` and `genomic_converter` functions.
* `strataG` object updated to work with tidy data and pass Travis CI.
* `stackr` parallel mode now works with Windows! Nothing to install; just choose the number of CPUs and the rest is done automatically.
* `haplo2colony` is deprecated. Use the new function called `write_colony`!
* `write_colony`: works similarly to the deprecated function `haplo2colony`, with some differences:
    * the major advantage is that it's no longer restricted to the STACKS haplotypes file.
    * The function uses the `tidy_genomic_data` module to import files, so you can choose one of the 10 input file formats supported by `stackr`!
    * Other benefits include the possibility to efficiently test MAF, snp.ld, the haplotypes/snp approach, a whitelist of markers, a blacklist of individuals, a blacklist of genotypes, etc. with the built-in arguments.
    * The function only keeps markers in common between populations/groups and removes monomorphic markers.
    * Note: there are several defaults in the function and it's a complicated file format, so please make sure to read the function documentation and the COLONY manual.
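A usage sketch only: because the function imports files through the `tidy_genomic_data` module, any of the supported input formats should work in place of the VCF shown here. The argument names (`data`, `strata`, `filename`) and file paths are assumptions for illustration; read `?write_colony` and the COLONY manual before relying on the defaults.

```r
# Sketch only: argument names and file paths are assumptions.
library(stackr)

write_colony(
  data = "batch_1.vcf",          # any of the supported input formats (assumed arg name)
  strata = "strata.tsv",         # individuals and their groupings (assumed arg name)
  filename = "colony.input.txt"  # COLONY input file to write (assumed arg name)
)
```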
* `tidy_genomic_data` updated to read unconventional Tassel VCF.
* `ibdg_fh`: computes the FH measure that was previously computed in `summary_haplotypes`. It now works with biallelic and multiallelic data. The FH measure is based on the excess in the observed number of homozygous genotypes within an individual relative to the mean number of homozygous genotypes expected under random mating (see the function documentation for details, and the toy illustration below). The IBDg in the name is because the measure is a proxy of the realized proportion of the genome that is identical by descent, by reference to the current population, under hypothetical random mating.
* `missing_visualization` now computes the FH measure and looks for correlations with average missingness per individual.
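A self-contained toy version of the idea described above: compare each individual's observed count of homozygous genotypes with the count expected under random mating given the allele frequencies. This is the usual method-of-moments form of the concept, not the `ibdg_fh` implementation.

```r
# Toy FH: excess homozygosity per individual relative to the random-mating expectation.
# geno is coded as the count of the alternate allele (0, 1, 2); rows = individuals.
geno <- rbind(
  IND_01 = c(0, 1, 2, 0, 1),
  IND_02 = c(0, 0, 2, 2, 2),
  IND_03 = c(1, 1, 1, 0, 1)
)

p <- colMeans(geno) / 2              # alternate allele frequency per locus
exp_hom <- sum(1 - 2 * p * (1 - p))  # expected number of homozygous loci under random mating
obs_hom <- rowSums(geno != 1)        # observed number of homozygous loci per individual
n_loci  <- ncol(geno)

FH <- (obs_hom - exp_hom) / (n_loci - exp_hom)
FH  # positive values = more homozygous than expected (a proxy for IBDg)
```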
* `tidy_stacks_haplotypes_vcf` is now deprecated in favor of using `tidy_genomic_data`, which will import haplotypic vcf files.
* `stackr_imputations_module` no longer imputes globally after imputations by populations. Instead, use `common.markers` or not to test the impacts.
* `ref_alt_alleles`: bug fix, it was not working properly inside the imputation module.
* `snp_ld` is now a separate module available to users. Check the documentation.
* `missing_visualization` now shows the proportion of variance in the plot axis text.
* `summary_haplotypes`: bug fix stemming from a new `readr` version.
* `artifacts` replaces `paralogs` in `summary_haplotypes`.
* `gtypes` object from the [strataG](https://github.com/EricArcher/strataG) package can now be read/written in/out of stackr using the `tidy_genomic_data` and `genomic_converter` functions.
* `filter_genotype_likelihood`: since the update of the function to the interactive mode, some old code was still present in if/else statements, breaking the code. Thanks to Jaromir Guzinski for the bug report.
* `write_vcf`: the function was using REF/ALT coding in integer, not character, format. This function is used inside `vcf_imputation` and sometimes inside `genomic_converter`. Thanks to @jeansebastienmoore for highlighting the problem.
* `vcf_imputation`: the function now calls `genomic_converter` with all the bells and whistles of that function (updated vcf import and imputations modules).
* `tidy_fstat`.
* `summary_haplotypes`: bug fix introduced by the new version of `dplyr::distinct` (0.5.0).
* `tidy_genomic_data`: added a check to throw an error when `pop.levels` != the `pop.id` in the strata.
* `genomic_converter`, including all the `vcf2...` functions, can now use phased/unphased genotypes. Some pyRAD vcf (e.g. 3.0.64) have a mix of GT formats with `/` and `|`, e.g. missing GT = `./.` and genotyped individuals = `0|0`. I'm not sure it follows the VCF specification, but stackr can now read those vcf files.
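For readers hitting the same mixed-separator VCFs outside of stackr, the gist is simply to treat `|` and `/` as equivalent when parsing the GT field. A toy illustration (not stackr's parser):

```r
# Toy illustration: treat phased (|) and unphased (/) GT separators the same way.
gt <- c("./.", "0|0", "0/1", "1|1")

gt_unphased <- gsub("|", "/", gt, fixed = TRUE)  # normalise the separator
is_missing  <- gt_unphased == "./."

gt_unphased
is_missing
```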
* `vcf2dadi` is more user-friendly for scientists with in- and out-group metadata, using STACKS or not.
* `if (getRversion() >= "2.15.1") utils::globalVariables("variable")`; also, `@inheritParams` was not showing all the argument descriptions.
* `vcf`, `plink`, `genind`, `genlight`, `gtypes`, `hierfstat`, `genepop`, `structure` and `betadiv` are now separate modules available to users (look for `write_...` with the output format).
* `genomic_converter`: if you want to convert from the supported input file formats to many output formats at once, this is the function. With the new `genomic_converter`, import and imputations are only done once, saving time if you were generating different outputs WITH imputations (a usage sketch follows this group of entries).
* The `vcf2...` functions (except `vcf2dadi`) are now a shortcut of `genomic_converter`. This is particularly interesting and faster if you were generating different outputs WITH imputations. This makes the `vcf2...` functions and `genomic_converter` easier to debug for me and more stable for users.
* The `haplo2...` functions are all deprecated and replaced by `genomic_converter`, except `haplo2colony`, which requires so many arguments that it would be too complicated, for now, to integrate with `genomic_converter`.
* When `pop.select`, `blacklist.id` and `imputation.method` are used, the REF and ALT alleles are now re-computed to account for the filters and imputations.
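A usage sketch of the one-import, many-outputs idea. The output format names come from the modules listed above and `imputation.method` is the argument mentioned above; the remaining argument names, file paths and the `"rf"` value are assumptions for illustration, so check `?genomic_converter`.

```r
# Sketch: import once, impute once, write several output formats.
# Argument names other than imputation.method, and the "rf" value, are assumptions.
library(stackr)

converted <- genomic_converter(
  data = "batch_1.vcf",                          # one of the supported input formats
  strata = "strata.tsv",                         # individuals and their groupings
  output = c("genind", "genepop", "hierfstat"),  # several outputs from a single import
  imputation.method = "rf"                       # assumed value for Random Forest imputation
)
```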
* `tidy_genomic_data`: bug fix; while using data.table::melt.data.table instead of tidyr::gather, I forgot to (i) add variable.factor = FALSE when melting the vcf and (ii) use as_data_frame at the end of the melting to be able to continue working with dplyr verbs.
* Added a `NEWS.md` file to track changes to the package.
* `individuals2strata`: several functions in stackr and [assigner](https://github.com/thierrygosselin/assigner) require a `strata` argument, i.e. a data frame with the individuals and associated groupings. You can do it manually; however, if your individuals have a consistent naming scheme (e.g. SPECIES-POPULATION-MATURITY-YEAR-ID = CHI-QUE-ADU-2014-020), use this function to rapidly create a strata file (a manual illustration follows this group of entries).
* `tidy_genomic_data`: transforms common genomic dataset formats into a tidy data frame. Used internally in stackr and [assigner](https://github.com/thierrygosselin/assigner) and might be of interest to users.
* `read_long_tidy_wide`: reads genomic data frames in long/tidy and wide format. Used internally in stackr and [assigner](https://github.com/thierrygosselin/assigner) and might be of interest to users.
* `stackr_imputations_module`: map-independent imputation of missing genotypes using Random Forest or the most frequent category. Imputes genotypes or alleles. Used internally in stackr and [assigner](https://github.com/thierrygosselin/assigner) and might be of interest to users.
* `find_duplicate_id`: computes pairwise genome similarity to highlight potential duplicate individuals.
* `tped/tfam` format. Map-independent imputation also available.
* `vcf2plink`: to easily convert a VCF file created in STACKS to a PLINK input file (tped/tfam format). This function comes with the commonly used arguments in stackr: map-independent imputation, whitelist, blacklist, common marker filtering, etc.
* `data_pruning`: to prune your dataset with a whitelist, a blacklist of individuals, erased genotypes, common markers and other filtering (see the function arguments while waiting for the upcoming documentation).
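To see what a strata data frame contains, here is how you could build one manually in base R from the naming scheme given above; `individuals2strata` automates this step, and this snippet is not its interface (the extra sample IDs and the column names are just examples).

```r
# Manual construction of a strata data frame from a consistent naming scheme:
# SPECIES-POPULATION-MATURITY-YEAR-ID, e.g. "CHI-QUE-ADU-2014-020".
individuals <- c("CHI-QUE-ADU-2014-020", "CHI-QUE-JUV-2014-021", "CHI-TAD-ADU-2015-001")

fields <- do.call(rbind, strsplit(individuals, split = "-"))
strata <- data.frame(
  INDIVIDUALS = individuals,
  STRATA      = fields[, 2],   # here the population code is used as the grouping
  stringsAsFactors = FALSE
)
strata
# individuals2strata builds this kind of data frame for you from the ID pattern.
```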
* Introducing several new functions:
* Introducing the `haplo2gsi_sim` function.
* Introducing the `haplo2fstat` function: conversion of a STACKS haplotypes file into a hierfstat object and fstat file. Access all the functions in the R package [hierfstat](https://github.com/jgx65/hierfstat).
* Map-independent imputations of a VCF file created by STACKS. Two options are available for imputations: using Random Forest or the most frequent allele.
* Before imputations, the VCF file can be filtered with:
* The `summary_haplotypes` function now outputs:
Keller MC, Visscher PM, Goddard ME. 2011. Quantification of inbreeding due to distant ancestors and its detection using dense single nucleotide polymorphism data. Genetics, 189, 237–249.
Kardos M, Luikart G, Allendorf FW. 2015. Measuring individual inbreeding in the age of genomics: marker-based measures are better than pedigrees. Heredity, 115, 63–72.
Nei M, Li WH. 1979. Mathematical model for studying genetic variation in terms of restriction endonucleases. Proceedings of the National Academy of Sciences of the United States of America, 76, 5269–5273.
The haplo2colony function