NEWS.md
`tidyr`, `readr`, using `future`, `carrier`.
`assignment_ngs`
pkgdown
assignment_ngs
`fst_WC84`: work faster
`assigner` with `SeqArray` and GDS object/file
`fst_WC84`: work with radiator v.1.0
`assigner` and now lives exclusively in package `grur`
`assignment_mixture` generated by `purrr::df` replaced recently by `purrr::dfr`. Changed `DESCRIPTION` field accordingly.
`subsample` argument in `assignment_ngs` and `assignment_mixture` can now automatically detect the smallest sample size in the data’s grouping. Use `subsample = "min"` to let the function decide (if you're not sure).
`pbmcapply` package.
`pbmcapply` package.
`dlr`: simplified arguments, faster function, and now creates the Dlr plots
`SNPRelate` are removed until the bugs with Fst calculation are resolved.
`assignment_ngs` introduced in last commit that was supposed to be fixed. Problem introduced by `stackr::change_pop_names`.
`assigner` as a logo
`fst_NEI87`
`subsample` and `iteration.subsample` in `fst_NEI87` and `fst_WC84`
`SNPRelate` bias issue is resolved, the option is unavailable
`pbmcapply` for Windows
`assignment_ngs` and `assignment_mixture` code cleaning to prep for CRAN and make them easier to debug.
`assigner` now works in parallel with Windows
`write_gsi_sim` where the file was not created properly from an internal module.
`assigner::fst_WC84` can now use SNPRelate to compute Fst. The confidence intervals are not implemented yet. The speed increase left me speechless: datasets with 30K SNPs are computed in less than 15 sec!
`assigner::fst_WC84` is 40% faster!
`assignment_ngs` during imputations, the imputation module could not recognise that REF/ALT alleles are not necessary or useful for assignment analysis.
enhancement to `assignment_ngs` and `assignment_mixture` so that when `marker.number` includes `"all"`, the `iteration.method` is set automatically to `1` when conducting the assignment with all the markers. Iterations at this point are useless and a waste of time.
`assignment_mixture`: with `assignment.analysis = "gsi_sim"` the unknown/mixture samples are compared with baseline populations using common markers between the pair. Now, the tables include the number of markers used. The summary provides the mean number of markers. This number will change each time randomness is used.
`fst_NEI87`: very fast function that can compute the overall and pairwise Nei’s (1987) fst and f’st (prime). Bootstrap resampling of markers is available to build confidence intervals. The estimates are available as a data frame and a matrix with upper diagonal filled with Fst values and lower diagonal filled with the confidence intervals. Jost’s D is also given ;)
`fst_WC84`: bug fix, the function was not properly configured for multi-allelic markers (e.g. microsatellite, and haplotype format from STACKS). Thanks to Craig McDougall for catching this.
`assignment_mixture`: added a check to throw an error when pop.levels != the pop.id in strata
`assignment_mixture`: `stackr`.
`fst_WC84`
`tidyr::spread` and `tidyr::gather` for `data.table::dcast.data.table` and `data.table::melt.data.table` to make the code faster, I forgot to split genotype into alleles for `gsi_sim`.
you need to update [stackr](https://github.com/thierrygosselin/stackr) to v.0.2.7 to appreciate this new version of assigner.
updated `assignment_ngs` with the separate stackr modules to simplify the function.
new data file available for `assignment_ngs`: `genepop` and `genind` object.
`assignment_ngs` now accepts any vcf input file, i.e. it’s no longer limited to stacks vcf.
new arguments in `assignment_ngs`. The assignment using dapc can now use the optimized alpha score `adegenet.dapc.opt == "optim.a.score"` or the cross-validation `adegenet.dapc.opt == "xval"`. This is useful for fine tuning the trade-off between power of discrimination and over-fitting (for stability of group membership probabilities). Cross validation with `adegenet.dapc.opt == "xval"` doesn’t work with missing data, so it’s only available with imputed data (i.e. `imputation.method == "rf" or "max"`). With non imputed data or the default: the optimized alpha-score is used (`adegenet.dapc.opt == "optim.a.score"`). When using `adegenet.dapc.opt == "xval"`, 2 new arguments are available: (1) `adegenet.n.rep` and (2) `adegenet.training`. See documentation for details.
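The rule described in this entry (cross-validation only with imputed data, alpha-score otherwise) can be sketched in a few lines of base R. This is only an illustration: `resolve_dapc_opt` is a hypothetical helper that does not exist in assigner, and the silent fallback to the alpha-score is an assumption based on the wording above, not a description of the package internals.

```r
# Hypothetical helper (not part of assigner): decide which DAPC
# optimisation applies, given the requested option and the imputation setting.
resolve_dapc_opt <- function(adegenet.dapc.opt = "optim.a.score",
                             imputation.method = NULL) {
  adegenet.dapc.opt <- match.arg(adegenet.dapc.opt, c("optim.a.score", "xval"))

  # Data count as imputed only when imputation.method is "rf" or "max".
  imputed <- !is.null(imputation.method) &&
    imputation.method %in% c("rf", "max")

  # Assumption: cross-validation cannot handle missing data, so fall back
  # to the optimized alpha-score when the data are not imputed.
  if (adegenet.dapc.opt == "xval" && !imputed) {
    return("optim.a.score")
  }
  adegenet.dapc.opt
}

resolve_dapc_opt("xval", imputation.method = "rf")  # "xval"
resolve_dapc_opt("xval")                            # "optim.a.score"
```

A real implementation might prefer to stop with an error instead of falling back silently; the documentation of `assignment_ngs` is the reference for the actual behaviour.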
removed arguments in `assignment_ngs`. Removed the `pop.id.start` and `pop.id.end` arguments that were confusing people. For those used to these arguments, they are now recycled in the new function `individuals2strata` in [stackr](https://github.com/thierrygosselin/stackr). The strata file created by this function can be used with the `strata` argument in `assignment_ngs`.
2 modified arguments in `assignment_ngs`: (1) `gsi_sim.filename` is now `filename`; and (2) if you didn’t use the imputation argument, replace `imputation.method = FALSE` with `imputation.method = NULL` or leave the argument missing.
simplified sections of code in `assignment_ngs` that dealt with `strata`, `pop.levels` and `pop.labels`.
new function: `write_gsi_sim`. Writes a gsi_sim file from a data frame (wide or long/tidy). Used internally in [assigner](https://github.com/thierrygosselin/assigner) and might be of interest for users.
`NEWS.md` file to track changes to the package.
`fst_WC84` is now a separate and very fast function that can compute the overall and pairwise Weir and Cockerham 1984 Theta/Fst. Bootstrap resampling of markers is available to build confidence intervals (for Louis Bernatchez and his students ;). The estimates are available as a data frame and a matrix with upper diagonal filled with Fst values and lower diagonal filled with the confidence intervals.
`assignment_ngs` + `assignment.analysis = "adegenet"` + `sampling.method = "ranked"`. A line at the beginning of a gsi_sim code section was deleted, making the assignment with adegenet go through that chunk of code and causing 100% assignment! `if (assignment.analysis == "gsi_sim") {code}` prevents this problem…
`import_subsamples_fst` to import the fst ranking results from all the subsample runs inside an assignment folder.
`assignment_mixture` with `sampling.method = "ranked"` and `assignment.analysis = "adegenet"`.
`imputations` is now `impute.method`.
`impute` with 2 options: `impute = "genotype"` or `impute = "allele"`.
`data` and covers the three types of files the function can use: VCF file, PLINK tped/tfam or data frame of genotypes file.
`tfam` file will be used for the `strata` argument, unless a new one is provided. Columns 1, 3 and 4 of the `tped` are discarded. The remaining columns correspond to the genotype in the format `01/04` where `A = 01, C = 02, G = 03 and T = 04`. For `A/T` format, use PLINK or bash to convert. Use [VCFTOOLS](http://vcftools.sourceforge.net/) with `--plink-tped` to convert very large VCF files. For `.ped` file conversion to `.tped` use [PLINK](http://pngu.mgh.harvard.edu/~purcell/plink/) with `--recode transpose`.
`GBS_assignment` to `assignment_ngs`. Stands for assignment with next-generation sequencing data.
`df.file` if you don’t have a VCF file. See documentation.
`strata` if you don’t have population id or other metadata info in the individual name. See documentation.
`THL` to `thl` and `snp.LD` to `snp.ld` to follow convention.
`iterations.subsample` changed to `iteration.subsample`.
`iterations` changed to `iteration.method` to avoid confusion with other iteration arguments.
`baseline` and `mixture` arguments from the function `GBS_assignment`. These options will be reintroduced later in a separate function.
`marker.number` higher than the number of markers in the data set was causing problems. This could arise when using arguments that removed markers from the dataset (e.g. `snp.ld`, `common.markers`, and `maf` filters).
`sudo rm /usr/local/bin/gsisim`
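
The PLINK tped entry earlier in this changelog gives the numeric genotype coding (A = 01, C = 02, G = 03, T = 04) and recommends PLINK or bash for converting letter-coded genotypes. For a small table already loaded in R, the same mapping could be sketched as follows. `nucleotide_to_tped_code` is a hypothetical helper, not a function from assigner or PLINK, and it assumes genotypes are stored as strings such as `A/T`.

```r
# Hypothetical helper (not part of assigner or PLINK): recode letter
# genotypes such as "A/T" into the numeric "01/04" format.
nucleotide_to_tped_code <- function(genotype, sep = "/") {
  # Coding taken from the changelog entry: A = 01, C = 02, G = 03, T = 04.
  codes <- c(A = "01", C = "02", G = "03", T = "04")

  alleles <- strsplit(genotype, sep, fixed = TRUE)
  vapply(alleles, function(pair) {
    recoded <- unname(codes[pair])
    # Anything that is not A, C, G or T (including NA) yields a missing genotype.
    if (anyNA(recoded)) return(NA_character_)
    paste(recoded, collapse = sep)
  }, character(1))
}

nucleotide_to_tped_code(c("A/T", "C/G", "G/G"))
# "01/04" "02/03" "03/03"
```

For very large files the command-line tools named in the entry remain the better choice; this sketch only makes the coding table concrete.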