To see if radiator is the right tool for you, start with the basics.

A strata file with the columns `INDIVIDUALS` and `STRATA` is required. The `STRATA` column identifies the individuals' stratification, i.e. the hierarchical groupings: populations, sampling sites or any grouping you want. See `??radiator::readr_dart` and the example file `example.dart.strata.tsv`.

To make sure it’s going to work properly, try:

    radiator::summary_strata("my.strata.tsv")
    # more details with `??radiator::summary_strata`
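
The strata file described above is a plain text table, so it can be prepared directly from R. The sketch below is a minimal illustration using base R only. The sample IDs, the population names and the temporary file path are made up for the example, and the tab-separated layout is an assumption based on the `.tsv` extension.

```r
# Hypothetical individuals and groupings, for illustration only.
strata <- data.frame(
  INDIVIDUALS = c("ind_001", "ind_002", "ind_003", "ind_004"),
  STRATA = c("pop_A", "pop_A", "pop_B", "pop_B"),
  stringsAsFactors = FALSE
)

# Write a tab-separated file with a header row
# (tab separator assumed from the .tsv extension).
strata_file <- file.path(tempdir(), "my.strata.tsv")
write.table(
  strata,
  file = strata_file,
  sep = "\t",
  quote = FALSE,
  row.names = FALSE
)

# Read the file back to confirm both required columns are present.
check <- read.delim(strata_file, stringsAsFactors = FALSE)
print(names(check))
print(table(check$STRATA))
```

The resulting file path can then be passed to `radiator::summary_strata()` to check that the file is read as expected.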

`filter_rad` is the ONE FUNCTION TO RULE THEM ALL. On its own, `filter_rad` will be enough to start exploring the biology!

    data <- radiator::filter_rad(
      data = "my.vcf",
      strata = "my.strata.tsv",
      output = c("genind", "stockr")
    )

To cite radiator, inside R type:

    citation("radiator")

Characteristics | Description |
---|---|
Import | List of the 14 supported input genomic file formats (diploid data only) and their variations:<br>VCF: SNPs and haplotypes (Danecek et al., 2011)<br>DArT files (5): genotypes in 1 row, allele counts and coverage in 2 rows, SilicoDArT genotypes and counts<br>PLINK: bed/tped/tfam (Purcell et al., 2007)<br>genind (Jombart et al., 2010; Jombart and Ahmed, 2011)<br>genlight (Jombart et al., 2010; Jombart and Ahmed, 2011)<br>strataG gtypes (Archer et al., 2016)<br>Genepop (Raymond and Rousset, 1995; Rousset, 2008)<br>STACKS haplotype file (Catchen et al., 2011, 2013)<br>hierfstat (Goudet, 2005)<br>SeqArray (Zheng et al., 2017)<br>SNPRelate (Zheng et al., 2012)<br>Dataframes of genotypes in wide or long/tidy format<br>Reading and tidying are handled by `genomic_converter` and the `tidy_` and `read_` functions |
Output | 29 genomic data formats can be exported out of radiator. The module responsible for this is `genomic_converter`. Separate modules handle the different formats and are also available:<br>`write_vcf`: VCF (Danecek et al., 2011)<br>`write_plink`: PLINK tped/tfam (Purcell et al., 2007)<br>`write_genind`: adegenet genind and genlight (Jombart et al., 2010; Jombart and Ahmed, 2011)<br>`write_genlight`: genlight (Jombart et al., 2010; Jombart and Ahmed, 2011)<br>`write_gsi_sim`: gsi_sim (Anderson et al., 2008)<br>`write_rubias`: rubias (Moran and Anderson, 2018)<br>`write_gtypes`: strataG gtypes (Archer et al., 2016)<br>`write_colony`: COLONY (Jones and Wang, 2010; Wang, 2012)<br>`write_genepop`: Genepop (Raymond and Rousset, 1995; Rousset, 2008)<br>`write_genepopedit`: genepopedit (Stanley et al., 2017)<br>STACKS haplotype file (Catchen et al., 2011, 2013)<br>`write_betadiv`: betadiv (Lamy, 2015)<br>`write_dadi`: δaδi (Gutenkunst et al., 2009)<br>`write_structure`: structure (Pritchard et al., 2000)<br>`write_faststructure`: faststructure (Raj & Pritchard, 2014)<br>`write_arlequin`: Arlequin (Excoffier et al., 2005)<br>`write_hierfstat`: hierfstat (Goudet, 2005)<br>`write_snprelate`: SNPRelate (Zheng et al., 2012)<br>`write_seqarray`: SeqArray (Zheng et al., 2017)<br>`write_bayescan`: BayeScan (Foll and Gaggiotti, 2008)<br>`write_pcadapt`: pcadapt (Luu et al., 2017)<br>`write_hzar`: hzar (Derryberry et al., 2013)<br>`write_fineradstructure`: fineRADstructure (Malinsky et al., 2018)<br>`write_related`: related (Pew et al., 2015)<br>`write_stockr`: stockR package (Foster et al., submitted)<br>`write_maverick`: MavericK (Verity & Nichols, 2016)<br>`write_ldna`: LDna (Kemppainen et al., 2015)<br>`write_hapmap`: HapMap<br>Dataframes of genotypes in wide or long/tidy format |
Conversion function | `genomic_converter`: import/export the genomic formats mentioned above. The function is also integrated with useful filters, blacklists and whitelists. |
Outlier detection | `detect_duplicate_genomes`: detect and remove duplicate individuals from your dataset<br>`detect_mixed_genomes`: detect and remove potentially mixed individuals<br>`stackr::summary_haplotype` and `filter_snp_number`: discard outlier markers with de novo assembly artifacts (e.g. markers with an extreme number of SNPs per haplotype or with an irregular number of alleles) |
Filters | Alleles, genotypes, markers, individuals and populations, along with their associated metrics and statistics, can be filtered and/or selected in several ways inside the main filtering function `filter_rad` and/or the underlying modules:<br>`filter_rad`: designed for RADseq data, it’s the one function to rule them all. Best used with an unfiltered or very lightly filtered VCF (or listed input) file. The function can handle very large VCF files (e.g. no problem with > 2M SNPs, > 30GB files), all within R!<br>`filter_dart_reproducibility`: blacklist markers under a certain threshold of the DArT reproducibility metric.<br>`filter_monomorphic`: blacklist markers with only 1 morph.<br>`filter_common_markers`: keep only markers common between strata.<br>`filter_individuals`: blacklist individuals based on missingness, heterozygosity and/or total coverage.<br>`filter_mac`: blacklist markers based on minor/alternate allele count.<br>`filter_coverage`: blacklist markers based on mean read depth (coverage).<br>`filter_genotype_likelihood`: discard markers based on genotype likelihood.<br>`filter_genotyping`: blacklist markers based on genotyping/call rate.<br>`filter_snp_position_read`: blacklist markers based on the SNP position on the read/locus.<br>`filter_snp_number`: blacklist loci with too many SNPs.<br>`filter_ld`: blacklist markers based on short and/or long distance linkage disequilibrium.<br>`filter_hwe`: blacklist markers based on Hardy-Weinberg Equilibrium expectations (HWE).<br>`filter_het`: blacklist markers based on the observed heterozygosity (Het obs).<br>`filter_fis`: blacklist markers based on the inbreeding coefficient (Fis).<br>`filter_whitelist`: keep only markers present in a whitelist. |
ggplot2-based plotting | Visualize the distribution of important metrics and statistics and create publication-ready figures |
Parallel | Code designed and optimized for fast computations using the Genomic Data Structure (GDS) file format and data science packages in the tidyverse. Works on all OSes: Linux, Mac and now PC! |
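
As a rough illustration of the conversion function listed in the table, the sketch below calls `genomic_converter` with the same placeholder file names used earlier. It assumes the function accepts `data`, `strata` and `output` arguments in the same way as the `filter_rad` call shown above, and that `"genind"` and `"hierfstat"` are valid `output` values. Check `?radiator::genomic_converter` before relying on it. The call is guarded so the snippet does nothing when radiator or the input files are missing.

```r
# Placeholder input paths (assumptions): replace with your own files.
vcf_file <- "my.vcf"
strata_file <- "my.strata.tsv"

converted <- NULL

# Only attempt the conversion when the package and both files are available.
if (requireNamespace("radiator", quietly = TRUE) &&
    file.exists(vcf_file) && file.exists(strata_file)) {
  converted <- radiator::genomic_converter(
    data = vcf_file,
    strata = strata_file,
    output = c("genind", "hierfstat")  # assumed valid output names
  )
}

# The structure of the returned object is documented in
# ?radiator::genomic_converter; it is left untouched here.
```

When filtering is also needed, `filter_rad` with its own `output` argument covers both steps in a single call.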