Learning radiator

To see if radiator is the right tool for you, you can start with the basics.

1. Prepare a strata file

  • It’s a tab-separated file, e.g. example.strata.tsv.
  • A minimum of 2 columns is required: INDIVIDUALS and STRATA.
  • The STRATA column identifies the stratification of the individuals, i.e. the hierarchical groupings: populations, sampling sites or any grouping you want.
  • It’s like the stacks population map file, but with a header.
  • DArT users: the strata file requires 3 columns and is described in ??radiator::read_dart (example: example.dart.strata.tsv).
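
As a minimal sketch of the 2-column layout described above, a strata file could be written from R with base functions only. The sample IDs, group names and the temporary file location are hypothetical and used purely for illustration.

```r
# Hypothetical sample IDs and groupings, for illustration only.
strata <- data.frame(
  INDIVIDUALS = c("IND-001", "IND-002", "IND-003", "IND-004"),
  STRATA = c("POP_A", "POP_A", "POP_B", "POP_B"),
  stringsAsFactors = FALSE
)

# Write a tab-separated file with a header row, no row names and no quotes.
strata_file <- file.path(tempdir(), "example.strata.tsv")
write.table(strata, file = strata_file, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Read it back to check the layout: 2 columns, one row per individual.
check <- read.delim(strata_file, stringsAsFactors = FALSE)
print(check)
```

Each individual should appear only once in the INDIVIDUALS column; the file path can then be passed wherever a strata argument is expected.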

To make sure it’s going to work properly, try:

radiator::summary_strata("my.strata.tsv")
# more details with `??radiator::summary_strata`

2. Filter your RADseq data

  • filter_rad is the ONE FUNCTION TO RULE THEM ALL.
  • There’s a built-in interactive mode that requires users to visualize the data before choosing thresholds.
  • The function is made of filtering modules that users can also access separately or in combination.
  • For 95% of users, filter_rad will be enough to start exploring the biology!
data <- radiator::filter_rad(
    data = "my.vcf",
    strata = "my.strata.tsv",
    output = c("genind", "stockr")
)

To cite radiator, inside R type citation("radiator")


Characteristics and description

Import: list of the 14 supported input genomic file formats (diploid data only) and their variations:
VCF: SNPs and haplotypes (Danecek et al., 2011)
DArT files (5): genotypes in 1 row, allele counts and coverage in 2 rows, SilicoDArT genotypes and counts
PLINK: bed/tped/tfam (Purcell et al., 2007)
genind (Jombart et al., 2010; Jombart and Ahmed, 2011)
genlight (Jombart et al., 2010; Jombart and Ahmed, 2011)
strataG gtypes (Archer et al., 2016)
Genepop (Raymond and Rousset, 1995; Rousset, 2008)
STACKS haplotype file (Catchen et al., 2011, 2013)
hierfstat (Goudet, 2005)
SeqArray (Zheng et al., 2017)
SNPRelate (Zheng et al., 2012)
Dataframes of genotypes in wide or long/tidy format

Reading and tidying are handled by genomic_converter and by the tidy_ and read_ functions.
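
To illustrate the difference between the wide and long/tidy dataframe layouts mentioned above, here is a small base R sketch. The column names (INDIVIDUALS, MARKERS, GT), the marker names and the "A/G" genotype coding are assumptions made for the example; they are not taken from radiator's documentation, so check the tidy_ and read_ help pages for the exact columns the package expects.

```r
# Toy genotypes in wide format: one row per individual, one column per marker.
# Sample names, marker names and genotype coding are hypothetical.
wide <- data.frame(
  INDIVIDUALS = c("IND-001", "IND-002"),
  SNP_1 = c("A/A", "A/G"),
  SNP_2 = c("C/T", "T/T"),
  stringsAsFactors = FALSE
)

# Long/tidy format: one row per individual x marker combination.
marker_cols <- setdiff(names(wide), "INDIVIDUALS")
long <- data.frame(
  INDIVIDUALS = rep(wide$INDIVIDUALS, times = length(marker_cols)),
  MARKERS = rep(marker_cols, each = nrow(wide)),
  GT = unlist(wide[marker_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)
print(long)
```

The long layout grows with individuals times markers, which makes per-marker and per-individual summaries easy to compute at the cost of more rows.
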
Output: 29 genomic data formats can be exported from radiator. The module responsible for this is genomic_converter. Separate modules handle the different formats and are also available:
write_vcf: VCF (Danecek et al., 2011)
write_plink: PLINK tped/tfam (Purcell et al., 2007)
write_genind: adegenet genind and genlight (Jombart et al., 2010; Jombart and Ahmed, 2011)
write_genlight: genlight (Jombart et al., 2010; Jombart and Ahmed, 2011)
write_gsi_sim: gsi_sim (Anderson et al. 2008)
write_rubias: rubias (Moran and Anderson, 2018)
write_gtypes: strataG gtypes (Archer et al. 2016)
write_colony: COLONY (Jones and Wang, 2010; Wang, 2012)
write_genepop: Genepop (Raymond and Rousset, 1995; Rousset, 2008)
write_genepopedit: genepopedit (Stanley et al., 2017)
STACKS haplotype file (Catchen et al., 2011, 2013)
write_betadiv: betadiv (Lamy, 2015)
write_dadi: δaδi (Gutenkunst et al., 2009)
write_structure: structure (Pritchard et al., 2000)
write_faststructure: faststructure (Raj & Pritchard, 2014)
write_arlequin: Arlequin (Excoffier et al. 2005)
write_hierfstat: hierfstat (Goudet, 2005)
write_snprelate: SNPRelate (Zheng et al. 2012)
write_seqarray: SeqArray (Zheng et al. 2017)
write_bayescan: BayeScan (Foll and Gaggiotti, 2008)
write_pcadapt: pcadapt (Luu et al. 2017)
write_hzar: HZAR (Derryberry et al. 2013)
write_fineradstructure: fineRADstructure (Malinsky et al., 2018)
write_related: related (Pew et al., 2015)
write_stockr: stockR (Foster et al., submitted)
write_maverick: MavericK (Verity & Nichols, 2016)
write_ldna: LDna (Kemppainen et al. 2015)
write_hapmap: HapMap
Dataframes of genotypes in wide or long/tidy format
Conversion function: genomic_converter imports and exports the genomic formats mentioned above. The function is also integrated with useful filters, blacklists and whitelists.
Outlier detection:
detect_duplicate_genomes: detect and remove duplicate individuals from your dataset
detect_mixed_genomes: detect and remove potentially mixed individuals
stackr::summary_haplotype and filter_snp_number: discard outlier markers with de novo assembly artifacts (e.g. markers with an extreme number of SNPs per haplotype or with an irregular number of alleles)
Filters: alleles, genotypes, markers, individuals and populations, along with their associated metrics and statistics, can be filtered and/or selected in several ways inside the main filtering function filter_rad and/or the underlying modules:

filter_rad: designed for RADseq data, it’s the one function to rule them all. Best used with an unfiltered or very lightly filtered VCF (or other listed input) file. The function can handle very large VCF files (e.g. no problem with >2M SNPs, >30 GB files), all within R!
filter_dart_reproducibility: blacklist markers under a certain threshold of the DArT reproducibility metric.
filter_monomorphic: blacklist markers with only 1 morph.
filter_common_markers: keep only markers common between strata.
filter_individuals: blacklist individuals based on missingness, heterozygosity and/or total coverage.
filter_mac: blacklist markers based on minor/alternate allele count.
filter_coverage: blacklist markers based on mean read depth (coverage).
filter_genotype_likelihood: discard markers based on genotype likelihood.
filter_genotyping: blacklist markers based on genotyping/call rate.
filter_snp_position_read: blacklist markers based on the SNP position on the read/locus.
filter_snp_number: blacklist loci with too many SNPs.
filter_ld: blacklist markers based on short and/or long distance linkage disequilibrium.
filter_hwe: blacklist markers based on Hardy-Weinberg Equilibrium expectations (HWE).
filter_het: blacklist markers based on the observed heterozygosity (Het obs).
filter_fis: blacklist markers based on the inbreeding coefficient (Fis).
filter_whitelist: keep only markers present in a whitelist
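
To make the idea behind some of the marker filters listed above more concrete (monomorphic markers, minor allele count, genotyping/call rate), here is a small base R sketch on a toy genotype matrix. It only illustrates the metrics; it is not radiator's implementation, and the 0/1/2 genotype coding, the object names and the thresholds are assumptions chosen for the example.

```r
# Toy genotype matrix: rows = individuals, columns = markers.
# Values are counts of the alternate allele (0, 1 or 2); NA = missing genotype.
geno <- matrix(
  c(0, 0, 1, NA,
    0, 1, 2, NA,
    0, 0, 1, 2,
    0, 1, NA, NA),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("IND-", 1:4), paste0("SNP_", 1:4))
)

# Genotyping (call) rate: proportion of individuals genotyped per marker.
n_genotyped <- colSums(!is.na(geno))
call_rate <- n_genotyped / nrow(geno)

# Minor allele count: the smaller of the alternate and reference allele counts.
alt_count <- colSums(geno, na.rm = TRUE)
ref_count <- 2 * n_genotyped - alt_count
mac <- pmin(alt_count, ref_count)

# A marker is monomorphic when only one allele is observed.
monomorphic <- mac == 0

# Hypothetical thresholds, for illustration only.
min_call_rate <- 0.7
min_mac <- 2

keep <- !monomorphic & call_rate >= min_call_rate & mac >= min_mac
whitelist <- colnames(geno)[keep]
blacklist <- colnames(geno)[!keep]
print(list(whitelist = whitelist, blacklist = blacklist))
```

In this toy data, SNP_1 is dropped because it is monomorphic and SNP_4 because it is genotyped in only 1 of 4 individuals, which mirrors the whitelist/blacklist logic of the filtering modules.
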
ggplot2-based plotting: visualize the distribution of important metrics and statistics and create publication-ready figures.
Parallel: code designed and optimized for fast computations using the Genomic Data Structure (GDS) file format and data science packages from the tidyverse. Works on all operating systems: Linux, Mac and now PC!