Get Started

Learning radiator

To see if radiator is the right tool for you, you can start with the basics.

1. Prepare a strata file

It’s a tab-separated file, e.g. example.strata.tsv.
A minimum of 2 columns is required: INDIVIDUALS and STRATA.
The STRATA column identifies how individuals are stratified, i.e. the hierarchical groupings: populations, sampling sites or any grouping you want.
It’s like the stacks population map file, but with a header…
DArT users: the strata file requires 3 columns and is described in ??radiator::read_dart (e.g. example.dart.strata.tsv).
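
If you do not have a strata file yet, the following is a minimal sketch of building one with base R only. The sample IDs and site names are made up for illustration, and the file is written to a temporary directory (an assumption made here to avoid cluttering your working directory); adapt the path and names to your own data.

```r
# Hypothetical sample IDs and sampling sites, for illustration only
strata <- data.frame(
  INDIVIDUALS = c("IND-001", "IND-002", "IND-003", "IND-004"),
  STRATA = c("SITE_A", "SITE_A", "SITE_B", "SITE_B"),
  stringsAsFactors = FALSE
)

# Write a tab-separated file with a header row, no quotes and no row names
strata_file <- file.path(tempdir(), "example.strata.tsv")
write.table(strata, file = strata_file, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Read it back and confirm the two required columns and unique IDs
check <- read.delim(strata_file, stringsAsFactors = FALSE)
stopifnot(all(c("INDIVIDUALS", "STRATA") %in% names(check)))
stopifnot(!anyDuplicated(check$INDIVIDUALS))

# Number of individuals per stratum
table(check$STRATA)
```

The resulting path can then be used in place of "my.strata.tsv" in the check that follows.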

To make sure it’s going to work properly, try:

radiator::summary_strata("my.strata.tsv")
# more details with `??radiator::summary_strata`

2. Filter your RADseq data

filter_rad is the ONE FUNCTION TO RULE THEM ALL.
There’s a built-in interactive mode that requires users to visualize the data before choosing thresholds.
The function is made of filtering modules that users can also access separately or in combination.
For 95% of users, filter_rad will be enough to start exploring the biology!

data <- radiator::filter_rad(
    data = "my.vcf",
    strata = "my.strata.tsv",
    output = c("genind", "stockr")
)

To cite radiator, type citation("radiator") inside R.

Overview

Characteristics	Description
Import	List of the 14 supported input genomic file formats (diploid data only) and their variations:
	VCF: SNPs and haplotypes (Danecek et al., 2011)
	DArT files (5): genotypes in 1 row, allele counts and coverage in 2 rows, SilicoDArT genotypes and counts
	PLINK: bed/tped/tfam (Purcell et al., 2007)
	genind (Jombart et al., 2010; Jombart and Ahmed, 2011)
	genlight (Jombart et al., 2010; Jombart and Ahmed, 2011)
	strataG gtypes (Archer et al., 2016)
	Genepop (Raymond and Rousset, 1995; Rousset, 2008)
	STACKS haplotype file (Catchen et al., 2011, 2013)
	hierfstat (Goudet, 2005)
	SeqArray (Zheng et al., 2017)
	SNPRelate (Zheng et al., 2012)
	Dataframes of genotypes in wide or long/tidy format
	Reading and tidying are handled by `genomic_converter` and the `tidy_` and `read_` functions.
Output	29 genomic data formats can be exported out of radiator. The module responsible for this is `genomic_converter`. Separate modules handle the different formats and are also available:
	`write_vcf`: VCF (Danecek et al., 2011)
	`write_plink`: PLINK tped/tfam (Purcell et al., 2007)
	`write_genind`: adegenet genind and genlight (Jombart et al., 2010; Jombart and Ahmed, 2011)
	`write_genlight`: genlight (Jombart et al., 2010; Jombart and Ahmed, 2011)
	`write_gsi_sim`: gsi_sim (Anderson et al., 2008)
	`write_rubias`: rubias (Moran and Anderson, 2018)
	`write_gtypes`: strataG gtypes (Archer et al., 2016)
	`write_colony`: COLONY (Jones and Wang, 2010; Wang, 2012)
	`write_genepop`: Genepop (Raymond and Rousset, 1995; Rousset, 2008)
	`write_genepopedit`: genepopedit (Stanley et al., 2017)
	STACKS haplotype file (Catchen et al., 2011, 2013)
	`write_betadiv`: betadiv (Lamy, 2015)
	`write_dadi`: δaδi (Gutenkunst et al., 2009)
	`write_structure`: structure (Pritchard et al., 2000)
	`write_faststructure`: faststructure (Raj & Pritchard, 2014)
	`write_arlequin`: Arlequin (Excoffier et al., 2005)
	`write_hierfstat`: hierfstat (Goudet, 2005)
	`write_snprelate`: SNPRelate (Zheng et al., 2012)
	`write_seqarray`: SeqArray (Zheng et al., 2017)
	`write_bayescan`: BayeScan (Foll and Gaggiotti, 2008)
	`write_pcadapt`: pcadapt (Luu et al., 2017)
	`write_hzar` (Derryberry et al., 2013)
	`write_fineradstructure` (Malinsky et al., 2018)
	`write_related`: related (Pew et al., 2015)
	`write_stockr`: stockR package (Foster et al., submitted)
	`write_maverick`: MavericK (Verity & Nichols, 2016)
	`write_ldna`: LDna (Kemppainen et al., 2015)
	`write_hapmap`: HapMap
	Dataframes of genotypes in wide or long/tidy format
Conversion function	`genomic_converter` imports and exports the genomic formats mentioned above. The function also integrates useful filters, blacklists and whitelists.
Outliers detection	`detect_duplicate_genomes`: detect and remove duplicate individuals from your dataset.
	`detect_mixed_genomes`: detect and remove potentially mixed individuals.
	`stackr::summary_haplotype` and `filter_snp_number`: discard outlier markers with de novo assembly artifacts (e.g. markers with an extreme number of SNPs per haplotype or with an irregular number of alleles).
Filters	Targets of filters: alleles, genotypes, markers, individuals and populations, along with their associated metrics and statistics, can be filtered and/or selected in several ways inside the main filtering function `filter_rad` and/or the underlying modules:
	`filter_rad`: designed for RADseq data, it’s the one function to rule them all. Best used with an unfiltered or very lightly filtered VCF (or listed input) file. The function can handle very large VCF files (e.g. no problem with >2M SNPs, >30GB files), all within R!
	`filter_dart_reproducibility`: blacklist markers under a certain threshold of the DArT reproducibility metric.
	`filter_monomorphic`: blacklist markers with only 1 morph.
	`filter_common_markers`: keep only markers common between strata.
	`filter_individuals`: blacklist individuals based on missingness, heterozygosity and/or total coverage.
	`filter_mac`: blacklist markers based on minor/alternate allele count.
	`filter_coverage`: blacklist markers based on mean read depth (coverage).
	`filter_genotype_likelihood`: discard markers based on genotype likelihood.
	`filter_genotyping`: blacklist markers based on genotyping/call rate.
	`filter_snp_position_read`: blacklist markers based on the SNP position on the read/locus.
	`filter_snp_number`: blacklist loci with too many SNPs.
	`filter_ld`: blacklist markers based on short and/or long distance linkage disequilibrium.
	`filter_hwe`: blacklist markers based on Hardy-Weinberg Equilibrium expectations (HWE).
	`filter_het`: blacklist markers based on the observed heterozygosity (Het obs).
	`filter_fis`: blacklist markers based on the inbreeding coefficient (Fis).
	`filter_whitelist`: keep only markers present in a whitelist.
ggplot2-based plotting	Visualize the distribution of important metrics and statistics and create publication-ready figures.
Parallel	Code designed and optimized for fast computations using the Genomic Data Structure (GDS) file format and data science packages in the tidyverse. Works with all OS: Linux, Mac and now PC!
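
To make two of the filters in the table above more concrete, here is a toy base R sketch of the ideas behind blacklisting monomorphic markers and keeping only markers common between strata. It is an illustration under stated assumptions, not radiator's implementation: the column names (MARKERS, INDIVIDUALS, STRATA, GT), the `A/C` genotype coding and the data are hypothetical, and "common" is taken here to mean genotyped in at least one individual of every stratum.

```r
# Toy long/tidy genotype table; column names and values are hypothetical
geno <- data.frame(
  MARKERS     = rep(c("M1", "M2", "M3"), each = 4),
  INDIVIDUALS = rep(c("IND-001", "IND-002", "IND-003", "IND-004"), times = 3),
  STRATA      = rep(c("SITE_A", "SITE_A", "SITE_B", "SITE_B"), times = 3),
  GT          = c("A/A", "A/C", "C/C", "A/C",  # M1: polymorphic, typed everywhere
                  "G/G", "G/G", "G/G", "G/G",  # M2: monomorphic
                  "T/C", "T/T", NA, NA),       # M3: missing in SITE_B
  stringsAsFactors = FALSE
)

# Monomorphic markers: a single distinct allele across all called genotypes
n_alleles <- tapply(geno$GT, geno$MARKERS, function(gt) {
  length(unique(unlist(strsplit(gt[!is.na(gt)], "/", fixed = TRUE))))
})
monomorphic <- names(n_alleles)[n_alleles < 2]

# Common markers: genotyped in at least one individual of every stratum
called <- geno[!is.na(geno$GT), ]
n_strata <- tapply(called$STRATA, called$MARKERS, function(s) length(unique(s)))
not_common <- names(n_strata)[n_strata < length(unique(geno$STRATA))]

# Blacklist both sets and keep the remaining markers
blacklist <- union(monomorphic, not_common)
filtered <- geno[!geno$MARKERS %in% blacklist, ]
unique(filtered$MARKERS)
```

On real RADseq data these steps are handled by the radiator modules themselves; the sketch only shows what kind of markers end up on a blacklist.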

Thierry Gosselin

2025-07-04
