Calibrate REF and ALT alleles based on count — calibrate

Calibrate REF and ALT alleles based on counts. The REF allele is designated as the allele with more counts in the dataset. The function generates REF and ALT columns.
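The count-based rule can be illustrated with a minimal base R sketch. It is not radiator's implementation: the toy genotype vector and the object names (gt_vcf_nuc, allele_counts, ref, alt) are made up for illustration, and how ties between alleles are resolved is not covered here.

```r
# Toy genotypes for one biallelic marker, nucleotides separated by "/"
# (hypothetical data, not produced by radiator).
gt_vcf_nuc <- c("A/A", "A/G", "G/G", "A/A", NA, "A/G")

# Split each non-missing genotype into its two alleles and tally them.
alleles <- unlist(strsplit(gt_vcf_nuc[!is.na(gt_vcf_nuc)], "/", fixed = TRUE))
allele_counts <- sort(table(alleles), decreasing = TRUE)

# The allele with more counts is designated REF, the other one ALT.
ref <- names(allele_counts)[1]
alt <- names(allele_counts)[2]

print(allele_counts)
cat("REF:", ref, "ALT:", alt, "\n")
```

In this toy marker, A is observed 6 times and G 4 times, so A would be designated REF and G would be ALT.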

reference genome: for people using a reference genome, the reference allele terminology is different and is not based on counts...

Used internally in radiator and might be of interest for users.

calibrate_alleles(
  data,
  biallelic = NULL,
  parallel.core = parallel::detectCores() - 1,
  verbose = FALSE,
  ...
)

Arguments

data: A tidy genomic data set in the global environment. See details for more info.
biallelic: (optional) If set to biallelic = TRUE/FALSE, the value is used during the multiallelic REF/ALT decision and speeds up computations. Default: biallelic = NULL.
parallel.core: (optional) The number of cores used for parallel execution. This is no longer used: the code is as fast as it can be, and using more cores will reduce the speed. Default: parallel.core = parallel::detectCores() - 1.
verbose: (optional, logical) verbose = TRUE to be chatty during execution. Default: verbose = FALSE.
...: (optional) To pass further arguments for fine-tuning the tidying (details below).

Value

Depending on whether the input file is biallelic or multiallelic, the function outputs, in addition to the REF and ALT columns, several genotype codings:

GT: the genotype in 6-digit format with integers.
GT_VCF: the genotype in VCF format with integers.
GT_VCF_NUC: the genotype in VCF format with letters corresponding to nucleotide.
GT_BIN: biallelic coding similar to PLINK, the coding 0, 1, 2, NA corresponds to the number of ALT alleles in the genotype, with NA for missing genotypes.
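To make the relationship between these codings concrete, here is a small base R sketch for one biallelic marker. It assumes REF and ALT are already known, that alleles are separated by "/" and that a missing genotype is written "./." as in VCF. The helper objects (recode, split_gt) are hypothetical and are not radiator functions.

```r
# Assumed REF/ALT designation for a single biallelic marker (toy values).
ref <- "A"
alt <- "G"
gt_vcf_nuc <- c("A/A", "A/G", "G/G", "./.")

# Lookup table: nucleotide -> VCF integer allele ("." stays missing).
recode <- c("0", "1", ".")
names(recode) <- c(ref, alt, ".")

split_gt <- strsplit(gt_vcf_nuc, "/", fixed = TRUE)

# GT_VCF-style coding: replace each nucleotide by its integer code.
gt_vcf <- vapply(
  split_gt,
  function(a) paste(recode[a], collapse = "/"),
  character(1)
)

# GT_BIN-style coding: number of ALT alleles, NA when the genotype is missing.
gt_bin <- vapply(
  split_gt,
  function(a) if (any(a == ".")) NA_integer_ else sum(a == alt),
  integer(1)
)

print(data.frame(GT_VCF_NUC = gt_vcf_nuc, GT_VCF = gt_vcf, GT_BIN = gt_bin))
```

The GT coding (6-digit format) is left out of the sketch because its allele-to-integer mapping is not described on this page.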

Details

Input data: A minimum of 4 columns are required (the rest are considered metadata info):

MARKERS
POP_ID
INDIVIDUALS
GT and/or GT_VCF_NUC and/or GT_VCF
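As an illustration of this layout, the base R sketch below builds a toy data frame with the four required columns and checks that they are present. The marker, population and individual names are invented, and the check is only an example, not radiator's own validation. The commented call at the end uses the signature shown in the usage section; whether a plain data.frame like this one is accepted as-is is an assumption.

```r
# Toy tidy data set: one row per marker x individual (hypothetical values).
tidy_data <- data.frame(
  MARKERS = rep(c("locus_1", "locus_2"), each = 3),
  POP_ID = rep(c("pop_a", "pop_a", "pop_b"), times = 2),
  INDIVIDUALS = rep(c("ind_1", "ind_2", "ind_3"), times = 2),
  GT_VCF_NUC = c("A/A", "A/G", "G/G", "C/T", "C/C", "./."),
  stringsAsFactors = FALSE
)

# The three identifier columns are mandatory; at least one genotype column is needed.
id_cols <- c("MARKERS", "POP_ID", "INDIVIDUALS")
gt_cols <- c("GT", "GT_VCF_NUC", "GT_VCF")
has_required <- all(id_cols %in% names(tidy_data)) &&
  any(gt_cols %in% names(tidy_data))

print(has_required)

# Not run here (requires radiator to be installed):
# res <- radiator::calibrate_alleles(data = tidy_data, verbose = TRUE)
```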

How to get a tidy data frame? See radiator tidy_genomic_data.

Author

Thierry Gosselin thierrygosselin@icloud.com