Calibrate REF and ALT alleles based on counts. The REF allele is designated as the allele with the higher count in the dataset. The function generates REF and ALT columns.
Reference genome: for people using a reference genome, the reference allele terminology is different and is not based on counts...
Used internally in radiator; it might also be of interest to users.
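The count-based designation described above can be sketched in base R. This is only an illustration of the idea, not radiator's internal code: the genotype vector is made up, the `./.` missing code is an assumption, and how the package breaks ties between equally frequent alleles is not covered here.

```r
# Hypothetical genotypes for a single marker, written with nucleotides.
# "./." as the missing-genotype code is an assumption of this sketch.
gt <- c("A/A", "A/T", "T/T", "A/A", "./.", "A/T")

# Split each non-missing genotype into its two alleles.
alleles <- unlist(strsplit(gt[gt != "./."], "/", fixed = TRUE))

# Count the alleles; the most frequent one is designated REF, the other ALT.
counts <- sort(table(alleles), decreasing = TRUE)
ref <- names(counts)[1]
alt <- names(counts)[2]
```

With these made-up genotypes, `A` is observed more often than `T`, so `A` would be designated REF and `T` ALT.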
calibrate_alleles(
  data,
  biallelic = NULL,
  parallel.core = parallel::detectCores() - 1,
  verbose = FALSE,
  ...
)
data: A genomic data set in tidy format, available in the global environment. See details for more info.
biallelic: (optional) If biallelic = TRUE/FALSE is supplied, it will be used during the multiallelic REF/ALT decision and speeds up computations.
Default: biallelic = NULL.
parallel.core: (optional) The number of cores used for parallel execution. This argument is no longer used: the code is already as fast as it can be, and using more cores would slow it down.
Default: parallel.core = parallel::detectCores() - 1.
verbose: (optional, logical) verbose = TRUE to be chatty during execution.
Default: verbose = FALSE.
...: (optional) To pass further arguments for fine-tuning the tidying (details below).
Depending on whether the input file is biallelic or multiallelic, the function outputs several genotype codings in addition to the REF and ALT columns:
GT: the genotype in 6-digit format with integers.
GT_VCF: the genotype in VCF format with integers.
GT_VCF_NUC: the genotype in VCF format with letters corresponding to nucleotides.
GT_BIN: biallelic coding similar to PLINK; the coding 0, 1, 2, NA corresponds to the number of ALT alleles in the genotype, with NA for missing genotypes.
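To make the relationship between these codings concrete, here is a minimal base R sketch for a biallelic marker. The helper names `to_gt_vcf` and `to_gt_bin` are hypothetical and are not part of radiator; the `./.` missing code and the REF/ALT values are assumptions chosen for illustration.

```r
# Assumed REF and ALT alleles for one biallelic marker.
ref <- "A"
alt <- "T"

# Hypothetical genotypes with nucleotides; "./." is assumed to mean missing.
gt_vcf_nuc <- c("A/A", "A/T", "T/T", "./.")

# Integer VCF-style coding: 0 for REF, 1 for ALT (biallelic case only).
to_gt_vcf <- function(g, ref) {
  if (g == "./.") return("./.")
  a <- strsplit(g, "/", fixed = TRUE)[[1]]
  paste(ifelse(a == ref, "0", "1"), collapse = "/")
}

# PLINK-like coding: number of ALT alleles, NA for missing genotypes.
to_gt_bin <- function(g, alt) {
  if (g == "./.") return(NA_integer_)
  sum(strsplit(g, "/", fixed = TRUE)[[1]] == alt)
}

gt_vcf <- vapply(gt_vcf_nuc, to_gt_vcf, character(1), ref = ref, USE.NAMES = FALSE)
gt_bin <- vapply(gt_vcf_nuc, to_gt_bin, integer(1), alt = alt, USE.NAMES = FALSE)
```

Here a heterozygote such as `A/T` maps to `0/1` in the integer coding and to `1` in the PLINK-like coding, while the missing genotype stays `./.` and `NA` respectively.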
Input data: A minimum of 4 columns is required (the rest are considered metadata):
MARKERS
POP_ID
INDIVIDUALS
GT and/or GT_VCF_NUC and/or GT_VCF
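As an illustration of the minimum input, the sketch below builds a small tidy data frame by hand and checks that the required columns are present. All marker, population, and individual names are made up, and the `./.` missing code is an assumption. The call to `calibrate_alleles()` is left as a comment because it requires radiator to be installed.

```r
# A hand-made tidy data set: one row per marker and individual.
tidy <- data.frame(
  MARKERS = rep(c("locus_1", "locus_2"), each = 3),
  POP_ID = rep(c("pop_a", "pop_a", "pop_b"), times = 2),
  INDIVIDUALS = rep(c("ind_1", "ind_2", "ind_3"), times = 2),
  GT_VCF_NUC = c("A/A", "A/T", "T/T", "C/C", "C/G", "./."),
  stringsAsFactors = FALSE
)

# Three identifier columns plus at least one genotype column are needed.
required <- c("MARKERS", "POP_ID", "INDIVIDUALS")
genotype_cols <- c("GT", "GT_VCF_NUC", "GT_VCF")

has_required <- all(required %in% names(tidy))
has_genotype <- any(genotype_cols %in% names(tidy))

# With radiator installed, the data could then be passed on, for example:
# res <- radiator::calibrate_alleles(data = tidy, verbose = TRUE)
```

Any extra columns in such a data frame would be treated as metadata.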
How to get a tidy data frame? See radiator tidy_genomic_data.