R/memorize_missing.R
Use this function to memorize the pattern of missing genotypes, coded 0 (missing) and 1 (genotyped). The pattern can also be randomized based on dataset attributes such as markers, populations or individuals. This is useful to generate missingness on a simulated dataset with the same number of individuals, populations and markers, or to analyze the accuracy of imputation algorithms. A vignette showing how to use this function is under construction.
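The 0/1 coding and its reuse on simulated data can be illustrated with a minimal base-R sketch. It assumes a hypothetical tidy data frame `tidy` with columns `POP_ID`, `INDIVIDUALS`, `MARKERS` and `GT`, where a missing genotype is `NA`; the `simulated` data frame is also hypothetical. This shows the idea only and is not radiator's implementation.

```r
# Hypothetical tidy genotypes: 2 populations x 2 individuals each x 3 markers.
# NA marks a missing call (an assumption made for this sketch).
tidy <- data.frame(
  POP_ID      = rep(c("pop1", "pop2"), each = 6),
  INDIVIDUALS = rep(c("ind1", "ind2", "ind3", "ind4"), each = 3),
  MARKERS     = rep(c("M1", "M2", "M3"), times = 4),
  GT          = c("001002", NA, "002002", "001001", "001002", NA,
                  NA, "002002", "001001", "001002", "001001", "002002"),
  stringsAsFactors = FALSE
)

# Memorize the pattern: 0 = missing, 1 = genotyped.
pattern <- tidy[, c("POP_ID", "INDIVIDUALS", "MARKERS")]
pattern$MISSING_ORIGINAL <- ifelse(is.na(tidy$GT), 0L, 1L)

# Hypothetical simulated dataset: same individuals and markers, no missing data.
simulated <- tidy[, c("POP_ID", "INDIVIDUALS", "MARKERS")]
simulated$GT <- "001001"

# Transfer the memorized pattern: join on the identifying columns,
# then blank every genotype coded 0.
simulated <- merge(simulated, pattern, by = c("POP_ID", "INDIVIDUALS", "MARKERS"))
simulated$GT[simulated$MISSING_ORIGINAL == 0L] <- NA
```

Joining on the three identifying columns, rather than relying on row order, keeps the transfer correct even when the two datasets are sorted differently.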
memorize_missing(data, strata = NULL, randomize = NULL, filename = NULL)
| Argument | Description |
|---|---|
| data | 14 options for input (diploid data only): VCFs (SNPs or haplotypes, to make the VCF population ready), PLINK (tped, bed), stacks haplotype file, genind (library(adegenet)), genlight (library(adegenet)), gtypes (library(strataG)), genepop, DArT, and a data frame in long/tidy or wide format. To verify that radiator detects your file format, use … **DArT and VCF data:** radiator was not meant to generate alleles and genotypes if you are using a VCF file with no genotype (only genotype likelihood: GL or PL). Neither is radiator able to magically generate a genind object from a SilicoDArT dataset. Please look at the first few lines of your dataset to understand its limits before asking radiator to convert or filter it. |
| strata | (optional/required) Required for VCF and haplotype files, optional for the other supported formats. See the documentation of … |
| randomize | (optional, string) Randomizes the missingness by specific attributes. Available options: … Default: `randomize = NULL`. |
| filename | (optional) The name of the file (extension not necessary) written to the working directory and containing the missing information. Default: `filename = NULL`. |
A tidy data frame in the global environment with the columns `POP_ID`, `INDIVIDUALS` and `MARKERS`, followed by columns holding the missingness information, coded 0 for missing and 1 for genotyped. Depending on the value chosen for the argument `randomize`, these columns are:

- `MISSING_ORIGINAL`: the original missing pattern (always present)
- `MISSING_MARKERS_MIX`: the missing pattern randomized by markers (optional)
- `MISSING_POP_MIX`: the missing pattern randomized by populations (optional)
- `MISSING_INDIVIDUALS_MIX`: the missing pattern randomized by individuals (optional)
- `MISSING_OVERALL_MIX`: the missing pattern randomized overall (optional)
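How the `_MIX` columns are computed is not described on this page. One plausible reading, sketched below in base R and offered only as an assumption, is a permutation of the original 0/1 vector within each group: shuffling within populations keeps the number of missing genotypes per population, while shuffling overall keeps only the total. The `pattern` data frame and the `shuffle_within()` helper are hypothetical.

```r
# Hypothetical memorized pattern: 2 populations x 2 individuals each x 3 markers.
pattern <- data.frame(
  POP_ID           = rep(c("pop1", "pop2"), each = 6),
  INDIVIDUALS      = rep(c("ind1", "ind2", "ind3", "ind4"), each = 3),
  MARKERS          = rep(c("M1", "M2", "M3"), times = 4),
  MISSING_ORIGINAL = c(1L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: permute x within each level of group.
# v[sample.int(length(v))] avoids sample()'s special case for length-1 input.
shuffle_within <- function(x, group) {
  ave(x, group, FUN = function(v) v[sample.int(length(v))])
}

set.seed(42)
# Assumed meaning of "randomized by populations": shuffle within each POP_ID.
pattern$MISSING_POP_MIX <- shuffle_within(pattern$MISSING_ORIGINAL, pattern$POP_ID)
# Assumed meaning of "randomized overall": one permutation of the whole vector.
pattern$MISSING_OVERALL_MIX <- pattern$MISSING_ORIGINAL[sample.int(nrow(pattern))]
```

Under the same assumption, passing `pattern$INDIVIDUALS` or `pattern$MARKERS` as the grouping column would give per-individual or per-marker variants.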
Author: Thierry Gosselin (thierrygosselin@icloud.com)
```r
if (FALSE) {
  missing.memory <- memorize_missing(
    data = "batch_1.vcf",
    strata = "population.map.strata.tsv",
    randomize = "populations",
    filename = "missing.memory.panda"
  )
}
```
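The imputation-accuracy use case from the description can also be sketched in base R. Starting from hypothetical complete genotypes (`truth`), a hypothetical 0/1 pattern (`mask`) blanks some calls, a placeholder imputer fills them back, and accuracy is the share of masked genotypes recovered exactly. `impute_mode()` is a deliberately naive stand-in (most frequent genotype per marker), not a real imputation algorithm and not part of radiator.

```r
# Hypothetical complete genotypes: 6 individuals x 2 markers, no missing data.
truth <- data.frame(
  INDIVIDUALS = rep(paste0("ind", 1:6), each = 2),
  MARKERS     = rep(c("M1", "M2"), times = 6),
  GT          = c("001001", "001002", "001001", "001002", "001002", "001002",
                  "001001", "002002", "001001", "001002", "002002", "001002"),
  stringsAsFactors = FALSE
)
# Hypothetical 0/1 pattern to apply (0 = mask this genotype).
mask <- c(1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L)

masked <- truth
masked$GT[mask == 0L] <- NA

# Placeholder imputer: fill each marker's missing genotypes with the most
# frequent observed genotype of that marker (table() drops NA by default).
impute_mode <- function(gt) {
  gt[is.na(gt)] <- names(which.max(table(gt)))
  gt
}
masked$GT <- ave(masked$GT, masked$MARKERS, FUN = impute_mode)

# Accuracy: proportion of masked genotypes recovered exactly.
accuracy <- mean(masked$GT[mask == 0L] == truth$GT[mask == 0L])
print(accuracy)  # [1] 0.6666667
```

Here two of the three masked genotypes are recovered; swapping `impute_mode()` for a real imputation method is where the actual comparison would happen.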