Read or import and tidy genomic data frames. If the data is in wide format, the function gathers it into the long (tidy) format. Used internally in radiator and assigner, and may be of interest to users.
tidy_wide(data, import.metadata = FALSE)
A file in the working directory or an object in the global environment, in wide or long (tidy) format. See details for more info.
How to get a tidy data frame? See radiator tidy_genomic_data.
(optional, logical) With import.metadata = TRUE, the metadata (anything other than the genotype) is imported, for the long format exclusively. Default: import.metadata = FALSE, no metadata.
A tidy data frame in the global environment.
Input data:
To discriminate the long from the wide format, the function radiator tidy_wide searches for MARKERS in column names (TRUE = long format).
The data frame is tab delimited.
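The detection rule described above can be sketched in base R. This is only an illustration: detect_format is a hypothetical helper, not a radiator function, and the two small data frames are made-up examples.

```r
# Hypothetical helper (not part of radiator): guess the layout from the
# column names, following the rule "MARKERS present = long format".
detect_format <- function(x) {
  if ("MARKERS" %in% colnames(x)) "long" else "wide"
}

# Made-up wide data: 2 id columns, then one column per marker.
wide <- data.frame(
  INDIVIDUALS = c("ind1", "ind2"),
  STRATA = c("pop1", "pop2"),
  locus1 = c("001002", "001001"),
  locus2 = c("111333", "111111"),
  stringsAsFactors = FALSE
)

# Made-up long data: one row per individual and marker.
long <- data.frame(
  INDIVIDUALS = c("ind1", "ind1"),
  STRATA = c("pop1", "pop1"),
  MARKERS = c("locus1", "locus2"),
  GT = c("001002", "111333"),
  stringsAsFactors = FALSE
)

detect_format(wide)
detect_format(long)
```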
Wide format:
The wide format cannot store metadata info.
The wide format starts with these 2 id columns: INDIVIDUALS and STRATA (which refers to any grouping of individuals); the remaining columns are the markers, in separate columns storing genotypes.
Long/Tidy format:
The long format is considered to be a tidy data frame and can store metadata info (e.g. from a VCF, see radiator tidy_genomic_data).
A minimum of 4 columns are required in the long format: INDIVIDUALS, STRATA, MARKERS and GT for the genotypes.
The remaining columns are considered metadata info.
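To make the two layouts concrete, here is a minimal base R sketch that reads a small tab-delimited wide table and gathers it into the 4 required long-format columns. It is not the code tidy_wide uses internally; the marker names (locus1, locus2) and the data are made up for illustration.

```r
# Made-up tab-delimited wide data. Reading every column as character keeps
# the leading zeros of genotypes such as "001002".
wide <- read.delim(
  text = paste(
    "INDIVIDUALS\tSTRATA\tlocus1\tlocus2",
    "ind1\tpop1\t001002\t111333",
    "ind2\tpop2\t001001\t111111",
    sep = "\n"
  ),
  colClasses = "character"
)

# Every column that is not an id column is treated as a marker.
marker_cols <- setdiff(colnames(wide), c("INDIVIDUALS", "STRATA"))

# Gather: repeat the id columns once per marker and stack the genotypes
# marker by marker.
long <- data.frame(
  INDIVIDUALS = rep(wide$INDIVIDUALS, times = length(marker_cols)),
  STRATA = rep(wide$STRATA, times = length(marker_cols)),
  MARKERS = rep(marker_cols, each = nrow(wide)),
  GT = unlist(wide[marker_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

long
```

Any additional column kept next to these 4 would play the role of metadata in the long format.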
Genotypes with separators:
ALL separators will be removed.
Genotypes should be coded with 3 digits for each allele, 6 digits in total for the genotype.
6 digits WITHOUT separator: e.g. 001002 or 111333 (for heterozygote individual).
6 digits WITH separator: e.g. 001/002 or 111/333 (for heterozygote individual).
The separator can be any of these: "/", ":", "_", "-", ".", and will be removed.
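As an illustration of the genotype coding, the following base R sketch strips the listed separators and splits each genotype into its two 3-digit alleles. The regular expression is an assumption about how such cleaning could be done, not the expression radiator uses.

```r
# Made-up genotypes, with and without separators.
gt <- c("001/002", "111:333", "001_001", "111-333", "001.002", "001002")

# Remove any of the separators "/", ":", "_", "-", "." in one pass.
# Inside a character class "." is literal, and "-" is literal when placed last.
gt_clean <- gsub("[/:_.-]", "", gt)

# Each cleaned genotype is 6 digits: allele 1 is digits 1-3, allele 2 is 4-6.
allele1 <- substr(gt_clean, 1, 3)
allele2 <- substr(gt_clean, 4, 6)

# A heterozygote carries two different alleles.
heterozygote <- allele1 != allele2

data.frame(gt, gt_clean, allele1, allele2, heterozygote)
```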
Separators in STRATA, INDIVIDUALS and MARKERS: Some separators can interfere with packages or code and are cleaned by radiator.
MARKERS: /, :, - and . are changed to an underscore _.
STRATA: white spaces in population names are replaced by an underscore.
INDIVIDUALS: _ and : are changed to a dash -.
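The three cleaning rules can be mimicked with gsub in base R. The helpers below (clean_markers, clean_strata, clean_individuals) are hypothetical names used only for this sketch; radiator applies its own cleaning internally, and whether it replaces each white space individually, as done here, is an assumption.

```r
# Hypothetical helpers (not radiator functions) reproducing the rules above.

# MARKERS: "/", ":", "-" and "." become "_".
clean_markers <- function(x) gsub("[/:.-]", "_", x)

# STRATA: each white space becomes "_" (assumption: one "_" per space).
clean_strata <- function(x) gsub("[[:space:]]", "_", x)

# INDIVIDUALS: "_" and ":" become "-".
clean_individuals <- function(x) gsub("[_:]", "-", x)

clean_markers(c("chr1:1234", "locus-1.2", "a/b"))
clean_strata("North Sea")
clean_individuals("ind_01:A")
```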
How to get a tidy data frame?
radiator tidy_genomic_data can transform 6 genomic data formats into a tidy data frame.