Used internally in radiator and might be of interest to users. The function extracts the DArT target ids from a DArT file, to help prepare the appropriate STRATA file.
extract_dart_target_id(data, write = TRUE)
One of the DArT output files. radiator recognizes 6 formats used by DArT:
1row: Genotypes are in 1 row and coded (0, 1, 2, -). 0 for 2 reference alleles REF/REF, 1 for 2 alternate alleles ALT/ALT, 2 for heterozygote REF/ALT, - for missing.
2rows: No genotypes. It's absence/presence (0/1) of the REF and ALT alleles. Sometimes called binary format.
counts: No genotypes. It's counts/read depth for the REF and ALT alleles. Sometimes just called count data.
silico.dart: SilicoDArT data. No genotypes, no REF or ALT alleles. It's a file coded as absence/presence (0/1) for the presence of the sequence in the clone id.
silico.dart.counts: SilicoDArT data. No genotypes, no REF or ALT alleles. It's a file coded as absence/presence, with counts for the presence of the sequence in the clone id.
dart.vcf: For DArT VCFs, please use read_vcf.
Depending on the number of markers, these formats will be recoded similarly to VCF files (dosage of alternate allele, see details).
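As a rough illustration of that recoding for the 1row format, the sketch below maps the 1row codes to the dosage of the alternate allele. It is not radiator's internal code: the helper name `recode_1row_to_alt_dosage` is hypothetical, and it assumes the genotypes are supplied as character codes.

```r
# Hypothetical helper (not part of radiator): recode DArT 1row genotype
# codes to the dosage of the alternate allele, as in VCF-style coding.
recode_1row_to_alt_dosage <- function(gt) {
  # 1row coding: "0" = REF/REF, "1" = ALT/ALT, "2" = REF/ALT, "-" = missing
  dosage <- c("0" = 0L, "1" = 2L, "2" = 1L)
  # Codes absent from the lookup (including "-") become NA
  unname(dosage[as.character(gt)])
}

gt <- c("0", "1", "2", "-")
recode_1row_to_alt_dosage(gt)
```

Note that 1 and 2 swap meaning between the two codings: a 1row value of 1 (ALT/ALT) carries two copies of the alternate allele, while a 1row value of 2 (heterozygote) carries one.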
The function can import .csv or .tsv files.
If you encounter a problem, send me your data so that I can update the function.
With default write = TRUE, the DArT target id column is written to a file in the working directory.
A tidy dataframe with a TARGET_ID column. For cleaning, the TARGET_ID column is treated like the INDIVIDUALS column: spaces and , are removed, _ and : are changed to a dash -, and UPPER case is used. See the cleaning doc for the logic behind this.
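The cleaning rules can be sketched in base R as follows. This is only an illustration of the rules as stated here, not radiator's own implementation; the helper name `clean_target_id` and the sample ids are hypothetical.

```r
# Hypothetical helper (not radiator's own code): apply the cleaning rules
# described above to a vector of target ids.
clean_target_id <- function(x) {
  x <- gsub("[ ,]", "", x)    # remove spaces and commas
  x <- gsub("[_:]", "-", x)   # change underscores and colons to dashes
  toupper(x)                  # use upper case
}

clean_target_id(c("pop_1: ind a", "site,2_b"))
```

Applying the same rules to the ids in a STRATA file should help the two sets of ids match after import.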