Used internally in radiator and might be of interest to some users. The function extracts the DArT target ids from a DArT file, which helps to prepare the appropriate strata file. The sample metadata can optionally be extracted as well.
extract_dart_target_id(data, write = TRUE, metadata = FALSE)
(file) Six file formats used by DArT are recognized by radiator.
Do not modify the DArT file; to change sample information, use the strata file/argument instead.
The function can import files ending with .csv or .tsv.
1row: Genotypes are in 1 row and coded (0, 1, 2, -): 0 for 2 reference alleles REF/REF, 1 for 2 alternate alleles ALT/ALT, 2 for heterozygote REF/ALT, - for missing.
2rows: No genotypes. It's absence/presence, 0/1, of the REF and ALT alleles. Sometimes called binary format.
counts: No genotypes. It's counts/read depth for the REF and ALT alleles. Sometimes just called count data. This should be the preferred file format, because DArT outputs the coverage (read depth for each genotype).
silico.dart: SilicoDArT data. No genotypes, no REF or ALT alleles. It's a file coded as absence/presence, 0/1, for the presence of sequence in the clone id.
silico.dart.counts: SilicoDArT data. No genotypes, no REF or ALT alleles. It's a file coded as absence/presence, with counts for the presence of sequence in the clone id.
dart.vcf: For DArT VCFs, please use read_vcf.
If you encounter a problem, send me your data so that I can update the function.
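The 1row coding described above is easy to misread, because 1 means ALT/ALT and 2 means heterozygote. The following is a minimal base R sketch of that lookup, not radiator's internal code. The function name `decode_1row` and the genotype labels are hypothetical, chosen only for illustration.

```r
# Lookup table for the 1row coding: 0 = REF/REF, 1 = ALT/ALT,
# 2 = REF/ALT (heterozygote), - = missing.
# decode_1row is a hypothetical helper, not a radiator function.
decode_1row <- function(x) {
  codes <- c(
    "0" = "REF/REF",
    "1" = "ALT/ALT",
    "2" = "REF/ALT",
    "-" = NA_character_
  )
  # Index by name; any code outside the table also becomes NA
  unname(codes[as.character(x)])
}

decoded <- decode_1row(c("0", "1", "2", "-"))
print(decoded)
```

Any code outside the four documented values is returned as missing, so unexpected values in a file show up as NA rather than being silently relabelled.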
(logical) With default write = TRUE, the DArT target id column is written in a file in the working directory.
(logical) With metadata = TRUE, the DArT target id and the sample metadata are extracted from the DArT file and converted into a tidy data frame. Default: metadata = FALSE.
DART_NUMBER: The DArT order or service number
DART_PLATE_BARCODE: The DArT plate barcode
CLIENT_BARCODE: The client plate barcode if provided
WELL_ROW: The well row position (A, B, C, D, E, F, G, H)
WELL_COL: The well column position (1 to 12)
SAMPLE_COMMENTS: The client sample comment if provided
TARGET_ID: Depending on the DArT file type, the target id generated by DArT or sample info provided by the client.
IMPORTANT NOTE: DArT is not consistent with its output, so always verify the columns.
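One way to verify the columns is to compare the names of the extracted data frame against the metadata columns listed above. The sketch below uses a mock data frame in place of real output. The helper `check_metadata_columns` is hypothetical and not part of radiator.

```r
# Metadata columns documented above
expected_cols <- c(
  "DART_NUMBER", "DART_PLATE_BARCODE", "CLIENT_BARCODE",
  "WELL_ROW", "WELL_COL", "SAMPLE_COMMENTS", "TARGET_ID"
)

# Hypothetical helper: report documented columns that are absent
# and columns that are present but not documented.
check_metadata_columns <- function(df, expected = expected_cols) {
  list(
    missing = setdiff(expected, names(df)),
    unexpected = setdiff(names(df), expected)
  )
}

# Mock data frame standing in for the extracted metadata
mock_metadata <- data.frame(
  TARGET_ID = "1001",
  WELL_ROW = "A",
  WELL_COL = 1L,
  EXTRA = "x",
  stringsAsFactors = FALSE
)

column_report <- check_metadata_columns(mock_metadata)
print(column_report)
```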
A tidy data frame with a TARGET_ID column and metadata if requested.
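The returned TARGET_ID column can serve as the starting point for a strata file. The sketch below uses a mock data frame in place of the function output and joins it to user-supplied sample information. The three-column layout (TARGET_ID, INDIVIDUALS, STRATA), the sample values and the file name are assumptions for illustration; check the radiator strata documentation for the layout your version expects.

```r
# Mock of the returned data frame. In practice it would come from a call
# such as extract_dart_target_id(data = "my_dart_file.csv", write = FALSE),
# where "my_dart_file.csv" is a hypothetical file name.
target_ids <- data.frame(
  TARGET_ID = c("1001", "1002", "1003"),
  stringsAsFactors = FALSE
)

# Hypothetical sample information supplied by the user
sample_info <- data.frame(
  TARGET_ID = c("1001", "1002", "1003"),
  INDIVIDUALS = c("IND-A", "IND-B", "IND-C"),
  STRATA = c("POP1", "POP1", "POP2"),
  stringsAsFactors = FALSE
)

# Keep every target id found in the DArT file, even when unmatched
strata <- merge(target_ids, sample_info, by = "TARGET_ID", all.x = TRUE)

# Unmatched target ids would show up as NA and need attention
stopifnot(!anyNA(strata$STRATA))

# Write a tab-delimited file (assumed layout; written to a temp dir here)
strata_file <- file.path(tempdir(), "strata.tsv")
write.table(strata, file = strata_file, sep = "\t",
            quote = FALSE, row.names = FALSE)
```

Building the strata from the extracted ids, rather than typing them by hand, avoids mismatches between the DArT file and the strata file.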
For cleaning, the TARGET_ID column is treated like the column INDIVIDUALS. Spaces and commas (,) are removed, underscores (_) and colons (:) are changed to a dash (-), and UPPER case is used.
See the cleaning documentation for the logic behind this.
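The cleaning rules above can be sketched in a few lines of base R. This illustrates the stated rules only and is not radiator's own cleaning function. The name `clean_target_id` is hypothetical.

```r
# Hypothetical helper applying the documented cleaning rules
clean_target_id <- function(x) {
  x <- gsub("[ ,]", "", x)   # remove spaces and commas
  x <- gsub("[_:]", "-", x)  # underscores and colons become dashes
  toupper(x)                 # UPPER case
}

cleaned <- clean_target_id(c("pop_a: ind 01", "Sample,2_b"))
print(cleaned)
# → "POP-A-IND01" "SAMPLE2-B"
```

Applying the same cleaning to the ids in a strata file before matching keeps both sides comparable.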