Used internally in radiator, and might also be of interest to some users. The function extracts the DArT target ids from a DArT file, which helps prepare the appropriate STRATA file. You can also decide whether you want the sample metadata.

extract_dart_target_id(data, write = TRUE, metadata = FALSE)

Arguments

data

(file) Six file formats used by DArT are recognized by radiator. Don't modify the DArT file itself; to change sample information, use a strata file instead. The function can import files ending with .csv or .tsv.

  1. 1row: Genotypes are in 1 row and coded (0, 1, 2, -). 0 for 2 reference alleles REF/REF, 1 for 2 alternate alleles ALT/ALT, 2 for heterozygote REF/ALT, - for missing.

  2. 2rows: No genotypes. It's absence/presence, 0/1, of the REF and ALT alleles. Sometimes called binary format.

  3. counts: No genotypes. It's counts/read depth for the REF and ALT alleles. Sometimes just called count data. This should be the preferred file format, because DArT outputs the coverage (read depth for each genotype).

  4. silico.dart: SilicoDArT data. No genotypes, no REF or ALT alleles. It's a file coded as absence/presence, 0/1, for the presence of sequence in the clone id.

  5. silico.dart.counts: SilicoDArT data. No genotypes, no REF or ALT alleles. It's a file coded as absence/presence, with counts for the presence of sequence in the clone id.

  6. dart.vcf: For DArT VCFs, please use read_vcf.
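
The 1row coding listed above can be illustrated with a small lookup. This is a minimal base R sketch, not radiator's internal code; the helper name decode_1row is hypothetical and only serves to show what each code stands for.

```r
# Hypothetical helper (not part of radiator): translate DArT 1row codes
# into REF/ALT genotype labels, following the coding described above.
decode_1row <- function(codes) {
  lookup <- c(
    "0" = "REF/REF", # 2 reference alleles
    "1" = "ALT/ALT", # 2 alternate alleles
    "2" = "REF/ALT"  # heterozygote
  )
  # "-" (missing) and any unrecognized code are returned as NA
  unname(lookup[as.character(codes)])
}

decode_1row(c("0", "1", "2", "-"))
```

Note that in this format 1 codes the ALT/ALT homozygote and 2 the heterozygote, so the codes are not alternate allele counts.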

If you encounter a problem, send me your data so that I can update the function.

write

(logical) With default write = TRUE, the DArT target id column is written to a file in the working directory.

metadata

(logical) Default: metadata = FALSE. With metadata = TRUE, the DArT target id and the sample metadata are extracted from the DArT file and converted into a tidy data frame with the following columns:

  • DART_NUMBER: The DArT order or service number

  • DART_PLATE_BARCODE: The DArT plate barcode

  • CLIENT_BARCODE: The client plate barcode if provided

  • WELL_ROW: The well row position (A, B, C, D, E, F, G, H)

  • WELL_COL: The well column position (1 to 12)

  • SAMPLE_COMMENTS: The client sample comment if provided

  • TARGET_ID: Depending on the DArT file type, the target id generated by DArT or sample info provided by the client.

  • IMPORTANT NOTE: DArT output is not always consistent, so always verify the columns.
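
Because the columns should always be verified, a quick check against the names listed above may help. The following is a minimal base R sketch: the helper missing_dart_columns is hypothetical (not a radiator function), and the toy data frame only stands in for the function's output.

```r
# Metadata columns listed in the documentation above
expected_cols <- c(
  "DART_NUMBER", "DART_PLATE_BARCODE", "CLIENT_BARCODE",
  "WELL_ROW", "WELL_COL", "SAMPLE_COMMENTS", "TARGET_ID"
)

# Hypothetical helper: return the expected columns absent from a data frame
missing_dart_columns <- function(x, expected = expected_cols) {
  setdiff(expected, names(x))
}

# Toy data frame (made-up values) standing in for the extracted metadata
toy <- data.frame(
  TARGET_ID = c("1001", "1002"),
  WELL_ROW = c("A", "B"),
  WELL_COL = c(1L, 2L),
  stringsAsFactors = FALSE
)

missing_dart_columns(toy)
```

An empty result means all the documented columns are present; anything else is worth checking against the original DArT file.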

Value

A tidy data frame with a TARGET_ID column, and the metadata if requested. For cleaning, the TARGET_ID column is treated like the column INDIVIDUALS: spaces and commas (,) are removed, underscores (_) and colons (:) are changed to a dash (-), and UPPER case is used. See the cleaning doc for the logic behind this.
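
The cleaning rules described above can be approximated in a few lines of base R. This is only a sketch of the stated rules, not radiator's actual cleaning code; the function name clean_target_id is hypothetical.

```r
# Hypothetical helper mirroring the cleaning rules stated above
clean_target_id <- function(x) {
  x <- gsub("[ ,]", "", x)  # remove spaces and commas
  x <- gsub("[_:]", "-", x) # change underscores and colons to a dash
  toupper(x)                # use upper case
}

clean_target_id(c("pop_1: ind 3", "site a,b_2"))
# "POP-1-IND3" "SITEAB-2"
```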

Author

Thierry Gosselin thierrygosselin@icloud.com

Examples

if (FALSE) { # \dontrun{
# Build a strata file:
library(magrittr) # provides the %>% pipe used below
strata <- radiator::extract_dart_target_id("mt.dart.file.csv") %>%
    dplyr::mutate(
        INDIVIDUALS = "new id you want to give",
        STRATA = "fill this"
    ) %>%
    readr::write_tsv(x = ., file = "my.new.dart.strata.tsv")
} # }
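
To try the strata-building step without a DArT file, the same idea can be sketched in base R on toy data. The target ids, individual ids and strata below are made up for illustration, and the file is written to a temporary directory rather than the working directory.

```r
# Toy target ids standing in for the output of extract_dart_target_id()
strata <- data.frame(
  TARGET_ID = c("1001", "1002", "1003"),
  stringsAsFactors = FALSE
)

# Made-up individual ids and grouping; replace with your own values
strata$INDIVIDUALS <- paste0("IND-", seq_len(nrow(strata)))
strata$STRATA <- c("POP1", "POP1", "POP2")

# Write a tab-separated file, as in the example above
strata_file <- file.path(tempdir(), "my.new.dart.strata.tsv")
write.table(
  strata,
  file = strata_file,
  sep = "\t",
  quote = FALSE,
  row.names = FALSE
)
```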