Write tidy genomic data file or close GDS file

For tidy genomic datasets, the function provides a fast way to write an Apache Parquet file with the .arrow.parquet extension. This extension replaces the .rad file format, which was essentially the .fst format provided by the fst package. See the explanation in the section below.
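
The write step described above can be sketched with the arrow package directly. This is a minimal illustration and not radiator's internal code: the toy `tidy.data` columns (MARKERS, INDIVIDUALS, GT) and the temporary path are assumptions made up for the example.

```r
library(arrow)

# Toy tidy dataset: one row per marker x individual (column names are assumed)
tidy.data <- data.frame(
  MARKERS = c("locus_1", "locus_1", "locus_2", "locus_2"),
  INDIVIDUALS = c("ind_1", "ind_2", "ind_1", "ind_2"),
  GT = c("001001", "001002", "002002", "001002"),
  stringsAsFactors = FALSE
)

# Write to a temporary location using the .arrow.parquet extension
path <- file.path(tempdir(), "data.shark.arrow.parquet")
arrow::write_parquet(tidy.data, path)

# Read the file back and check that the genotypes survived the round trip
roundtrip <- as.data.frame(arrow::read_parquet(path))
identical(roundtrip$GT, tidy.data$GT)
```

Writing to `tempdir()` keeps the sketch from touching the working directory; `write_rad` itself writes to the working directory.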

When the object is a CoreArray Genomic Data Structure (GDS) file, the function closes the connection to the GDS file. Before doing so, it sets the filters (variants and samples) based on the information found in the file.

Used internally in radiator and assigner; it may also be of interest to users.

write_rad(
  data,
  filename,
  internal = FALSE,
  write.message = "standard",
  verbose = FALSE
)

Arguments

data: An object in the global environment: a tidy genomic dataset or a GDS connection file.
filename: (optional) Name of the file. By default, radiator_date_time.arrow.parquet is used. Default: filename = NULL.
internal: (optional, logical) Used inside radiator's internal code; when TRUE, the file is not written. Default: internal = FALSE.
write.message: (optional, character) Message printed in the console after the file is written. Default: write.message = "standard", which prints message("File written: ", basename(filename)). With write.message = NULL, nothing is printed in the console.
verbose: (optional, logical) verbose = TRUE to be chatty during execution. Default: verbose = FALSE.
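
The documented defaults for filename and write.message can be sketched in base R as follows. Both helpers are hypothetical and are not part of radiator. The timestamp layout in the default name is an assumption, since the documentation only gives the pattern radiator_date_time.arrow.parquet, and the handling of a custom message string is also assumed.

```r
# Hypothetical helper: build the default file name when none is supplied
default_filename <- function(filename = NULL, time = Sys.time()) {
  if (is.null(filename)) {
    stamp <- format(time, "%Y%m%d_%H%M") # assumed date_time layout
    filename <- paste0("radiator_", stamp, ".arrow.parquet")
  }
  filename
}

# Hypothetical helper: mimic the documented write.message behaviour
report_written <- function(filename, write.message = "standard") {
  # NULL: print nothing
  if (is.null(write.message)) return(invisible(NULL))
  if (identical(write.message, "standard")) {
    # documented standard message
    message("File written: ", basename(filename))
  } else {
    # assumption: any other string is printed as is
    message(write.message)
  }
  invisible(filename)
}

fn <- default_filename()
report_written(fn)
report_written(fn, write.message = NULL)
```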

Value

A file written in the working directory, or nothing if data is a GDS connection file.

Why .rad is no longer supported

Originally, the .fst extension from the fst package was replaced by .rad to avoid confusion with the population genetics statistic fst ... The decision to stop using the fst package was taken because:

The package was always difficult to install when you wanted all cores to be used.
An installation recipe that worked on one OS rarely kept working after a change of R version or OS version, which was painful.
Asking users to edit .R/Makevars always led to time-consuming troubleshooting for me afterwards.
arrow is easy to install, its files are smaller, and reading and writing are faster!

Author

Thierry Gosselin thierrygosselin@icloud.com

Examples

if (FALSE) { # \dontrun{
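require(SeqArray) # attach SeqArray for the GDS example
# Tidy dataset: written to a .arrow.parquet file
radiator::write_rad(data = tidy.data, filename = "data.shark.arrow.parquet")
# GDS object: filters are set and the connection is closed, no file is written
radiator::write_rad(data = gds.object)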
} # }