For tidy genomic datasets, the function provides a fast way to write an Apache Parquet file with the .arrow.parquet ending.
This new file ending replaces the .rad file format, which was essentially the .fst format provided by the package fst.
See the explanation in the section below.
When the object is a CoreArray Genomic Data Structure (GDS) file system, the function closes the connection with the GDS file. Before doing so, it sets the filters (variants and samples) based on the information found in the file.
Used internally in radiator and assigner; it might also be of interest to users.
write_rad(
  data,
  filename,
  internal = FALSE,
  write.message = "standard",
  verbose = FALSE
)
data
An object in the global environment: a tidy genomic dataset or a GDS connection file.
filename
(optional) Name of the file. If left at the default, radiator_date_time.arrow.parquet is used.
Default: filename = NULL.
internal
(optional, logical) This is used inside radiator's internal code and it stops the function from writing the file.
Default: internal = FALSE.
write.message
(optional, character) Print a message in the console after writing the file.
With write.message = NULL, nothing is printed in the console.
Default: write.message = "standard". This will print message("File written: ", basename(filename)).
verbose
(optional, logical) Use verbose = TRUE to be chatty during execution.
Default: verbose = FALSE.
A file written in the working directory or nothing if it's a GDS connection file.
Originally, the .fst ending from the package fst was replaced by .rad to avoid confusion with the population genetics statistic Fst.
The decision to stop using the package fst was taken because:
The package was always difficult to install when you wanted all cores to be used.
A recipe that installed successfully on one OS rarely worked after changing the R version or the OS version, which was painful.
Asking users to edit .R/Makevars always led to time-consuming troubleshooting for me afterwards.
arrow is easy to install, and its files are smaller and faster to read and write!
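To illustrate the file format itself, here is a minimal sketch of a Parquet round trip made directly with the arrow package. It is not the code used inside write_rad, only an assumption about what a comparable write and read might look like. The toy data frame, its column names (MARKERS, INDIVIDUALS, GT) and the temporary file path are hypothetical and chosen for illustration.

```r
# Sketch only: requires the arrow package (install.packages("arrow")).
# This is NOT the body of write_rad; it only shows a plain Parquet round trip.

# Hypothetical tidy genomic dataset: one row per marker x individual.
tidy_data <- data.frame(
  MARKERS = rep(c("locus_1", "locus_2"), each = 3),
  INDIVIDUALS = rep(c("ind_1", "ind_2", "ind_3"), times = 2),
  GT = c("001001", "001002", "002002", "001001", "000000", "001002"),
  stringsAsFactors = FALSE
)

# Write to a temporary file that uses the .arrow.parquet ending.
out_file <- tempfile(fileext = ".arrow.parquet")
arrow::write_parquet(x = tidy_data, sink = out_file)

# Read the file back and convert the result to a base data frame.
round_trip <- as.data.frame(arrow::read_parquet(out_file))

# Compare the dimensions and the genotype column with the original data.
same_dim <- identical(dim(round_trip), dim(tidy_data))
same_gt <- identical(round_trip$GT, tidy_data$GT)
```

Writing to tempfile() keeps the sketch from cluttering the working directory; with write_rad itself, the file is written in the working directory as described above.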