Read a strata object or file. The strata file contains the individuals' metadata and the stratification, e.g. the population id and/or the sampling sites (see Details). Used internally in radiator and might be of interest to users.
read_strata(
strata,
pop.id = FALSE,
pop.levels = NULL,
pop.labels = NULL,
pop.select = NULL,
blacklist.id = NULL,
keep.two = TRUE,
path.folder = NULL,
filename = NULL,
verbose = FALSE
)
strata: (path or object) The strata file or object.
Additional documentation is available in read_strata.
Use that function to whitelist/blacklist populations/individuals.
The option to set pop.levels/pop.labels is also available.
pop.id: (logical) When pop.id = TRUE, the returned strata uses POP_ID as the
name of the stratification column.
Default: pop.id = FALSE, which returns STRATA.
pop.levels: (optional, string) This refers to the levels in a factor; in this
case, the population ids.
Use this argument to order the populations your way instead of the default
alphabetical or numerical order, e.g. pop.levels = c("QUE", "ONT", "ALB")
instead of the default pop.levels = c("ALB", "ONT", "QUE").
White spaces in population names are replaced by underscores.
Default: pop.levels = NULL.
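Because pop.levels sets the levels of a factor, the reordering can be illustrated with base R alone. A minimal sketch with made-up population ids; it shows the factor mechanics only and is not radiator's internal code:

```r
# Made-up population ids, one per sample
pops <- c("ALB", "QUE", "ONT", "QUE", "ALB")

# Default behaviour of factor(): levels sorted alphabetically
default.pop <- factor(pops)
levels(default.pop)
#> [1] "ALB" "ONT" "QUE"

# Custom order, as supplied through pop.levels
pop.levels <- c("QUE", "ONT", "ALB")
custom.pop <- factor(pops, levels = pop.levels)
levels(custom.pop)
#> [1] "QUE" "ONT" "ALB"
```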
pop.labels: (optional, string) Use this argument to rename/relabel or combine
your pops, e.g. to combine "QUE" and "ONT" into a new pop called "NEW":
(1) first, define the levels for your pops with the pop.levels argument:
pop.levels = c("QUE", "ONT", "ALB");
(2) then, use the pop.labels argument:
pop.labels = c("NEW", "NEW", "ALB").
To rename "QUE" to "TAS":
pop.labels = c("TAS", "ONT", "ALB").
Default: pop.labels = NULL. If you find this too complicated,
the strata argument can do the same thing, see below.
White spaces in population names are replaced by underscores.
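The level/label pairing described above follows the same logic as base R's factor(levels, labels). A minimal sketch with made-up ids; it relies on duplicated labels merging levels, which base R supports since version 3.4.0, and does not claim to reproduce radiator's code:

```r
pops <- c("QUE", "ONT", "ALB", "QUE")
pop.levels <- c("QUE", "ONT", "ALB")

# Combine QUE and ONT into NEW: duplicated labels merge levels (R >= 3.4.0)
combined <- factor(pops, levels = pop.levels, labels = c("NEW", "NEW", "ALB"))
levels(combined)
#> [1] "NEW" "ALB"
as.character(combined)
#> [1] "NEW" "NEW" "ALB" "NEW"

# Rename QUE to TAS, keeping the other populations as they are
renamed <- factor(pops, levels = pop.levels, labels = c("TAS", "ONT", "ALB"))
as.character(renamed)
#> [1] "TAS" "ONT" "ALB" "TAS"
```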
pop.select: (optional, string) Selected list of populations for
the analysis, e.g. pop.select = c("QUE", "ONT") to select the QUE and ONT
population samples (out of 20 pops).
Default: pop.select = NULL.
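Conceptually, pop.select keeps the rows of the strata whose population is in the selected set. A base R sketch on a hypothetical strata table (not radiator's implementation):

```r
# Hypothetical strata table
strata <- data.frame(
  INDIVIDUALS = c("ind-1", "ind-2", "ind-3", "ind-4"),
  STRATA = c("QUE", "ONT", "ALB", "QUE"),
  stringsAsFactors = FALSE
)
pop.select <- c("QUE", "ONT")

# Keep only the samples whose population is in pop.select
selected <- strata[strata$STRATA %in% pop.select, , drop = FALSE]
selected$INDIVIDUALS
#> [1] "ind-1" "ind-2" "ind-4"
```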
blacklist.id: (optional, path or object) A blacklist file in the working directory
or an object in the global environment. The data frame
has 1 column (named INDIVIDUALS) filled with the individual IDs.
The IDs are cleaned with clean_ind_names for separators:
only - is tolerated. Duplicates are removed automatically.
Default: blacklist.id = NULL.
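The blacklist format and the de-duplication described above can be sketched in base R. This assumes a hypothetical strata table and a temporary blacklist file; the ID cleaning done by clean_ind_names is not reproduced here:

```r
# Hypothetical strata table
strata <- data.frame(
  INDIVIDUALS = c("ind-1", "ind-2", "ind-3", "ind-4"),
  STRATA = c("QUE", "ONT", "ALB", "QUE"),
  stringsAsFactors = FALSE
)

# Temporary blacklist file: 1 column named INDIVIDUALS, with a duplicated id
blacklist.file <- tempfile(fileext = ".tsv")
writeLines(c("INDIVIDUALS", "ind-2", "ind-2", "ind-4"), blacklist.file)

# Read it back and drop duplicated ids
blacklist <- read.delim(blacklist.file, stringsAsFactors = FALSE)
blacklist <- unique(blacklist)

# Remove the blacklisted individuals from the strata
filtered <- strata[!strata$INDIVIDUALS %in% blacklist$INDIVIDUALS, , drop = FALSE]
filtered$INDIVIDUALS
#> [1] "ind-1" "ind-3"
```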
keep.two: (optional, logical) The output is limited to 2 columns:
INDIVIDUALS and STRATA.
Default: keep.two = TRUE.
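In base R terms, keep.two amounts to dropping every column other than INDIVIDUALS and STRATA. A short sketch; the extra SITE column is a made-up example of additional metadata:

```r
# Hypothetical strata with an extra metadata column (SITE is made up)
strata <- data.frame(
  INDIVIDUALS = c("ind-1", "ind-2"),
  STRATA = c("QUE", "ONT"),
  SITE = c("site-a", "site-b"),
  stringsAsFactors = FALSE
)
keep.two <- TRUE
if (keep.two) {
  # Drop every column except INDIVIDUALS and STRATA
  strata <- strata[, c("INDIVIDUALS", "STRATA")]
}
names(strata)
#> [1] "INDIVIDUALS" "STRATA"
```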
path.folder: (optional, path)
If !is.null(blacklist.id) || !is.null(pop.select), the modified strata
is written to path.folder.
Default: path.folder = NULL, which writes to the working directory (getwd()).
filename: (optional, character) If !is.null(blacklist.id) ||
!is.null(pop.select), the modified strata is written by default in the
working directory with the date and time appended to strata_radiator_filtered,
to make the file name unique. If you plan on writing more than 1 strata file per minute,
use this argument to supply a unique filename.
Default: filename = NULL.
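The reason for the "more than 1 strata file per minute" advice is that a minute-resolution timestamp cannot tell apart two files written in the same minute. A sketch with a hypothetical helper, make_strata_filename(); the timestamp format and the .tsv extension are assumptions, not radiator's actual naming scheme:

```r
# Hypothetical helper: timestamp format and extension are assumptions
make_strata_filename <- function(filename = NULL, time = Sys.time()) {
  # A user-supplied filename is used as-is
  if (!is.null(filename)) return(filename)
  # Otherwise append a minute-resolution timestamp to the default prefix
  stamp <- format(time, "%Y%m%d@%H%M")
  paste0("strata_radiator_filtered_", stamp, ".tsv")
}

t1 <- as.POSIXct("2024-01-02 03:04:05", tz = "UTC")
t2 <- as.POSIXct("2024-01-02 03:04:50", tz = "UTC")
make_strata_filename(time = t1)
#> [1] "strata_radiator_filtered_20240102@0304.tsv"
# Two files written in the same minute get the same name...
identical(make_strata_filename(time = t1), make_strata_filename(time = t2))
#> [1] TRUE
# ...so supply filename explicitly in that case
make_strata_filename("my_strata_batch2.tsv")
#> [1] "my_strata_batch2.tsv"
```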
verbose: (optional, logical) When verbose = TRUE
the function is a little more chatty during execution.
Default: verbose = FALSE.
A list with 5 components:
$strata
$pop.levels
$pop.labels
$pop.select
$blacklist.id
The strata file used in radiator is a tab-delimited file with
a minimum of 2 column headers (3 for DArT data users):
INDIVIDUALS and STRATA.
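A minimal strata file with the two required headers can be written and read back with base R. A sketch using made-up ids and a temporary file:

```r
# Made-up individuals and populations
strata <- data.frame(
  INDIVIDUALS = c("ind-1", "ind-2", "ind-3"),
  STRATA = c("QUE", "ONT", "ALB"),
  stringsAsFactors = FALSE
)
strata.file <- tempfile(fileext = ".tsv")

# Tab-delimited, with the column names as the header line
write.table(strata, strata.file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back and check the required columns are present
back <- read.delim(strata.file, stringsAsFactors = FALSE)
stopifnot(all(c("INDIVIDUALS", "STRATA") %in% names(back)))
```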
If a strata file is supplied with a file format that does not require one,
the strata argument takes precedence over the population groupings stored
in that file format. For file formats without population/strata groupings
(e.g. vcf, haplotype files), if no strata file is provided, a single
pop/strata grouping is created automatically.
For vcf and haplotype files, the strata can also be used as a whitelist of IDs:
samples not in the strata file are discarded from the data set.
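The whitelist behaviour can be pictured as a set membership test between the sample ids in the genotype file and the INDIVIDUALS column. A base R sketch; vcf.ids is a hypothetical stand-in for the ids read from a vcf file:

```r
# Hypothetical sample ids found in a vcf file
vcf.ids <- c("ind-1", "ind-2", "ind-3", "ind-4")
strata <- data.frame(
  INDIVIDUALS = c("ind-1", "ind-3"),
  STRATA = c("QUE", "ONT"),
  stringsAsFactors = FALSE
)

# Samples listed in the strata are kept, the others are discarded
kept <- vcf.ids[vcf.ids %in% strata$INDIVIDUALS]
discarded <- setdiff(vcf.ids, strata$INDIVIDUALS)
kept
#> [1] "ind-1" "ind-3"
discarded
#> [1] "ind-2" "ind-4"
```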
The STRATA column can be any hierarchical grouping.
To create a strata file, see individuals2strata.
If you have already run stacks on your data, the strata file is similar
to a stacks population map file; make sure you have the required column
names (INDIVIDUALS and STRATA).
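A stacks population map has two tab-separated columns (sample, then population) and no header line, so naming the columns is enough to turn one into a strata file. A base R sketch with a made-up popmap written to a temporary file:

```r
# Made-up stacks population map: no header, sample<TAB>population
popmap.file <- tempfile(fileext = ".tsv")
writeLines(c("ind-1\tQUE", "ind-2\tONT", "ind-3\tALB"), popmap.file)

# No header in a popmap: name the columns while reading
strata <- read.delim(
  popmap.file,
  header = FALSE,
  col.names = c("INDIVIDUALS", "STRATA"),
  stringsAsFactors = FALSE
)

# Save as a strata file with the required header line
strata.file <- tempfile(fileext = ".tsv")
write.table(strata, strata.file, sep = "\t", quote = FALSE, row.names = FALSE)
```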
The strata column is cleaned of white spaces that interfere with some
packages or code: each space is changed to an underscore (_).
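The space-to-underscore cleaning can be sketched with base R's gsub(). This handles plain spaces only, matching the wording above; how radiator treats tabs or other whitespace is not covered:

```r
# Made-up population name containing spaces
strata <- data.frame(
  INDIVIDUALS = c("ind-1", "ind-2"),
  STRATA = c("Lac Saint Jean", "QUE"),
  stringsAsFactors = FALSE
)
# Replace every space with an underscore
strata$STRATA <- gsub(" ", "_", strata$STRATA, fixed = TRUE)
strata$STRATA
#> [1] "Lac_Saint_Jean" "QUE"
```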
For DArT data, see read_dart.
VCF file users, not sure about the sample IDs inside your file?
See the example in extract_individuals_vcf.
DArT file users, not sure about the sample IDs inside your file?
See the example in extract_dart_target_id.
if (FALSE) { # \dontrun{
strata.info <- radiator::read_strata(strata)
# the returned object is a list with 5 components:
names(strata.info)
# to get the strata
new.strata <- strata.info$strata
# if anything is changed from the original strata (e.g. blacklisted ids),
# a new strata file is written automatically:
new.strata <- radiator::read_strata(
strata = strata,
blacklist.id = "blacklisted.ids.tsv"
)
} # }