Read a strata object or file. The strata file contains the individuals' metadata and the stratification, e.g. the population id and/or the sampling sites (see Details). Used internally in radiator and might be of interest to users.

read_strata(
  strata,
  pop.id = FALSE,
  pop.levels = NULL,
  pop.labels = NULL,
  pop.select = NULL,
  blacklist.id = NULL,
  keep.two = TRUE,
  path.folder = NULL,
  filename = NULL,
  verbose = FALSE
)

Arguments

strata

(path or object) The strata file or object. The expected format is described in Details. This function can be used to whitelist/blacklist populations/individuals. The option to set pop.levels/pop.labels is also available.

pop.id

(logical) When pop.id = TRUE, the stratification column in the returned strata is named POP_ID. Default: pop.id = FALSE, the column is named STRATA.

pop.levels

(optional, string) This refers to the levels in a factor. In this case, the id of the pop. Use this argument to have the pop ordered your way instead of the default alphabetical or numerical order. e.g. pop.levels = c("QUE", "ONT", "ALB") instead of the default pop.levels = c("ALB", "ONT", "QUE"). White spaces in population names are replaced by underscore. Default: pop.levels = NULL.
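The ordering described above follows the usual behaviour of factor levels in R. A minimal base R sketch, using a made-up population vector (the object names are hypothetical and this is not radiator's internal code):

```r
# Hypothetical population ids, for illustration only
pop <- c("ONT", "ALB", "QUE", "ALB")

# Default: levels are sorted alphabetically
default.order <- factor(pop)
levels(default.order)

# Custom order, similar in spirit to pop.levels = c("QUE", "ONT", "ALB")
custom.order <- factor(pop, levels = c("QUE", "ONT", "ALB"))
levels(custom.order)
```

The data values are unchanged; only the order in which the populations are listed and plotted differs.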

pop.labels

(optional, string) Use this argument to rename/relabel your pop or combine your pop. e.g. To combine "QUE" and "ONT" into a new pop called "NEW": (1) First, define the levels for your pop with the pop.levels argument: pop.levels = c("QUE", "ONT", "ALB"). (2) Then, use the pop.labels argument: pop.labels = c("NEW", "NEW", "ALB"). To rename "QUE" to "TAS": pop.labels = c("TAS", "ONT", "ALB"). Default: pop.labels = NULL. If you find this too complicated, the strata argument can do the same thing (see Details). White spaces in population names are replaced by underscore.
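The levels/labels pairing can be pictured with base R's factor(), which accepts duplicated labels to merge levels (R >= 3.4.0). This is only a sketch of the idea on a made-up vector, assuming radiator behaves analogously; it is not the package's own implementation:

```r
# Hypothetical population ids, for illustration only
pop <- c("QUE", "ONT", "ALB", "ONT")

# Combine QUE and ONT into NEW: labels are matched to levels by position
combined <- factor(
  pop,
  levels = c("QUE", "ONT", "ALB"),
  labels = c("NEW", "NEW", "ALB")
)

# Rename QUE to TAS, leaving the other populations unchanged
renamed <- factor(
  pop,
  levels = c("QUE", "ONT", "ALB"),
  labels = c("TAS", "ONT", "ALB")
)
```

Because the labels are matched by position, pop.levels and pop.labels need to have the same length.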

pop.select

(optional, string) Selected list of populations for the analysis. e.g. pop.select = c("QUE", "ONT") to select QUE and ONT population samples (out of 20 pops). Default: pop.select = NULL.
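The effect of selecting populations can be sketched in base R as a simple row filter on the STRATA column. The data frame below is invented for illustration, and the filter is an assumption about the outcome, not a copy of radiator's code:

```r
# Hypothetical strata with 3 populations, for illustration only
strata.df <- data.frame(
  INDIVIDUALS = c("ind-01", "ind-02", "ind-03", "ind-04"),
  STRATA = c("QUE", "ONT", "ALB", "QUE"),
  stringsAsFactors = FALSE
)

# Keep only the individuals belonging to the selected populations
pop.select <- c("QUE", "ONT")
selected <- strata.df[strata.df$STRATA %in% pop.select, , drop = FALSE]
```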

blacklist.id

(optional, path or object) A blacklist file in the working directory or an object in the global environment. The data frame has 1 column (named INDIVIDUALS) filled with the individual IDs. The ids are cleaned with clean_ind_names for separators, only - are tolerated. Duplicates are removed automatically. Default: blacklist.id = NULL.
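A blacklist with the layout described above can be prepared in base R. This is a minimal sketch: the sample ids and the file name are made up, and the file is written to a temporary directory rather than the working directory:

```r
# One column named INDIVIDUALS, ids using only "-" as separator
blacklist <- data.frame(
  INDIVIDUALS = c("QUE-001", "ONT-014"),
  stringsAsFactors = FALSE
)

# Write a tab-delimited file (hypothetical file name)
blacklist.file <- file.path(tempdir(), "blacklisted.ids.tsv")
write.table(
  blacklist,
  file = blacklist.file,
  sep = "\t",
  quote = FALSE,
  row.names = FALSE
)
```

Either the blacklist object or the path blacklist.file could then be supplied to the blacklist.id argument.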

keep.two

(optional, logical) The output is limited to 2 columns: INDIVIDUALS, STRATA. Default: keep.two = TRUE.

path.folder

(optional, path) If !is.null(blacklist.id) || !is.null(pop.select), the modified strata is written by default in the working directory. Default: path.folder = NULL, the working directory (getwd()) is used.

filename

(optional, character) If !is.null(blacklist.id) || !is.null(pop.select), the modified strata is written by default in the working directory with date and time appended to strata_radiator_filtered, to make the file unique. If you plan on writing more than 1 strata file per minute, use this argument to supply the unique filename. Default: filename = NULL.

verbose

(optional, logical) When verbose = TRUE the function is a little more chatty during execution. Default: verbose = FALSE.

Value

A list with several components:

  1. $strata

  2. $pop.levels

  3. $pop.labels

  4. $pop.select

  5. $blacklist.id

Details

The strata file used in radiator is a tab delimited file with a minimum of 2 column headers (3 for DArT data users): INDIVIDUALS and STRATA. The STRATA column can be any hierarchical grouping. The STRATA column is cleaned of white spaces that interfere with some packages or codes: a space is changed to an underscore _.

If a strata file is specified with a file format that doesn't require it, the strata argument takes precedence over the population groupings used internally in that file format. For file formats without population/strata groupings (e.g. vcf, haplotype files), if no strata file is provided, 1 pop/strata grouping will automatically be created. For vcf and haplotype files, the strata can also be used as a whitelist of ids: samples not in the strata file will be discarded from the data set.

To create a strata file see individuals2strata. If you have already run stacks on your data, the strata file is similar to a stacks population map file; make sure you have the required column names (INDIVIDUALS and STRATA).
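A strata file with the 2 required columns can be built by hand in base R. The sketch below uses invented sample ids, groupings and file name, and mimics the white space cleaning with gsub(); it is an illustration of the file layout, not radiator's own code:

```r
# Hypothetical samples and groupings, for illustration only
strata.df <- data.frame(
  INDIVIDUALS = c("ind-01", "ind-02", "ind-03"),
  STRATA = c("Lake Huron", "Lake Huron", "QUE"),
  stringsAsFactors = FALSE
)

# Replace white spaces in the grouping names with underscores
strata.df$STRATA <- gsub(" ", "_", strata.df$STRATA, fixed = TRUE)

# Write a tab-delimited file with the INDIVIDUALS and STRATA headers
strata.file <- file.path(tempdir(), "my.strata.tsv")
write.table(
  strata.df,
  file = strata.file,
  sep = "\t",
  quote = FALSE,
  row.names = FALSE
)
```

The resulting path (strata.file) is the kind of value the strata argument expects.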

For DArT data see read_dart

example.strata.tsv.

example.dart.strata.tsv.

VCF

VCF file users, not sure about the sample ids inside your file? See the example in extract_individuals_vcf.

DArT

DArT file users, not sure about the sample ids inside your file? See the example in extract_dart_target_id.

Examples

if (FALSE) {
strata.info <- radiator::read_strata(strata)

# the return object is a list with 5 objects:
names(strata.info)

# to get the strata
new.strata <- strata.info$strata

# if anything is changed from the original strata, a new strata file is
# generated automatically:

new.strata <- radiator::read_strata(
    strata = strata,
    blacklist.id = "blacklisted.ids.tsv"
    )

}