FH measure of IBDg — ibdg

FH is a proxy measure of IBDg based on the excess in the observed number of homozygous genotypes within an individual, relative to the mean number of homozygous genotypes expected under random mating (Keller et al., 2011; Kardos et al., 2015; Hedrick & Garcia-Dorado, 2016).

IBDg is the realized proportion of an individual's genome that is identical by descent, relative to the current population under hypothetical random mating (Keller et al., 2011; Kardos et al., 2015; Hedrick & Garcia-Dorado, 2016).

This function uses a modified version of the FH measure (as computed with PLINK's --het option) described in Keller et al. (2011) and Kardos et al. (2015).

The novelties are:

population-wise: the individual's observed homozygosity is contrasted against the expected homozygosity. Two estimates of the expected homozygosity are provided: one from the individual's population and one from the overall sample, each averaged across markers.
tailored for RADseq: instead of using all markers, the population and overall expected homozygosity are averaged over only the markers for which the individual is genotyped. This reduces the bias potentially introduced by comparing the individual's observed homozygosity (computed from non-missing genotypes) with an estimate computed from more markers at the population or overall level.
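A toy R illustration of why the RADseq tailoring matters, using made-up numbers (not radiator output). The individual is not genotyped at the three most polymorphic SNPs, so averaging the expected heterozygosity over all markers inflates it relative to the individual's own markers and yields a spuriously positive FH. For brevity, FH is written here as 1 - Het_obs / Het_exp; all variable names are hypothetical.

```r
# Made-up expected heterozygosity (2pq) of 8 SNPs in the individual's population
het_exp <- c(0.5, 0.5, 0.5, 0.2, 0.2, 0.2, 0.2, 0.2)
# Made-up genotypes of one individual (0/1/2 = alternate allele count);
# NA = not genotyped, here at the three most polymorphic SNPs
geno_i <- c(NA, NA, NA, 1, 0, 2, 0, 2)

typed <- !is.na(geno_i)
het_obs_i <- mean(geno_i[typed] == 1)       # observed heterozygosity: 1/5

# Expected heterozygosity over all markers vs. over the individual's markers only
het_exp_all <- mean(het_exp)                # 0.3125
het_exp_tailored <- mean(het_exp[typed])    # 0.2

fh_untailored <- 1 - het_obs_i / het_exp_all       # 0.36, spuriously positive
fh_tailored <- 1 - het_obs_i / het_exp_tailored    # about 0
```

Nothing about this individual suggests inbreeding; the positive value comes only from comparing it against markers it was never genotyped at.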

The FH measure is also computed in the stackr summary_haplotypes function and the grur missing_visualization function. See the Theory section below for the equations.

ibdg_fh(data, path.folder = NULL, verbose = TRUE, ...)

Arguments

data

(4 options) A file or object generated by radiator, in one of the following formats:

tidy data
Genomic Data Structure (GDS)

How to get GDS and tidy data? See tidy_genomic_data, read_vcf or tidy_vcf.

path.folder

(path, optional) By default, results are written to the working directory. Default: path.folder = NULL.

verbose

(optional, logical) When verbose = TRUE the function is a little more chatty during execution. Default: verbose = TRUE.

...

(optional) To pass further arguments for fine-tuning the function.

Value

A list is created with 4 objects:

$fh: the individual's FH values
$fh.stats: the population and overall FH values, calculated by averaging individual FH values within each population and across all samples.
$fh.box.plot: the boxplot.
$fh.distribution.plot: the histogram.

The FH measure is, on average, negative when parents are less related than expected under random mating, and positive when they are more related. The distribution shown in fh.distribution.plot should be centered around 0 for samples of non-inbred individuals.
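The population and overall summaries described for $fh.stats amount to averaging individual FH values. A minimal R sketch with made-up values (not radiator output; the data frame and its column names are illustrative assumptions, and the real $fh.stats may contain more statistics):

```r
# Made-up individual FH values (illustrative column names, not radiator output)
ind_fh <- data.frame(
  individual = c("ind1", "ind2", "ind3", "ind4", "ind5"),
  pop = c("A", "A", "A", "B", "B"),
  fh = c(0.02, -0.05, -0.06, 0.30, 0.20),
  stringsAsFactors = FALSE
)

# Mean FH within each population
fh_by_pop <- aggregate(fh ~ pop, data = ind_fh, FUN = mean)  # A: -0.03, B: 0.25

# Mean FH across all samples
fh_overall <- mean(ind_fh$fh)                                 # 0.082

# Populations whose mean FH is above 0, i.e. with excess homozygosity on average
fh_by_pop$pop[fh_by_pop$fh > 0]
```

A population mean close to 0 is what non-inbred samples should show; a clearly positive mean, as for the made-up population B, points to excess homozygosity.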

Theory

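Modified FH: $$F_{h_i} = \frac{\overline{Het}_{exp_{ij}} - \overline{Het}_{obs_i}}{\overline{Het}_{exp_{ij}}} = 1 - \frac{\overline{Het}_{obs_i}}{\overline{Het}_{exp_{ij}}}$$

Because both averages use the same markers, this equals the usual FH form $(O_{hom} - E_{hom})/(N_i - E_{hom})$, where $O_{hom}$ and $E_{hom}$ are the observed and expected numbers of homozygous genotypes over the $N_i = \sum_l snp_{il}$ markers genotyped in individual $i$. It is positive when the individual is less heterozygous than expected.

Individual observed heterozygosity averaged across the markers genotyped in individual $i$ ($snp_{il} = 1$ if individual $i$ is genotyped at marker $l$, 0 otherwise): $$\overline{Het}_{obs_i} = \frac{\sum_l snp_{il}\,Het_{obs_{il}}}{\sum_l snp_{il}}$$

Expected heterozygosity of population $j$ (under Hardy-Weinberg), tailored to individual $i$ by averaging only over the markers genotyped in that individual: $$\overline{Het}_{exp_{ij}} = \frac{\sum_l snp_{il}\,Het_{exp_{jl}}}{\sum_l snp_{il}}$$

The overall estimate is obtained by replacing population $j$ with the whole sample.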
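The modified FH can be sketched in R as below. This is a minimal illustration under stated assumptions, not radiator's implementation. `fh_sketch`, `geno` and `pop` are hypothetical names. Genotypes are assumed to be biallelic SNPs coded 0/1/2 (alternate allele count) with NA for missing, and the expected heterozygosity is taken as 2pq from the group's allele frequencies, with no small-sample correction. For each individual i it computes FH_i = 1 - Het_obs_i / Het_exp_ij, with both averages taken over only the markers genotyped in i.

```r
# Minimal sketch of the modified FH (hypothetical helper, not radiator's code).
# Assumes: geno = matrix (rows = individuals, columns = biallelic SNPs) coded
# 0/1/2 as alternate allele counts, NA = missing; pop = population labels.
fh_sketch <- function(geno, pop, overall = FALSE) {
  stopifnot(is.matrix(geno), nrow(geno) == length(pop))
  # Population-wise by default; a single group gives the overall estimate
  groups <- if (overall) rep("overall", nrow(geno)) else as.character(pop)
  fh <- numeric(nrow(geno))
  for (g in unique(groups)) {
    rows <- which(groups == g)
    sub <- geno[rows, , drop = FALSE]
    # Alternate allele frequency per SNP from the group's non-missing genotypes
    # (NaN for SNPs missing in the whole group; never used below)
    p <- colSums(sub, na.rm = TRUE) / (2 * colSums(!is.na(sub)))
    het_exp <- 2 * p * (1 - p)  # Hardy-Weinberg expected heterozygosity per SNP
    for (k in seq_along(rows)) {
      typed <- !is.na(sub[k, ])               # markers genotyped in this individual
      het_obs_i <- mean(sub[k, typed] == 1)   # observed heterozygosity
      het_exp_ij <- mean(het_exp[typed])      # expected, over the same markers only
      fh[rows[k]] <- 1 - het_obs_i / het_exp_ij
    }
  }
  fh
}

# Made-up data: 4 individuals, 2 populations, 4 SNPs
geno <- rbind(
  c(0, 2, 0, 2),   # fully homozygous
  c(1, 1, 1, 1),   # fully heterozygous
  c(1, 0, NA, 2),  # one missing genotype
  c(1, 2, 1, 0)
)
pop <- c("A", "A", "B", "B")
fh_pop <- fh_sketch(geno, pop)                  # 1, -5/3, 1/3, 0
fh_all <- fh_sketch(geno, pop, overall = TRUE)  # overall-sample version
```

The fully homozygous individual gets FH = 1 and the fully heterozygous one a strongly negative value. With only two individuals per population the values are extreme; the toy data only exercise the formula. Monomorphic markers (expected heterozygosity 0) are assumed to be filtered beforehand, as filter.monomorphic = TRUE does by default.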

Advanced mode

The dots-dots-dots argument ... allows several arguments to be passed for fine-tuning the function. These arguments are described in tidy_genomic_data.

filter.monomorphic = TRUE (default)
filter.common.markers = FALSE (default). Markers are not required to be common between populations, which maximizes the genome coverage of individuals and populations.

References

Keller MC, Visscher PM, Goddard ME (2011) Quantification of inbreeding due to distant ancestors and its detection using dense single nucleotide polymorphism data. Genetics, 189, 237–249.

Kardos M, Luikart G, Allendorf FW (2015) Measuring individual inbreeding in the age of genomics: marker-based measures are better than pedigrees. Heredity, 115, 63–72.

Hedrick PW, Garcia-Dorado A (2016) Understanding inbreeding depression, purging, and genetic rescue. Trends in Ecology and Evolution, 31, 940–952.

Author

Thierry Gosselin thierrygosselin@icloud.com

Examples

if (FALSE) { # \dontrun{
# Using a GDS file, the simplest form of the function:
fh <- ibdg_fh(data = "sturgeon.gds")

# To see what's inside the list
names(fh)

# To view the boxplot:
fh$fh.box.plot

# To view the distribution of FH values:
fh$fh.distribution.plot
} # }