Detect Simple Sequence Repeats (SSRs), commonly known as microsatellites. radiator is not re-inventing the wheel here: it uses the software GMATA (Genome-wide Microsatellite Analyzing Toward Application).
detect_microsatellites(data, gmata.dir = NULL, ...)
(path or object)
Object in your global environment or a file in the working directory.
The tibble must contain 2 columns named MARKERS and SEQUENCE.
When RADseq data from DArT is used, filter_rad automatically generates this file under the name whitelist.markers.tsv.
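As an illustration of the expected input, here is a minimal sketch that builds a table with the two required columns and writes it to disk. The marker names, the sequences, the use of a temporary directory, and the tab-separated layout (inferred from the .tsv extension) are all assumptions made for the example, not documented requirements.

```r
# Hypothetical marker names and sequences, for illustration only.
markers <- data.frame(
  MARKERS = c("locus_1", "locus_2"),
  SEQUENCE = c("ACACACACACACACACGTTAGC", "GATTACAGGCTTAACCGTAGCA"),
  stringsAsFactors = FALSE
)

# Check that the two required columns are present before going further.
stopifnot(all(c("MARKERS", "SEQUENCE") %in% names(markers)))

# Write a tab-separated file (assumed layout, based on the .tsv extension).
input.file <- file.path(tempdir(), "my_whitelist.tsv")
write.table(
  markers,
  file = input.file,
  sep = "\t",
  quote = FALSE,
  row.names = FALSE
)

# Not run here: requires radiator and GMATA to be available.
# mic <- radiator::detect_microsatellites(data = input.file)
```

The object itself could presumably be passed instead of the file path, for instance after converting it with tibble::as_tibble(markers).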
(path) Path to the directory containing the GMATA software, which the function needs in order to run.
If the directory is not found or the argument is NULL, the function downloads GMATA from GitHub into the working directory.
Default: gmata.dir = NULL.
(optional) Advanced mode that allows passing further arguments to fine-tune the function. Also used for legacy arguments (see details or special section).
Six files are written to the folder detect_microsatellites:
".fa.fms": the fasta file of sequences
".fa.fms.sat1": the summary of sequences analysed (not important)
".fa.ssr": The microsatellites found per marker (see GMATA doc)
".fa.ssr.sat2": Extensive summary (see GMATA doc).
"blacklist.microsatellites.tsv": The list of markers with microsatellites.
"whitelist.microsatellites.tsv": The whitelist of markers with NO microsatellites.
In the global environment, the returned object is a list containing the blacklist and the whitelist.
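The whitelist is typically used to keep only the markers without microsatellites in downstream data. The sketch below shows the idea with mock tables in base R. The column name MARKERS in the whitelist, the genotypes table, and its N_SNP column are assumptions made for illustration; check the actual returned object before relying on them.

```r
# Mock stand-ins: the real whitelist contents and column names may differ.
whitelist <- data.frame(
  MARKERS = c("locus_2", "locus_3"),
  stringsAsFactors = FALSE
)

# Hypothetical downstream table with one row per marker.
genotypes <- data.frame(
  MARKERS = c("locus_1", "locus_2", "locus_3"),
  N_SNP = c(2L, 1L, 3L),
  stringsAsFactors = FALSE
)

# Keep only the markers present in the whitelist (no microsatellites detected).
kept <- genotypes[genotypes$MARKERS %in% whitelist$MARKERS, , drop = FALSE]
```

The same filter applied with the blacklist and a negated condition would give an equivalent result, as long as every marker appears in exactly one of the two lists.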
Thanks to Peter Grewe for the idea of including this type of filter inside radiator.
if (FALSE) { # \dontrun{
# The simplest way to run the function when the raw data was DArT:
mic <- radiator::detect_microsatellites(data = "my_whitelist.tsv")
# With the stacks pipeline, the populations module needs to be run with --fasta-loci
# You could prepare the file this way (uncomment the function):
#
# prep_stacks_fasta <- function(fasta.file) {
#   fasta <- suppressWarnings(
#     vroom::vroom(
#       file = fasta.file,
#       delim = "\t",
#       col_names = "DATA",
#       col_types = "c",
#       comment = "#"
#     ) %>%
#       dplyr::mutate(MARKERS = stringi::stri_sub(str = DATA, from = 3, to = 7)) %>%
#       tidyr::separate(data = ., col = DATA, into = c("SEQUENCE", "LOCUS"), sep = "_")
#   )
#
#   fasta <- dplyr::bind_cols(
#     dplyr::filter(fasta, MARKERS == "Locus") %>%
#       dplyr::select(LOCUS),
#     dplyr::filter(fasta, MARKERS != "Locus") %>%
#       dplyr::select(SEQUENCE)
#   ) %>%
#     dplyr::mutate(LOCUS = as.numeric(LOCUS))
#   return(fasta)
# } # prep_stacks_fasta
} # }