vignettes/rad_genomics_computer_setup.Rmd
Time to allow: ~30-45 min for experienced users, 1h-2h for novices
The purpose of this vignette is to show users how to set up their computers for RADseq genomic analysis in R. It targets users of my packages or workshops.
The vignette has an Installation problems section; browse through it if you’re experiencing installation issues.
The vignette is also useful if you use:
grur and its dependencies, or simulations with strataG
rmetasim, XGBoost, LightGBM, randomForestSRC, ranger, missRanger, pcaMethods
packages that benefit from OpenMP-enabled compilers (fstcore, fst, data.table)
I’m currently merging information also found in this tutorial.
Warning: package management software Homebrew and/or MacPorts:
My experience with these package managers is that at some point they become unreliable for genomic software installation. They might do the trick for some software, but eventually you will lose a lot of time trying to figure out what the problem is.
In this vignette I present brewless options only.
Make sure you have administrator and root user access to your computer.
Make sure the Xcode command line tools are installed; if not, follow the instructions. Although the prompt message may be a bit confusing, just click install.
Notes:
If the tools are already installed, you will get this message: xcode-select: error: command line tools are already installed, use "Software Update" to install updates
You can also type make in the terminal; if it’s installed you will get something similar to this: *** No targets specified and no makefile found. Stop.
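The two checks above can be combined in a small script. This is a minimal sketch, assuming macOS: `xcode-select -p` prints the active developer directory when the tools are present, and `xcode-select --install` opens the install prompt. The helper name `check_clt` is hypothetical and only used for illustration.

```shell
# check_clt (hypothetical helper): report whether the Xcode command line
# tools and make are available on this computer.
check_clt() {
    if ! command -v xcode-select >/dev/null 2>&1; then
        echo "xcode-select not found (probably not macOS)"
        return 1
    fi
    if xcode-select -p >/dev/null 2>&1; then
        echo "command line tools found in: $(xcode-select -p)"
    else
        echo "command line tools missing, run: xcode-select --install"
        return 1
    fi
    # make ships with the command line tools
    command -v make >/dev/null 2>&1 && echo "make found: $(command -v make)"
}

check_clt || true
```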
Terminology:
./configure: checks your system and configures the build before compilation.
make: builds (compiles and links) the source code to create binary files, as configured by ./configure (doc).
make install: installs the application built by make on your system, usually in /usr/local/bin.
Apple can’t ship the GNU Compiler Collection (GCC) with OpenMP enabled; it’s a similar story with Clang, the other compiler used in macOS. Consequently, both need to be updated manually if you want to run software that uses parallel computing (like stacks).
The commands below, for both GCC and Clang, will install the folders (bin, include, lib, libexec and share) of the compiler in your computer’s /usr/local directory.
Choose the binary version based on your OS and change the version number accordingly.
Sonoma gcc-14.1-m1-bin.tar.gz
We want clang compiler with OpenMP enabled. The latest version is 19.1.3.
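Once a compiler is installed, a quick way to confirm that OpenMP really works is to compile a tiny test program with it. The sketch below is only an illustration: the helper name `has_openmp` is hypothetical, and the compiler paths in the loop are assumptions based on the /usr/local install location used in this vignette.

```shell
# has_openmp (hypothetical helper): compile and run a tiny OpenMP program
# with the given compiler; succeeds only if both steps work.
has_openmp() {
    compiler="$1"
    tmpdir=$(mktemp -d) || return 1
    cat > "$tmpdir/omp_test.c" <<'EOF'
#include <omp.h>
#include <stdio.h>
int main(void) {
    #pragma omp parallel
    {
        #pragma omp single
        printf("threads: %d\n", omp_get_num_threads());
    }
    return 0;
}
EOF
    if "$compiler" -fopenmp "$tmpdir/omp_test.c" -o "$tmpdir/omp_test" 2>/dev/null \
        && "$tmpdir/omp_test" >/dev/null 2>&1; then
        status=0
    else
        status=1
    fi
    rm -rf "$tmpdir"
    return "$status"
}

# Assumption: compilers were installed in /usr/local/bin
for cc in /usr/local/bin/gcc /usr/local/bin/clang; do
    if has_openmp "$cc"; then
        echo "$cc: OpenMP works"
    else
        echo "$cc: OpenMP not available"
    fi
done
```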
The shell start up script and PATH to programs
To make it a little easier to talk to your computer, each time you open the Terminal a shell start up script tells your computer where to look for programs. The paths to your programs can be modified in your shell start up script. When your computer is searching for programs, it looks in these paths:
The output should look like this: /usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin. But sometimes, it will also say: No such file or directory (no worries, see below).
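The command that prints the search path is not shown above; in a standard shell it is `echo $PATH`. The sketch below also lists one folder per line and flags folders that do not exist. The helper name `split_path` is hypothetical.

```shell
# Print the raw search path
echo "$PATH"

# split_path (hypothetical helper): print each entry of a colon-separated
# list on its own line; defaults to the current PATH.
split_path() {
    printf '%s\n' "${1:-$PATH}" | tr ':' '\n'
}

# Flag entries that do not exist on this computer
split_path | while IFS= read -r dir; do
    if [ -d "$dir" ]; then
        echo "ok       $dir"
    else
        echo "missing  $dir"
    fi
done
```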
Use the pwd command to know exactly where you are!
The name of the shell startup file differs across platforms. Depending on the OS, it is called ~/.bash_profile or sometimes ~/.profile. Filenames beginning with a dot “.” are hidden and are invisible in the macOS Finder.
Find your shell start up script with the following command:
If this returns nothing (blank), you don’t have a shell start up script. Create one with this command
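Neither command is reproduced above. A minimal sketch, assuming the startup script is ~/.bash_profile as in the rest of this vignette; the helper name `ensure_startup_file` is hypothetical.

```shell
# ensure_startup_file (hypothetical helper): report whether the shell
# startup script exists and create an empty one if it does not.
ensure_startup_file() {
    file="${1:-$HOME/.bash_profile}"
    if [ -f "$file" ]; then
        echo "found: $file"
    else
        touch "$file" && echo "created: $file"
    fi
}

# List hidden startup files in the home directory, if any
ls -a "$HOME" | grep -E '^\.(bash_profile|profile)$' || echo "no startup script found"

ensure_startup_file "$HOME/.bash_profile"
```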
To modify, you can use BBEdit to open or make and modify hidden items (using the option Show hidden items on the open file screen). Look for the free version. With Linux, use Vi!
Copy/paste the line below in your .bash_profile file:
After modifying your shell start up script, always run the command source ~/.bash_profile to reload it.
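The exact line to paste is not reproduced above. Since the programs compiled in this vignette are installed in /usr/local/bin, a common choice, assumed here, is to put that folder first in PATH. The helper name `add_line_once` is hypothetical; it avoids adding the same line twice.

```shell
# add_line_once (hypothetical helper): append a line to a file only if the
# exact line is not already there, so re-running is harmless.
add_line_once() {
    line="$1"
    file="$2"
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Assumption: /usr/local/bin should be searched before the system folders
add_line_once 'export PATH="/usr/local/bin:$PATH"' "$HOME/.bash_profile"

# Reload the startup script in the current shell (same as source in bash)
. "$HOME/.bash_profile"
```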
Below is useful, but not essential, software you may like to have on your Mac.
BBEdit is the TextWrangler replacement, a free text editor that will help you save time. Once installed, go in the Apple Menu bar -> BBEdit -> Install Command Line
pip is a Python installer tool that I highly recommend. To install or upgrade pip, securely download get-pip.py.
Run this from your Terminal:
If you get command not found you might not have Python installed.
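The command itself is not shown above. A hedged sketch of the usual procedure: download get-pip.py from bootstrap.pypa.io and run it with Python. The helper names `find_python` and `install_pip` are hypothetical, and nothing is downloaded unless `install_pip` is called.

```shell
# find_python (hypothetical helper): print the first Python interpreter
# found on the PATH, or fail if there is none.
find_python() {
    for candidate in python3 python; do
        if command -v "$candidate" >/dev/null 2>&1; then
            command -v "$candidate"
            return 0
        fi
    done
    return 1
}

# install_pip (hypothetical helper): download get-pip.py and run it.
install_pip() {
    py=$(find_python) || { echo "command not found: install Python first"; return 1; }
    curl -fsSL https://bootstrap.pypa.io/get-pip.py -o get-pip.py &&
        "$py" get-pip.py &&
        rm get-pip.py
}

# Only suggest an install when pip is not already available
if py=$(find_python) && "$py" -m pip --version >/dev/null 2>&1; then
    "$py" -m pip --version
else
    echo "pip not found; run install_pip to install it"
fi
```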
Make sure you have GCC and CLANG with OpenMP enabled. Several flavors available, check for the proper link
To install R v4.4.2 “Pile of Leaves” released on 2024-10-31 download the installer and follow the instructions
To remove R completely from macOS
To download RStudio, check this link and download the installer for your OS.
Below is how I set up most of my computers after a clean macOS install.
1. Start with devtools and tidyverse
if (!require("devtools")) install.packages("devtools") # to install
install.packages("tidyverse")
install.packages("gsl")
If the console prints this: Do you want to install from sources the package which needs compilation? (Yes/no/cancel), always answer no unless, of course, you know what you are doing.
devtools::install_github("thierrygosselin/grur")
devtools::install_github("thierrygosselin/assigner")
For some packages you might have to compile from source, and using a different compiler is sometimes very useful. You need to tell R which compilers to use. This might change from one package to another. Nothing is simple, you know this by now… All this is done through R’s Makevars file located in ~/.R/Makevars.
To modify or create the file, the fastest way is to use the package usethis (it’s installed automatically with devtools):
usethis::edit_r_makevars()
Makevars content required:
CC=/usr/local/bin/gcc
CXX=/usr/local/bin/g++
FC=/usr/local/bin/gfortran
F77=/usr/local/bin/gfortran
PKG_LIBS = -fopenmp -lgomp
PKG_CFLAGS= -O3 -Wall -pipe -pedantic -std=gnu99 -fopenmp
PKG_CXXFLAGS=-fopenmp -std=c++11
CFLAGS= -O3 -Wall -pipe -pedantic -std=gnu99 -fopenmp
SHLIB_OPENMP_CFLAGS = -fopenmp
SHLIB_OPENMP_CXXFLAGS = -fopenmp
SHLIB_OPENMP_FCFLAGS = -fopenmp
SHLIB_OPENMP_FFLAGS = -fopenmp
# change the next line according to your computer's compiler version (use gcc -v in terminal):
FLIBS=-L/usr/local/lib/gcc/x86_64-apple-darwin19/9.2.0/finclude
CFLAGS=-mtune=native -g -O2 -Wall -pedantic -Wconversion
CXXFLAGS=-mtune=native -g -O2 -Wall -pedantic -Wconversion
Sometimes you’ll get warnings while installing dependencies required for x package.
#Warning: cannot remove prior installation of package ‘stringi’
To solve this problem, manually delete the problematic package in the installation folder (on mac: /Library/Frameworks/R.framework/Resources/library) or in the Terminal:
sudo rm -R /Library/Frameworks/R.framework/Resources/library/package_name
# Changing 'package_name' to the problematic package.
# Reinstall the package.
Using the latest version of R, RStudio and packages is recommended. If your heart starts pounding just at the thought of having to install a new R version, you should have a look at packrat.
Look at the output in the R console when you get an error message. If it’s related to one of the package’s dependencies, try installing it separately before attempting to reinstall the problematic package.
So far, I’ve only experienced 1 problem after upgrading to Big Sur, and it’s linked to adegenet’s dependency on the sf package (see the dyn.load section).
When trying to compile software, if you get this error:
checking whether the C++ compiler works... no
or
configure: error: C++ compiler cannot create executables
Try installing Xcode from the App Store.
If you get something like: https not supported or disabled in libcurl, install or re-install OpenSSL and curl:
OpenSSL is required if the GCC compiler is used (the TLS backend is then used). It is not required if clang is used (the securetransport backend is used).
#In browser
https://www.openssl.org/source/openssl-1.1.1.tar.gz
#In Terminal
cd ~/Downloads
curl -L https://www.openssl.org/source/openssl-1.1.1.tar.gz | tar xf -
cd openssl-1.1.1
./config
make -j12 #change with your number of CPU
make test #long
sudo make install
cd ..
sudo rm -R openssl*
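The -j values passed to make in this vignette (-j12, -j56) depend on the computer. Rather than editing the number by hand, it can be looked up. This is a small sketch; `cpu_count` is a hypothetical helper name, and `getconf _NPROCESSORS_ONLN` is assumed to be available (it is on macOS and most Linux distributions).

```shell
# cpu_count (hypothetical helper): print the number of online CPUs.
# Falls back to 1 if getconf fails or returns something unexpected.
cpu_count() {
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
    case "$n" in
        ''|*[!0-9]*) n=1 ;;
    esac
    echo "$n"
}

echo "this computer has $(cpu_count) CPUs"
# Usage inside a source folder: make -j"$(cpu_count)"
```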
Check for the latest release of curl
#Copy/paste in your browser
https://curl.haxx.se/download/curl-7.85.0.tar.gz
# Terminal
cd ~/Downloads
tar -zxvf curl-7.85.0.tar.gz
cd curl-7.85.0
For more curl options, type curl -h
The next step depends on the compiler used
With gcc:
Note: with macOS 11.0.1 this gives me an error
configure: error: OpenSSL libs and/or directories were not found where specified!
If this is the case, use clang:
If you have an install problem, the problem might be very computer-specific, e.g. if the problem is related to strataG, copula and/or gsl, try installing libgsl0-dev in the Terminal (very easy now with the latest RStudio release!):
vector memory exhausted
For errors that highlight problems with vectors and memory, similar to vector memory exhausted (limit reached): in R, verify that you have a file called ~/.Renviron:
file.exists("~/.Renviron")
If you don’t have the file:
Add this to your .Renviron file located in ~/.Renviron:
You can also use a text editor that allows you to see hidden files (files starting with a . dot).
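The line to add is not reproduced above. In R, the vector memory limit is controlled by the R_MAX_VSIZE environment variable, so a sketch from the Terminal could look like this. The helper name `set_renviron_var` is hypothetical and the 32Gb value is only an example; pick one that matches your computer, then restart R.

```shell
# set_renviron_var (hypothetical helper): set NAME=VALUE in an .Renviron
# file, replacing any previous line for NAME.
set_renviron_var() {
    name="$1"; value="$2"; file="${3:-$HOME/.Renviron}"
    touch "$file"
    # keep every line except an earlier definition of NAME
    grep -v "^${name}=" "$file" > "$file.tmp" || true
    printf '%s=%s\n' "$name" "$value" >> "$file.tmp"
    mv "$file.tmp" "$file"
}

# Assumption: 32Gb is an example value, not a recommendation
set_renviron_var R_MAX_VSIZE 32Gb
cat "$HOME/.Renviron"
```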
string.h, math.h or any other .h file not found
Encountered with: vroom, rmetasim
An example during vroom installation:
clang++ -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include" -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/progress/include" -I/usr/local/include -Imio/include -DWIN32_LEAN_AND_MEAN -Ispdlog/include -fPIC -Wall -g -O2 -c gen.cc -o gen.o
clang++ -std=gnu++11 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include" -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/progress/include" -I/usr/local/include -Imio/include -DWIN32_LEAN_AND_MEAN -Ispdlog/include -fPIC -Wall -g -O2 -c index_collection.cc -o index_collection.o
/usr/local/bin/clang -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/include" -I"/Library/Frameworks/R.framework/Versions/3.5/Resources/library/progress/include" -I/usr/local/include -fPIC -I/usr/local/include -c localtime.c -o localtime.o
localtime.c:42:10: fatal error: 'string.h' file not found
#include <string.h>
^~~~~~~~~~
With macOS, your Makevars needs additional lines to make it work (see the macOS configuration in the rmetasim section).
.setMaxGlobalSize
It’s a problem with a previous version of pbmcapply and its interaction with future.
Solution:
Update the packages and restart R (RStudio > Session > Restart R).
Note: if your heart starts pounding just at the thought of having to update everything on your computer, you should definitely have a look at packrat: it’s very easy to use.
.DynamicClusterCall
If you have a PC and you’re getting this error or closely related error:
# Error in .DynamicClusterCall(cl, length(cl), .fun = function(.proc_idx, :
# One of the nodes produced an error: Can not open file 'FILE PATH'. The process cannot access the file because it is being used by another process.
Solution: Use parallel.core = 1 in the function generating the error.
The error below usually happens when several packages are updated:
# Error in get0(oNam, envir = ns) :
# lazy-load database '/Library/Frameworks/R.framework/Versions/3.5/Resources/library/callr/R/callr.rdb' is corrupt
# In addition: Warning message:
# In get0(oNam, envir = ns) : internal error -3 in R_decompress1
Solution:
dyn.load
I see 2 problems with separate solutions. When the error is related to adegenet, strataG, assigner or packages that depend on the sf package, see the solution below.
When the error is similar to:
Error in dyn.load(file, DLLpath = DLLpath, ...) :
unable to load shared object '/home/rstudio/R/x86_64-pc-linux-gnu-library/3.6/units/libs/units.so':
libudunits2.so.0: cannot open shared object file: No such file or directory
Calls: <Anonymous> ... asNamespace -> loadNamespace -> library.dynam -> dyn.load
Execution halted
Solution:
Sys.getenv("LD_LIBRARY_PATH")
# [1]""
Note that if the output is not empty, unlike in the example above, write down the output.
Add /usr/local/lib/ to the output above.
When it’s empty:
# in R:
Sys.setenv(LD_LIBRARY_PATH="/usr/local/lib/")
# For Linux you could use: /usr/local/lib/:/usr/lib64
When it’s not empty, add it at the end, separated by a colon (:):
Sys.setenv(LD_LIBRARY_PATH="/usr/local/lib64/R/lib:/lib:/usr/local/lib64:/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.222.b10-0.amzn2.0.1.x86_64/jre/lib/amd64/server:/usr/local/lib/:/usr/lib64")
Long-term solution:
Instead of using Sys.setenv each time you have a similar problem, you could add the environment variable LD_LIBRARY_PATH to your .Renviron file. This is discussed in another problem above.
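A sketch of that long-term solution from the Terminal. It reads the value R currently sees, appends /usr/local/lib/ only if it is missing, and prints the line to put in ~/.Renviron. The helper name `append_dir` is hypothetical, and Rscript is assumed to be on the PATH (the value is treated as empty otherwise).

```shell
# append_dir (hypothetical helper): add a folder to a colon-separated list
# unless it is already there.
append_dir() {
    list="$1"; dir="$2"
    case ":$list:" in
        *":$dir:"*) echo "$list" ;;
        "::")       echo "$dir" ;;
        *)          echo "$list:$dir" ;;
    esac
}

# Value of LD_LIBRARY_PATH as seen from R (empty if Rscript is unavailable)
current=$(Rscript -e 'cat(Sys.getenv("LD_LIBRARY_PATH"))' 2>/dev/null || true)
new=$(append_dir "$current" /usr/local/lib/)
echo "LD_LIBRARY_PATH=$new"
# Add the printed line to ~/.Renviron, then restart R
```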
dyn.load
Sometimes dyn.load problems originate from specific packages. So far, for me, they are always linked to the sf package. On macOS, the problem usually appears when updating the OS, e.g. from Catalina -> Big Sur. The solution is similar for Linux when you try to install adegenet.
There are solutions using Homebrew, but I’m not a fan of the shortcuts.
Brewless version to execute in the terminal:
cd ~/Downloads
curl -L https://artifacts.unidata.ucar.edu/repository/downloads-udunits/current/udunits-2.2.28.tar.gz | tar xf -
cd udunits*
./configure && make -j56 && sudo make install
cd ..
sudo rm -R udunits*
cd ~/Downloads
curl -L https://www.sqlite.org/2022/sqlite-autoconf-3390400.tar.gz | tar xf -
cd sqlite-autoconf*
./configure && make -j56 && sudo make install
cd ..
sudo rm -R sqlite-autoconf*
cd ~/Downloads
curl -L https://download.osgeo.org/libtiff/tiff-4.4.0.tar.gz | tar xf -
cd tiff*
./configure && make -j56 && sudo make install
cd ..
sudo rm -R tiff*
cd ~/Downloads
curl -L https://download.osgeo.org/proj/proj-9.1.0.tar.gz | tar xf -
cd proj*
mkdir build
cd build
cmake ..
cmake --build .
sudo cmake --build . --target install
cd ~/Downloads
sudo rm -R proj*
# with prior releases, when configure was necessary
# ./configure --libdir=/usr/local/lib && make -j56 && sudo make install
export LDFLAGS="-L/usr/local/lib"
export CPPFLAGS="-I/usr/local/include"
cd ~/Downloads
curl -L https://github.com/OSGeo/gdal/releases/download/v3.5.3/gdal-3.5.3.tar.gz | tar xf -
cd gdal*
mkdir build
cd build
cmake ..
cmake --build .
sudo cmake --build . --target install
cd ~/Downloads
sudo rm -R gdal*
# previous install command required
#./configure --prefix=/usr/local --libdir=/usr/local/lib --with-proj=/usr/local && make -j56 && sudo make install #time for coffee...
cd ~/Downloads
git clone https://git.osgeo.org/gitea/geos/geos.git
cd geos*
mkdir build
cd build
cmake ..
cmake --build .
sudo cmake --build . --target install
cd ~/Downloads
sudo rm -R geos*
# previous install command required
#./autogen.sh
#./configure --libdir=/usr/local/lib && make -j56 && sudo make install
Back in R/RStudio
install.packages("rgeos", repos="http://R-Forge.R-project.org", type="source")
install.packages("rgdal", repos="http://R-Forge.R-project.org", type="source")
devtools::install_github("r-spatial/sf", configure.args = "--with-proj-lib=/usr/local")
install.packages("adegenet")
HTTP status was 404 Not Found
This error sometimes pops up after an R upgrade. Try installing the problematic package differently.
Instead of:
BiocManager::install("SeqArray")
Try:
remotes::install_local(path = "SeqArray_latest.tar.gz")
C stacks usage
So far, I haven’t found the cure to this computer-specific problem.
Potential solutions:
fst & fstcore & data.table
It’s better to install and compile them from source to enable OpenMP. Install zstd and lz4 in the terminal:
cd ~/Downloads
curl -L https://github.com/lz4/lz4/archive/refs/tags/v1.10.0.tar.gz | tar xf -
cd lz4*
make
sudo make install
cd ..
sudo rm -R lz4*
cd ~/Downloads
curl -L https://github.com/facebook/zstd/archive/refs/tags/v1.5.6.tar.gz | tar xf -
cd zstd*
make
sudo make install
cd ..
sudo rm -R zstd*
fstcore, fst and data.table require these R Makevars specifications. If you have other lines, comment them out (#) before saving and installing from source:
usethis::edit_r_makevars()
#fstcore fst data.table
CC=/usr/local/bin/gcc -fopenmp
CXX=/usr/local/bin/g++ -fopenmp
CXX11=/usr/local/bin/g++ -fopenmp
CXX14=/usr/local/bin/g++ -fopenmp
CXX17=/usr/local/bin/g++ -fopenmp
CXX1X=/usr/local/bin/g++ -fopenmp
CXX98=/usr/local/bin/g++ -fopenmp
install.packages("fstcore", type = "source")
install.packages("fst", type = "source")
install.packages("data.table", type = "source")
rmetasim
Download the latest github release of Allan Strand’s rmetasim
If you want to use more loci during your simulations (default is 10001), you need to modify rmetasim before compiling. With a text editor, modify the const.h file in the src folder: rmetasim-master/src/const.h. Navigate to line 33 and change the integer to the desired maximum number of loci. Or do this in the Terminal:
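The Terminal command is not reproduced above. The sketch below is one possible way to do it, assuming that line 33 of const.h still holds the default value 10001 (check the printed line before trusting the result). The helper name `set_max_loci` and the value 20001 are hypothetical.

```shell
# set_max_loci (hypothetical helper): replace the default 10001 on line 33
# of const.h with a new maximum number of loci.
set_max_loci() {
    new_max="$1"; file="${2:-rmetasim-master/src/const.h}"
    # show the line first so you can confirm it is the right one
    sed -n '33p' "$file"
    sed "33s/10001/${new_max}/" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
    # show the line after the change
    sed -n '33p' "$file"
}

# Example, run from the folder containing rmetasim-master (value is arbitrary):
# set_max_loci 20001
```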
rmetasim requires these Makevars (~/.R/Makevars file) specifications. If you have other lines, comment them out (#) before compiling rmetasim:
usethis::edit_r_makevars()
macOS
This latest os requires extra lines:
CC=/usr/local/bin/clang
CXX=/usr/local/bin/clang++
CXX1X=/usr/local/bin/clang++
FLIBS=-L/usr/local/lib
LDFLAGS=-L/usr/local/lib
SHLIB_OPENMP_CFLAGS= -fopenmp
SHLIB_OPENMP_FCFLAGS= -fopenmp
SHLIB_OPENMP_FFLAGS= -fopenmp
SHLIB_OPENMP_CXXFLAGS= -fopenmp
CFLAGS+=-isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk
CCFLAGS+=-isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk
CXXFLAGS+=-isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk
CPPFLAGS+=-isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk
XGBoost
If you’re getting this error: "https" not supported or disabled in libcurl, extra steps are required. Check the installation problems section to install OpenSSL and curl and enable https with the --with-ssl option.
XGBoost requires these Makevars specifications. If you have other lines, comment them out (#) before compiling:
usethis::edit_r_makevars()
CC=/usr/local/bin/gcc
CXX=/usr/local/bin/g++
CXX11=/usr/local/bin/g++
CXX14=/usr/local/bin/g++
CXX17=/usr/local/bin/g++
SHLIB_OPENMP_CFLAGS= -fopenmp
SHLIB_OPENMP_FCFLAGS= -fopenmp
SHLIB_OPENMP_FFLAGS= -fopenmp
SHLIB_OPENMP_CXXFLAGS= -fopenmp
CFLAGS=-g -O3 -Wall -pedantic -std=gnu99 -mtune=native -pipe
CXXFLAGS=-g -O3 -Wall -pedantic -std=c++11 -mtune=native -pipe
LDFLAGS=-L/usr/local/lib -Wl,-rpath,/usr/local/lib
CPPFLAGS=-I/usr/local/include -I/usr/local/include
You should see a time difference between both runs
require(xgboost)
x <- matrix(rnorm(100*10000), 10000, 100)
y <- x %*% rnorm(100) + rnorm(10000) # one noise value per row of x
system.time({bst = xgboost(data = x, label = y, nthread = 1, nround = 100, verbose = FALSE)})
system.time({bst = xgboost(data = x, label = y, nthread = 4, nround = 100, verbose = FALSE)})
LightGBM
LightGBM requires an OpenMP-enabled compiler. Currently, it doesn’t work well with clang, so make sure you have updated your GCC compiler (instructions above). Additionally, LightGBM requires CMake.
#In browser or using curl in Terminal
https://cmake.org/files/v3.12/cmake-3.12.2-Darwin-x86_64.dmg
# double-click on the disk image and follow instructions
To add CMake to the PATH:
PATH="/Applications/CMake.app/Contents/bin":"$PATH"
# Or, to install symlinks to '/usr/local/bin', run:
sudo "/Applications/CMake.app/Contents/bin/cmake-gui" --install
# Or, to install symlinks to another directory, run:
sudo "/Applications/CMake.app/Contents/bin/cmake-gui" --install=/path/to/bin
#Then, run the following commands to install LightGBM:
randomForestSRC
randomForestSRC requires the GCC OpenMP-enabled compiler to run in parallel. See instructions above if not already done.
Check that the lines below are not commented out in your ~/.R/Makevars file:
usethis::edit_r_makevars()
CC=/usr/local/bin/gcc
CXX=/usr/local/bin/g++
CFLAGS=-g -O3 -Wall -pedantic -std=gnu99 -mtune=native -pipe
CXXFLAGS=-g -O3 -Wall -pedantic -std=c++11 -mtune=native -pipe
PKG_CFLAGS= -O3 -Wall -pipe -pedantic -std=gnu99 -fopenmp
PKG_CXXFLAGS=-fopenmp -std=c++11
FC=/usr/local/bin/gfortran
F77=/usr/local/bin/gfortran
LDFLAGS=-L/usr/local/lib
PKG_LIBS = "-liconv"
From the Terminal run these steps to download and compile randomForestSRC:
cd ~/Downloads
curl -L https://cran.r-project.org/src/contrib/randomForestSRC_2.9.3.tar.gz | tar xf -
cd randomForestSRC
Make sure you have autoconf installed by typing autoconf in the Terminal. It should output: autoconf: error: no input file. If not, install it following the autoconf steps given further down in this vignette.
# in Terminal
cd ~/Downloads/randomForestSRC
autoconf
cd ~/Downloads
R CMD INSTALL --preclean --clean randomForestSRC
You want to make sure that this line is printed during execution of the previous command:
checking whether OpenMP will work in a package... yes
or
checking for /usr/local/bin/gcc option to support OpenMP... -fopenmp
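The install output is long, so these lines are easy to miss. One option, sketched here, is to save the output to a log file and search it. The helper name `openmp_detected` and the log file name are hypothetical.

```shell
# openmp_detected (hypothetical helper): succeed if an install log contains
# one of the two OpenMP lines quoted above.
openmp_detected() {
    grep -Eq 'OpenMP will work in a package\.\.\. yes|option to support OpenMP\.\.\. -fopenmp' "$1"
}

# Usage, from ~/Downloads:
# R CMD INSTALL --preclean --clean randomForestSRC 2>&1 | tee rfsrc_install.log
# openmp_detected rfsrc_install.log && echo "OpenMP enabled"
```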
fastsimcoal2
To install fastsimcoal2 v.2.6.0.3, to use in grur::simulate_rad:
COLONY
To install COLONY 30/07/2018, V2.0.6.5:
The old openmpi version (openmpi-1.6.5) is required, sadly.
cd ~/Downloads
curl -L https://download.open-mpi.org/release/open-mpi/v1.6/openmpi-1.6.5.tar.gz | tar xf -
cd openmpi-1.6.5
export TMPDIR=/tmp
./configure F77=gfortran #--prefix=/usr/local -openmp # no longer work for some reason
make -j 12
sudo make install
sudo rm -R ~/Downloads/openmpi*
On macOS: to download COLONY, follow the instructions on Jinliang Wang’s ZSL website. The file you need to uncompress is named: colony2.mac_.20180730.zip.
On Linux: to download COLONY, follow the instructions on Jinliang Wang’s ZSL website. The file you need to uncompress is named: colony2.linux_.20180730.zip.
Several options are available depending on the compiler you have installed.
macOS comes with Git, a Version Control System (VCS), pre-installed. However, it is installed in /usr/bin/git, which can make it difficult for beginners to update. To change this, run these commands:
cd ~/Downloads
curl -L http://ftp.gnu.org/gnu/autoconf/autoconf-latest.tar.gz | tar xf -
cd autoconf-* # the folder name depends on the latest release
./configure
make
sudo make install
cd ..
sudo rm -R ~/Downloads/autoconf-*
git --version # show current git version installed
which git # returns where is git on your computer
cd ~/Downloads
git clone https://github.com/git/git # install the latest Git
cd git
make configure
./configure
make -j12
sudo make install
cd ..
sudo rm -R ~/Downloads/git/ # remove git folder
source ~/.bash_profile # reload startup script
git --version # confirm the version you just installed
which git # returns /usr/local/bin/git
In System Preferences choose Keyboard -> Shortcuts. From the left panel, choose Services. In the right panel, under Files and Folders, choose New Terminal at Folder and/or New Terminal Tab at Folder. Now you can right-click your track pad or mouse on a folder and choose Services -> New Terminal at Folder!
With macOS, open the Automator application.
File -> New (cmd-N)
Choose: Services
Left panel, choose: Library -> Utilities
Middle, choose: Copy to Clipboard and drag it to the right panel
Now you want to have: Service receives selected FILES OR FOLDERS in FINDER
You should have something similar to the image below:
Go to the Finder, select a folder and right-click on it; you should see ‘copy path to clipboard’ at the bottom or in Services.