Installation and setup

This entire workshop is organized around the population genetic simulation R package called slendr and an assortment of other useful R packages for data science, all of which will be automatically installed alongside slendr.

The primary goal for the course is to make you comfortable programming complete simulation and modeling scripts from scratch, entirely on your own. Even if you’ve never “really programmed” before, it’s not a problem. By the end of this workshop, you will all be programmers!

Please don’t hesitate to get in touch with me over email if you run into any problems with this setup procedure or with anything else related to the workshop!

Installation steps

Note: It will be great if you do this setup on your personal laptop. Getting everything to run locally is often much easier than trying to install things on shared clusters or other HPC environments.

  1. You should have R and RStudio installed on your computer (any version will do).

  2. Open RStudio and type this into the R console to install the slendr package:

install.packages("slendr")

Note: If you get an error during the installation, please copy-and-paste the entire output from this command and send it to me via email.

  1. After you have slendr successfuly installed, create a dedicated Python environment which it internally uses for simulation and tree-sequence analysis (type this into the R console again).
slendr::setup_env(agree = TRUE)

If everything worked, you should get an optimistic message at the end of the entire procedure saying:

======================================================================
Python environment for slendr has been successfuly created, and the R
interface to msprime, tskit, and pyslim modules has been activated.

In future sessions, activate this environment by calling init_env().
=======================================================================
Click here if running setup_env() fails

If the setup_env() installation procedure above fails, try the following:

  1. Delete the failed installation attempt:
slendr::clear_env(force = TRUE, all = TRUE)
  1. Try installing it again, this time using pip as a Python installation method (the default is conda which unfortunately fails fairly often):
slendr::setup_env(agree = TRUE, pip = TRUE)

Note: If you still get an error during the installation, please copy-and-paste the entire output from this command and send it to me via email.

  1. Paste this little testing slendr simulation script into the R console to make sure that everything works correctly. Don’t read too much into the meaning of the code, understanding (and being able to program) all this and more will be the point of our workshop.

Note: If you managed to successfully run the installation steps 1-3 above, this simulation should finish running without any problems. If it does not work, then there’s something strange going on. Please copy-and-paste the entire output this script produces in your R console (including the error) and send it to me via email.

library(slendr)
init_env()
## The interface to all required Python modules has been activated.

A <- population("popA", time = 8000, N = 1000)
B <- population("popB", time = 4000, N = 1000, parent = A)
C <- population("popC", time = 4000, N = 1000, parent = A)
D <- population("popD", time = 3000, N = 1000, parent = C)

gf <- list(
  gene_flow(from = B, to = D, start = 1000, end = 800, rate = 0.2),
  gene_flow(from = C, to = D, start = 1000, end = 800, rate = 0.1)
)

model <- compile_model(
  populations = list(A, B, C, D),
  gene_flow = gf, generation_time = 30
)

ts <- msprime(model, sequence_length = 1e6, recombination_rate = 1e-8)
ts
## ╔═══════════════════════════╗
## ║TreeSequence               ║
## ╠═══════════════╤═══════════╣
## ║Trees          │        832║
## ╟───────────────┼───────────╢
## ║Sequence Length│  1,000,000║
## ╟───────────────┼───────────╢
## ║Time Units     │generations║
## ╟───────────────┼───────────╢
## ║Sample Nodes   │      8,000║
## ╟───────────────┼───────────╢
## ║Total Size     │    1.4 MiB║
## ╚═══════════════╧═══════════╝
## ╔═══════════╤══════╤═════════╤════════════╗
## ║Table      │Rows  │Size     │Has Metadata║
## ╠═══════════╪══════╪═════════╪════════════╣
## ║Edges      │19,857│620.5 KiB│          No║
## ╟───────────┼──────┼─────────┼────────────╢
## ║Individuals│ 4,000│109.4 KiB│          No║
## ╟───────────┼──────┼─────────┼────────────╢
## ║Migrations │     0│  8 Bytes│          No║
## ╟───────────┼──────┼─────────┼────────────╢
## ║Mutations  │     0│ 16 Bytes│          No║
## ╟───────────┼──────┼─────────┼────────────╢
## ║Nodes      │17,407│476.0 KiB│          No║
## ╟───────────┼──────┼─────────┼────────────╢
## ║Populations│     4│343 Bytes│         Yes║
## ╟───────────┼──────┼─────────┼────────────╢
## ║Provenances│     1│  2.8 KiB│          No║
## ╟───────────┼──────┼─────────┼────────────╢
## ║Sites      │     0│ 16 Bytes│          No║
## ╚═══════════╧══════╧═════════╧════════════╝

The outcome of this script should approximately match what you see above (minus some statistical noise manifesting in your numbers in the summary being slightly different).

  1. Installing a couple of bonus R packages

In order to be able to run my own solutions to some of the exercises (particularly some more advanced bonus exercises which will be entirely optional, and we very likely won’t have time to go through during the short span of the worksop), a couple of additional R packages might be useful.

You can install those like this:

install.packages(c("combinat", "cowplot"))

If you’ve made it this far, you’re good to go!