R bootcamp

(A few remarks and tips before the practical session)

R is the best technology for doing computational science

R has an incredible wealth of toolkits

The most famous is the tidyverse ecosystem for data science.

There are packages for machine learning (Keras, TensorFlow), spatial analysis (sf, stars), and specific research fields (genomics, ecology, etc.). More than 23,000 packages in total.

R has awesome easy-to-use(!) tools for reproducibility

  • Quarto “authoring system” for writing automated reports, slides, PDF documents, etc. (our “Topic #4!”)

  • targets pipelining framework (possibly the most powerful and flexible of its kind)

  • tidyverse framework (particularly the dplyr R package introduced as “Topic 2/3”) is designed to facilitate building readable, easy-to-write processing pipelines

  • R itself is a very powerful, flexible programming language
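To make the point about readable pipelines concrete, here is a minimal sketch using dplyr. It assumes the dplyr package is installed, and it uses R's built-in `mtcars` dataset purely as a stand-in for real data; the name `summary_tbl` is made up for this illustration.

```r
library(dplyr)

# Read the pipeline top to bottom, one step per line:
summary_tbl <- mtcars %>%
  filter(cyl == 4) %>%                         # keep only four-cylinder cars
  group_by(gear) %>%                           # one group per number of gears
  summarize(mean_mpg = mean(mpg), n = n()) %>% # average mileage and group size
  arrange(desc(mean_mpg))                      # best mileage first

summary_tbl
```

Each line does one thing, so the code can be read almost like a sentence describing the analysis.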

The unfortunate way R is taught…

  • Some slides on “R as a calculator” (only half joking)

  • Then straight into plotting histograms and computing t-tests

    • Effectively treats computation / data science as black box

. . .

  • R was first created “by statisticians for statisticians” (1991)

    • So this way of teaching R makes sense historically

. . .

  • But teaching needs change in modern times:

    • Our data is larger and more complex than in the 1990s
    • Reproducibility requires proper programming skills

Challenge of teaching programming

  1. Programming is a skill, not a body of knowledge to be transferred

  2. Teaching R in a lecture format would mean 3 hours of torture

. . .

Today’s “R bootcamp” session is designed to walk you through the fundamentals of R in an interactive format.

. . .


A series of problems and solutions designed to:

  • Develop an understanding of what happens behind the scenes of data-science operations.
  • Give you the tools and confidence to build “mental models”.

Still, a couple of practical tips


(Having observed how many scientists use R in practice.)

Knowing RStudio well is like having a superpower

Don’t treat it as just a text editor like Notepad.

It’s the starship Enterprise of data science at your fingertips. It’s incredibly powerful and has a lot of features.

. . .


This cheatsheet has a lot of information, but try to internalize the keyboard shortcuts that I highlighted in yellow in the PDF.


At first, using the keyboard instead of the mouse will be annoying and slower, but trust me: it will pay off in the long run.

Read-Eval-Print Loop (REPL)

[…] the user enters expressions (rather than an entire [computer program]), the REPL evaluates them and displays the results […] – Wikipedia

An idea from ancient computers (1964!), built from three steps:

  1. read: accepts a bit of code from a user (1 + 2)
  2. eval: evaluates the code (applies + to 1 and 2, yielding 3)
  3. print: prints the result 3 on the screen

Steps 1-3 repeat in an infinite loop until the program closes.
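The three steps can be sketched by hand in base R. This is only an illustration of a single pass through the loop, not how the R console is actually implemented; the variable names `code`, `expr`, and `result` are made up for the example.

```r
# One iteration of a REPL, spelled out step by step.
code <- "1 + 2"                  # read: the text a user would type at the prompt
expr <- parse(text = code)[[1]]  # turn that text into an unevaluated R expression
result <- eval(expr)             # eval: apply `+` to 1 and 2
print(result)                    # print: show the value on the screen
#> [1] 3
```

Typing `1 + 2` directly into the R console performs the same three steps automatically.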

. . .

The R console is a powerful REPL!

The R console is like the ultimate piece of experimental lab equipment

R encourages a highly interactive workflow.

When I don’t understand something, such as a piece of code I don’t get, I always type it into the REPL to build an intuition.

. . .

Doing data analysis is like playing detective, especially when figuring out bugs and problems.

. . .

Form a hypothesis, run a tiny bit of R code to test the hypothesis. Move forward based on the result you got.
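As a small illustration of this hypothesis-driven style, suppose a summary statistic unexpectedly comes out as `NA`. The vector `x` below is made-up example data.

```r
# Hypothesis 1: a single missing value is enough to make mean() return NA.
x <- c(2, 4, NA, 6)
mean(x)
#> [1] NA

# Confirmed. Hypothesis 2: na.rm = TRUE drops the missing value first.
mean(x, na.rm = TRUE)
#> [1] 4
```

Each experiment takes a second to type into the console, and each result tells you where to look next.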

. . .

I see a lot of experienced PhD students writing and running long code top-to-bottom, instead of thinking methodically.

Built-in R help always has an answer!

All languages (and their packages) have documentation, sure.

But it’s mostly scattered on the internet, often hard to find.

. . .

R packages have standardized documentation inside R!

  • Every function func has a manual page, opened with the command ?func

. . .

Every single such help page describes:

  1. Basic usage of the function
  2. Which optional parameters can be given
  3. Description of what the function does
  4. Runnable example code (!!!)
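A quick way to try this in the console, using the base R function `mean()` as a stand-in for any function you are curious about:

```r
?mean           # open the manual page for mean()
help("mean")    # the same thing, written as an ordinary function call
example(mean)   # run the example code from the bottom of that manual page
```

In RStudio, the manual page opens in the Help pane, and the output of `example()` appears in the console.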

These manuals are amazingly helpful

(Help for a function ts_tajima() from my R package.)

Consider switching the pane layout

In the RStudio menu Tools -> Global Options -> Pane Layout, set:

This gives you maximum vertical space for code and easy switching between the script and the R console (particularly with keyboard shortcuts).

Let’s get started!

  1. Go to www.bodkan.net/simgen
  2. Click on “R bootcamp” in the left panel. These are the materials for this session (exercises, solutions, explanations)
  3. The “Cheatsheets and handouts” section in the left panel contains a single-page version of these slides, plus RStudio and base R cheatsheets for your reference
  4. Open your RStudio and start working!