Tidyverse introduction

Before we begin, let’s introduce the two stars of the tidyverse ecosystem, which we will be using here:

  1. dplyr is a centerpiece of the entire R data science environment, providing important functions for manipulating, filtering, and summarizing tabular data;

  2. readr is an R package which provides very convenient functions for reading (and writing) tabular data. Think of it as a set of better alternatives to base R functions such as read.table().

Every single script you will be writing in this session will begin with these two lines of code.

library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(readr)

Let’s also introduce another star of this session, our example data set. The command below reads a metadata table from a recent large aDNA paper on the history of the Holocene in West Eurasia, dubbed “MesoNeo” (reference), and saves it to a variable df. Everything in the section on tidyverse will revolve around this data.

df <- read_tsv("https://tinyurl.com/qwe-asd-zxc")
Rows: 4172 Columns: 32
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr (22): sampleId, popId, site, country, region, groupLabel, groupAge, flag...
dbl (10): shapeA, latitude, longitude, age14C, ageHigh, ageLow, ageAverage, ...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
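
After reading a table, it is worth checking that it looks as expected. The sketch below is only an illustration: rather than downloading the real table, it builds a tiny stand-in with made-up values (the column names sampleId, country and ageAverage come from the column specification above; the variable names toy, toy_df and country_summary are hypothetical). It then reads the stand-in back with read_tsv() and takes a first look with a few dplyr functions.

```r
library(dplyr)
library(readr)

# Made-up stand-in for the metadata table (NOT real MesoNeo data);
# only the column names are borrowed from the column specification above.
toy <- data.frame(
  sampleId   = c("s1", "s2", "s3", "s4"),
  country    = c("Denmark", "Denmark", "Italy", "Italy"),
  ageAverage = c(5200, 7400, 3100, 9800)
)

# Write the toy table to a temporary TSV file...
toy_tsv <- tempfile(fileext = ".tsv")
write_tsv(toy, toy_tsv)

# ...and read it back the same way as the real table,
# silencing the column specification message
toy_df <- read_tsv(toy_tsv, show_col_types = FALSE)

dim(toy_df)       # number of rows and columns
glimpse(toy_df)   # column names, types, and the first few values
head(toy_df, 2)   # the first two rows

# A first taste of dplyr: number of samples and mean age per country
country_summary <- toy_df %>%
  group_by(country) %>%
  summarise(n = n(), mean_age = mean(ageAverage))
country_summary
```

The same inspection calls (dim(), glimpse(), head()) should work unchanged on the real df.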

Part XYZ:

Part XYZ: Filtering rows

Base R indexing recap
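
As a minimal sketch of what such a recap might cover, the example below filters rows of a toy data frame using plain base R indexing. The values are made up and the object names (toy, is_old, old_rows, old_danish) are hypothetical; only the column names are borrowed from the metadata table.

```r
# Toy data frame with made-up values (NOT real MesoNeo data)
toy <- data.frame(
  sampleId   = c("s1", "s2", "s3", "s4"),
  country    = c("Denmark", "Denmark", "Italy", "Italy"),
  ageAverage = c(5200, 7400, 3100, NA)
)

# A comparison produces a logical vector with one value per row;
# a missing age gives NA rather than TRUE or FALSE
is_old <- toy$ageAverage > 5000
is_old

# Rows are selected with toy[rows, columns]; leaving the column slot
# empty keeps all columns. which() converts the logical vector to row
# numbers and drops the NA, which would otherwise yield a row of NAs.
old_rows <- toy[which(is_old), ]

# Conditions can be combined with & (and) and | (or)
old_danish <- toy[which(toy$country == "Denmark" & toy$ageAverage > 6000), ]
old_danish
```

Note that indexing with the raw logical vector, toy[is_old, ], returns an extra all-NA row for the sample with the missing age; wrapping the condition in which() is one common way to avoid that.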

Part XYZ: Selecting columns

Part XYZ: Summarizing data