library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
library(readr)
Before we begin, let’s introduce the two stars of the tidyverse ecosystem, which we will be using here:
dplyr is a centerpiece of the R data science ecosystem, providing functions for manipulating, summarizing, and filtering tabular data;
readr is an R package which provides very convenient functions for reading (and writing) tabular data. Think of it as a set of better alternatives to base R functions such as read.table(), etc.
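To make the comparison with read.table() concrete, here is a minimal sketch that reads the same tiny tab-separated table with base R and with readr. The table contents and the variable names (txt, base_df, readr_df) are made up for illustration only, and wrapping the text in I() assumes readr 2.0 or newer.

```r
library(readr)

# A tiny made-up tab-separated table (illustration only)
txt <- "name\tvalue\nalpha\t1.5\nbeta\t2.5\n"

# Base R: the header and the delimiter have to be spelled out;
# the result is a plain data.frame
base_df <- read.table(text = txt, header = TRUE, sep = "\t")

# readr: read_tsv() assumes tab-separated columns and a header row;
# I() marks the string as literal data rather than a file path,
# and the result is a tibble
readr_df <- read_tsv(I(txt), show_col_types = FALSE)

class(base_df)
class(readr_df)
```

Both calls return the same columns. The difference lies mostly in the defaults and in the tibble that readr returns.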
Every script you write in this session will begin with these two lines of code:
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
library(readr)
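The "masked" messages above are harmless: they only say that, once dplyr is attached, a bare filter() or lag() refers to the dplyr version. As a small sketch (the toy data frame and the variable names are made up for illustration), the base versions remain reachable through their namespace:

```r
library(dplyr)

# A tiny made-up table (illustration only)
toy <- data.frame(id = c("a", "b", "c"), age = c(100, 2500, 7000))

# After library(dplyr), a bare filter() is dplyr's version,
# which keeps the rows matching a condition
old <- filter(toy, age > 1000)

# The masked function is still available via an explicit namespace;
# stats::filter() applies a linear filter to a numeric series
# (here a two-point moving average)
smoothed <- stats::filter(c(1, 2, 3, 4), rep(1 / 2, 2))

old
smoothed
```

Writing dplyr::filter() or stats::filter() explicitly is one way to make the intended function unambiguous.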
Let’s also introduce another star of this session, our example data set. The command below reads a metadata table from a recent huge aDNA paper on the history of the Holocene in West Eurasia, dubbed “MesoNeo” (reference), and saves it to a variable df. Everything in the section on tidyverse will revolve around this data.
df <- read_tsv("https://tinyurl.com/qwe-asd-zxc")
Rows: 4172 Columns: 32
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr (22): sampleId, popId, site, country, region, groupLabel, groupAge, flag...
dbl (10): shapeA, latitude, longitude, age14C, ageHigh, ageLow, ageAverage, ...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
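The two hints at the end of that message can be tried out on a small stand-in table. The sketch below is only an illustration: the three column names are borrowed from the output above, but the values are invented and the names toy_tsv and toy are hypothetical. Passing the text via I() assumes readr 2.0 or newer.

```r
library(dplyr)
library(readr)

# Toy stand-in for the real metadata table: column names are taken from
# the output above, the values are invented for illustration
toy_tsv <- "sampleId\tcountry\tageAverage\nS1\tcountryA\t5000\nS2\tcountryB\t7200\n"

# Declaring the column types up front documents what we expect
# and also silences the column specification message
toy <- read_tsv(
  I(toy_tsv),
  col_types = cols(
    sampleId   = col_character(),
    country    = col_character(),
    ageAverage = col_double()
  )
)

spec(toy)     # full column specification, as the message suggests
glimpse(toy)  # compact overview of columns, types, and first values
```

For a quick interactive session, show_col_types = FALSE is the lighter option. Explicit col_types are more useful when a script should fail early on unexpected input.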
Base R indexing recap