---
title: "Neanderthal ancestry through time"
author: "WRITE YOUR NAME HERE :)"
date: now
format: html
self-contained: true
---
# Basic processing
## Load required packages
```{r}
# add required code here
```
## Read in data from the internet
```{r}
# add required code here
```
## Inspect the format of the table
First couple of rows of the data:
```{r}
# add required code here
```
A `glimpse` at the column values:
```{r}
# add required code here
```
# Visualizing data
## Visualize 10% of the data points
```{r}
# add required code here (it used the `sample_frac()` function)
```
## Fit a smoothed out line through all parameter combinations
```{r}
# add required code here which plotted the beautiful finalized figure
# of smoothed lines of Neanderthal ancestry trajectories
```
# Fitting a formal linear regression model
## Extract data for "direct" and "indirect" $f_4$-ratio
Filter data on the parameter setting of `rate_eur2afr == 0.2`, and
extract statistics for "direct" and "indirect" $f_4$-ratio statistics,
saving them in data frames `df_direct` and `df_indirect`:
```
df_direct <- filter( ... add required code here ...)
df_indirect <- filter( ... add required code here ...)
```
## Run linear regression on "direct" vs "indirect" $f_4$-ratio
Run a linear model using the `lm()` function (help can be found under
`?lm`) to test whether the "direct" and "indirect" $f_4$-ratio estimates
change significantly over time:
```{r}
# computing a linear regression on df_direct and df_indirect data
lm_direct <- lm(proportion ~ time, data = df_direct)
lm_indirect <- lm(proportion ~ time, data = df_indirect)
```
Run `summary()` on both `lm()` fit objects to get the p-values.
Which statistic gives a statistically significant decline of Neanderthal
ancestry over time?
```{r}
# run summary on the lm fit of the direct f4-ratio
```
```{r}
# run summary on the lm fit of the indirect f4-ratio
```
# Conclusions
Here is where you could conclude what you found out in the course of the analysis.
Take home messages, pointers to follow up analyses in other later scripts and/or
Quarto notebooks, etc.

# Quarto reports and slides

## Introduction

In the previous section, you set up a proper computational project structure. You added your pipeline R scripts which download, manipulate, filter, and otherwise process “raw data” into “processed data”, the latter being the starting point of data analysis, visualization, and statistical inference. You also learned to write standalone scripts to do all of that.
Honestly, what you’ve learned in the previous chapter on general R project setup is everything you might need to get through a Master’s project or even a PhD. To reiterate:

- Your data processing code is now fully automated and completely reproducible.
- You can also generate results, figures, and summaries in an equally automated way.

In principle, you don’t need anything else beyond nicely organized scripts which produce results (tables, figures, etc.) in an equally organized way.
In this session, you will learn about another possibility of doing reproducible data science. Not a replacement for the previous script-based workflows, but a complementary approach to doing the same thing. I personally use both approaches.
Let’s introduce the Quarto system for reproducible scientific research. First, please take a moment to watch this wonderful presentation (you can stop watching at about the 15-minute mark, when the discussion turns to building websites).

## Quick, fast-forward exercise

If you didn’t go through the previous exercises, don’t worry: there’s a lot you can learn here without relying on the scripts from previous sessions.
For instance, download the source file for my slides on ggplot2. They are written in Quarto, so try opening them in RStudio, click on "Source" at the top of your editor window, and hit the "=> Render" button.
Slides should appear! The slides (and specifically the source file for these slides) are a great example of a “reproducible presentation”: a presentation which is actually generated by R and includes your various plots, comments, tables, etc.
Now, use the reference materials about writing Quarto presentations and, together with my example slides I linked above, try to write your own presentation, perhaps using the examples from our ggplot2 session (pretending that you’re creating slides for a group update!). Alternatively, use this as a basis to create your own presentation about your own project. Even just a couple of slides, with some basic text formatting, and 2-3 figures from your own results is an amazing start.
In the “header” section of the Quarto source file (the header is the text between the two `---` lines at the top), change `revealjs` to `html`. Click on the "=> Render" button again and see what happens. Now you have a (again, completely reproducible) “computational report” instead of slides!
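If you prefer the R console to the Render button, the same switch can also be made without touching the header. The sketch below assumes that the `quarto` R package and the Quarto command-line tool are both installed; `render_as()` is a hypothetical helper name and `ggplot2_slides.qmd` is a placeholder file name.

```r
# Hypothetical helper: render one Quarto source file to a chosen format,
# overriding the `format:` set in its header. Assumes the `quarto` R package
# (which calls the Quarto command-line tool) is installed.
render_as <- function(input, format = c("html", "revealjs")) {
  # only allow the two formats used in this session
  format <- match.arg(format)
  if (!requireNamespace("quarto", quietly = TRUE)) {
    stop("please install the `quarto` R package first")
  }
  quarto::quarto_render(input, output_format = format)
}

# Usage (placeholder file name, not run here):
# render_as("ggplot2_slides.qmd", "html")      # a report
# render_as("ggplot2_slides.qmd", "revealjs")  # slides
```

Editing the header is still the simplest option when you want the change to stick; the console route is handy when you want both outputs from one unchanged source file.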

## Exercise 1: Creating a reproducible report

The previous exercises were focused on our metadata and IBD data, and on turning our disorganized pile of code into a proper project structure.
In this session, we’ll do a similar thing, but focus instead on the analysis of Neanderthal proportions in a time-series of aDNA individuals from Europe discussed here.
First create a new blank Quarto document by doing the following:

- Click on `File -> New File -> Quarto Document...`.
- Click on `Create Empty Document`.
- Save the file under `notebooks/neand_ancestry.qmd`.

(I like to call what we’ll be creating “a notebook”, because it’s very similar to a normal lab notebook.)
In your new Quarto file, first make sure you have the “Source” view turned on (top left of your editor window). Later, once you start writing your own notes, you might want to switch to “Visual”.
Paste in the following template (just to save you a lot of annoying typing):
Click on the "=> Render" button on top of your editor window and see the magic happen!
You can also check the "Render on Save" box on top of your editor window. See what happens when you save the document using CTRL / CMD + S. This can slow things down for long-running analyses, but is very convenient otherwise.
In a Quarto document of any kind (reports, presentations, anything), this is a key component. It’s called a “code block”:
```{r}
# here is your code
```

Whenever a Quarto document is rendered, R executes the code in these code blocks! It then includes the resulting figure, printed result, etc., which becomes part of the final document. I hope you can now appreciate how useful this is for:

- Making your research more reproducible: the code and the results are part of a single document, which is run top to bottom, automatically!
- Making your research easier to do: this is effectively a lab notebook of your research activity for that particular project. You can write notes, comments, reminders, conclusions, etc.

This saves you from hunting down “which bit of code, and where, created this particular figure?”, which can be very stressful, especially close to deadlines.

## Exercise 2: Completing your report

Now that you’ve rendered the document, you can see that I left you guidelines and blanks to fill in in the Quarto template. Using the set of exercises on the topic of Neanderthal proportion in a time-series of aDNA individuals from Europe discussed here, fill in the blanks accordingly.
Try to get in the mindset of using this document as your lab notebook! If you didn’t manage to get through the exercises on analyzing and plotting Neanderthal ancestry proportions at the link above, use this opportunity to work on those exercises, this time using your Quarto document as a means to solve them.
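To see the shape of the regression step from the template end to end, here is a minimal sketch on simulated data. It assumes the columns `rate_eur2afr`, `time`, and `proportion` named in the template, plus a hypothetical `stat` column holding "direct" or "indirect"; the real table may label the statistics differently. Base R’s `subset()` keeps the sketch dependency-free, but `dplyr::filter()` takes the same conditions.

```r
# Toy stand-in for the real Neanderthal ancestry table. Column names
# `rate_eur2afr`, `time`, `proportion` follow the template; `stat` is a
# hypothetical name. All values are simulated, NOT real results.
set.seed(42)
ancestry <- expand.grid(
  rate_eur2afr = c(0, 0.2),
  stat = c("direct", "indirect"),
  time = seq(0, 40000, by = 1000),
  stringsAsFactors = FALSE
)
ancestry$proportion <- 0.03 + rnorm(nrow(ancestry), sd = 0.005)

# Keep one parameter setting and one statistic per data frame
df_direct   <- subset(ancestry, rate_eur2afr == 0.2 & stat == "direct")
df_indirect <- subset(ancestry, rate_eur2afr == 0.2 & stat == "indirect")

# Model the ancestry proportion as a linear function of time
lm_direct   <- lm(proportion ~ time, data = df_direct)
lm_indirect <- lm(proportion ~ time, data = df_indirect)

# The `time` row of each coefficient table holds the slope and its p-value
summary(lm_direct)
summary(lm_indirect)
```

The toy proportions contain no built-in trend, so they only illustrate the mechanics, not the biological result.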
Hint: Again, if you ever need help, here it is:

## Exercise 3: Adjusting code chunk options

You can see that the final report contains both the code and the results of this code. Sometimes you don’t want that, particularly when you want to create not a document, but presentation slides like you will do in the next exercise.
My favourites (and the only ones I personally remember) are these ones:
- Show code, but fold it (the reader has to click to expand it)! This is my favourite, because sometimes your supervisor doesn’t want to read code, they just want to see a figure. :)

```{r}
#| code-fold: true
# here is your code, which will be folded (the reader clicks to expand it)
```

- Don’t show code, but show results:

```{r}
#| echo: false
# here is your code, which will not be shown in the report (only its results)
```

- Show code, but don’t evaluate it (it produces no results):

```{r}
#| eval: false
# here is your code, which will be shown in the report but not run
```

Experiment with the options above in your report (or slides in the next exercise). Here’s a very useful summary of many more options.
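Chunk options can also control how a figure appears. The sketch below is the body of an R chunk: to plain R the `#|` lines are ordinary comments, but Quarto reads them as options. A `label` starting with `fig-` lets you cross-reference the figure in the text as `@fig-ancestry`, `fig-cap` adds a caption, and `fig-width`/`fig-height` set the figure size in inches. The label name and the simulated data are placeholders, and base R graphics are used only to keep the sketch dependency-free; a ggplot2 figure works the same way.

```r
#| label: fig-ancestry
#| fig-cap: "Simulated Neanderthal ancestry over time (placeholder data)"
#| fig-width: 7
#| fig-height: 4
#| code-fold: true
# Placeholder data, NOT real estimates: a flat trend plus noise
set.seed(1)
time <- seq(40000, 0, by = -1000)
proportion <- 0.03 + rnorm(length(time), sd = 0.003)

# Scatter plot with the past on the left, plus a fitted regression line
plot(time, proportion, xlim = rev(range(time)),
     xlab = "time [years ago]", ylab = "Neanderthal ancestry proportion")
abline(lm(proportion ~ time), col = "red")
```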

## Exercise 4: Creating slides

Here’s my favourite aspect of Quarto. Not only can you create fully reproducible “lab notebooks”, you can also create automatically generated slides. This is an extremely useful and reliable way to keep your presentations for group meetings, etc. up to date.
Create a new Quarto presentation (File -> New File -> Quarto Presentation... -> Create Empty Document.). Then copy the entire contents of your notebooks/neand_ancestry.qmd into this new document, and save it as reports/neand_ancestry_slides.qmd.
Then change one single line in the header at the top of your file: replace `format: html` with `format: revealjs`.
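The copy-and-edit step can also be scripted, which helps if you regenerate the slides often. This is a minimal base-R sketch; `make_slides()` is a hypothetical helper, and it assumes the header contains exactly the line `format: html` (format options nested under `format:` would need manual editing instead).

```r
# Hypothetical helper: copy a Quarto report and turn the copy into slides by
# rewriting the `format: html` line inside the YAML header only.
make_slides <- function(report, slides) {
  lines <- readLines(report)
  # the first two `---` lines delimit the YAML header at the top of the file
  fences <- which(lines == "---")
  if (length(fences) < 2) stop("no `---` header found in ", report)
  header <- seq(fences[1], fences[2])
  # switch the output format; lines outside the header are left untouched
  lines[header] <- sub("^format: html$", "format: revealjs", lines[header])
  writeLines(lines, slides)
  invisible(slides)
}

# Usage (adjust the paths to your project, not run here):
# make_slides("path/to/neand_ancestry.qmd", "path/to/neand_ancestry_slides.qmd")
```

Note that the slides then become a one-off copy: later edits to the report are not carried over unless you run the helper again.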
Then click the "Render" button again! Observe the magic happen!
It’s pretty obvious to you now that slides have different requirements than documents. For one, including lots of code (or maybe any code) isn’t that useful. Showing `library(...)` calls in a presentation doesn’t make any sense either. Plus, slightly different formatting might be needed.
Take a look at this overview of the Quarto slides functionality. Then edit your slides (remove unnecessary headings/slide titles, etc.) to make them more suitable for presentation in a meeting.
For a more practical set of tips (how to include animated slides, how to do formatting, how to include images, etc.), you can take a look at the source .qmd file for the [introduction presentation for this workshop](https://github.com/bodkan/simgen/blob/main/slides_whoami.qmd). You can click through them yourself interactively here.
Note: Remove slides which are not useful, show only code which is important (like the `lm()` model?), and focus on figures and the `summary()` output of the linear regression.
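On a slide, a small table of slopes and p-values is often easier to read than two full `summary()` printouts. Below is a minimal base-R sketch on simulated data; `fits` and `slope_table` are hypothetical names, and with the real analysis you would put your own `lm()` fits into the list instead.

```r
# Toy fits standing in for the "direct" and "indirect" regressions;
# the proportions are simulated, NOT real results
set.seed(7)
time <- seq(0, 40000, by = 1000)
fits <- list(
  direct   = lm(proportion ~ time,
                data = data.frame(time, proportion = 0.03 + rnorm(length(time), sd = 0.004))),
  indirect = lm(proportion ~ time,
                data = data.frame(time, proportion = 0.03 + rnorm(length(time), sd = 0.004)))
)

# One row per fit: the `time` slope and its p-value from the coefficient table
slope_table <- do.call(rbind, lapply(names(fits), function(name) {
  coefs <- summary(fits[[name]])$coefficients
  data.frame(
    statistic = name,
    slope     = coefs["time", "Estimate"],
    p_value   = coefs["time", "Pr(>|t|)"]
  )
}))

slope_table
```

In a Quarto document, `knitr::kable(slope_table)` prints the same data frame as a formatted table, which usually looks cleaner on a slide.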

## Exercise 5: Recording R session information

The following command should be included at the end of your “Quarto reports”. When you run it, how would you read and interpret the information it provides? What do you think is the most important information which might be missing in case you need to pick up someone else’s project or script?
```r
sessionInfo()
```

```
R version 4.5.2 (2025-10-31)
Platform: aarch64-apple-darwin20
Running under: macOS Tahoe 26.1

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Copenhagen
tzcode source: internal

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] htmlwidgets_1.6.4 compiler_4.5.2 fastmap_1.2.0 cli_3.6.5
[5] tools_4.5.2 htmltools_0.5.8.1 rstudioapi_0.17.1 yaml_2.3.10
[9] rmarkdown_2.30 knitr_1.50 jsonlite_2.0.0 xfun_0.54
[13] digest_0.6.38 rlang_1.1.6 evaluate_1.0.5
```

Create a new chunk at the end of your document and add this command to it, so that the session information is included every time the document is rendered.
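If you also want a plain-text record next to your results, one option is to save the printed session information to a file. A minimal sketch; the file location is a placeholder (a temporary directory here, so the example runs anywhere), and in a real project you would point it at wherever your results live.

```r
# Capture the printed output of sessionInfo() as text lines and save them
info_file <- file.path(tempdir(), "session_info.txt")  # placeholder location
writeLines(capture.output(sessionInfo()), info_file)

# Versions of R itself and of individual packages can also be queried directly
R.version.string
packageVersion("stats")
```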

## Exercise 6: The entire point of this whole workshop

In this final exercise, I would like you to take whatever data you have, and try to use some of what you’ve learned so far — about R programming, about tidyverse, about ggplot2 — and create a Quarto report in which you will put some of what you’ve learned into practice.
Alternatively, if you have a messy set of scripts (we all have that, even my stuff is messy, don’t worry) ready and some results already generated, try to transform them into what would be a nice, automated, and reproducible Quarto report.
You could also work on transforming code which you now realize could be organized in a more structured way into a tidy step-by-step cascade of R scripts, perhaps like we’ve learned in the previous session on building R pipelines.
The sky is the limit!

## Fun fact

This entire workbook and course is written in Quarto! :)