R language

In the wider programming world, R sometimes has a slightly unfortunate reputation as a badly designed “calculator language”: a computing environment which is (maybe!) good for working with data frames and creating figures, but that’s about it. However, while certainly very useful for data science, R is a full-blown programming language which is actually quite powerful even from a purely computer science perspective.

But still, this is a book about population genomics and data science in R. Why does it matter how much “real coding” we do in it?

Well, although this entire workshop will focus on R primarily as a statistical and visualization environment, neglecting the aspects of R which make it a “proper” programming language (or even considering data science as “not real programming”) is a huge problem.

First, even when “just” doing data science and statistics, we still use typical programming constructs, we need to be aware of the underlying data types behind our data (mostly the contents of tables), and we need to think algorithmically. Neglecting these things makes it easy to introduce bugs into our code, makes those bugs hard to find, and makes our programs less efficient even when they do work.
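To make the point about data types concrete, here is a minimal sketch of how a wrong type can quietly produce a wrong answer. The table, its column names, and its values are made up for illustration and are not part of the workshop data.

```r
# Hypothetical toy table: read depth per sample, accidentally stored as text,
# as can happen when a file is read in carelessly.
df <- data.frame(
  sample = c("s1", "s2", "s3"),
  depth  = c("9", "10", "100"),
  stringsAsFactors = FALSE
)

# The column looks numeric when printed, but its type says otherwise
print(class(df$depth))

# Values are compared as text rather than as numbers, so "9" beats "100"
max_as_text <- max(df$depth)

# Converting to numbers first gives the answer we actually wanted
max_as_number <- max(as.numeric(df$depth))

print(max_as_text)
print(max_as_number)
```

No error or warning is raised in the text-based case, which is exactly why this kind of bug is hard to spot unless we check the types behind our columns.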

This chapter will help you get familiar with some of the less obvious aspects of the R language and of programming in general, certainly the parts which are often skipped in undergraduate courses in the life sciences in favor of just teaching plotting and running statistical tests. The good thing is, there isn’t that much you need to learn. And what you do learn will continue paying dividends for the rest of your research career!

Let’s say it again, because people with non-computational backgrounds often feel inadequate when it comes to the computational aspects of their work: even when you’re “just” writing data analysis scripts, even when you’re “just” plotting results, you’re still writing programs. You’re a programmer. How exciting, right? The exercises in this chapter are designed to make you comfortable with programming and algorithmic thinking.