<- function(p_init, Ne, generatinos) {
simulate_trajectory # ...
# here is where you will put your code
# ...
}
Algorithmic thinking
⚠️⚠️⚠️ This chapter is a work-in-progress! ⚠️⚠️⚠️
Part XYZ: Generalizing code as a function
You now have all the individual pieces to construct a vector of allele frequency trajectories of a given length (one number in that vector corresponding to one generation of a simulation). Write a function simulate_trajectory()
, which will accept the following three parameters and returns the trajectory vector as its output.
p_init
: initial allele frequencygenerations
: number of generations to simulateNe
: effective population size
Then simulate three trajectories with the same parameters (p_init = 0.5
, generations = 1000
, Ne = 1000
– save them in variables traj1
, traj2
, traj3
) and visualize them on a single plot using the functions plot()
and lines()
just as we did earlier.
Yes, this basically involves taking the code you have already written and just wrapping it up in a self-contained function!
Part XYZ: Return result as a data frame
Now, this one will probably not make much sense now but let’s do this as a practice, for the time being. Modify the simulate_trajectory()
function to return a data frame with the following format (this shows just the first five rows). In other words:
- the column
time
gives the time at which the frequency was simulated - the column
frequency
contains the simulated vector, (basically, the index of the for loop iteration), p_init_
andNe
indicate the parameters of the simulation run (and are, therefore, constant for all rows).
p_init | Ne | time | frequency |
---|---|---|---|
0.5 | 1000 | 1 | 0.500 |
0.5 | 1000 | 2 | 0.481 |
0.5 | 1000 | 3 | 0.483 |
0.5 | 1000 | 4 | 0.504 |
0.5 | 1000 | 5 | 0.494 |
Hint: Remember to use the tibble()
function to create your data frame (you will need to first load the tibble R package with library(tibble)
!), not the default data.frame()
function. This will make your life significantly easier.
Note: It might seem super redundant to drag the repeated p_init
and Ne
, columns in the table, compared to the previous solution of just returning the numeric vector. Of course, which option is better depends on the concrete situation. In this case, the reason for tracking all this “metadata” along with the frequency vector will become clearer later.
Part XYZ: Running simulations in a loop
You have now generalized your code so that it is much more flexible by accepting parameters as function inputs. Let’s utilize this self-contained function to run 20 simulations using the function lapply
(getting 20 vectors saved in a list in a variable traj_list
) and then plot the results by looping over each vector of traj_list
in a for
loop.
Hint: You can first initialize an empty plot by calling this bit of code:
plot(NULL, type = "l", xlim = c(1, 1000), ylim = c(0, 1),
xlab = "generations", ylab = "allele frequency")
This basically sets up the context for all trajectories, which you can then plot in the for
loop by calling something like lines(<variable>)
in the loop body.