Prerequisites

R and the RStudio IDE are required.

Please install the following libraries by running the following code snippet in the console of your RStudio IDE. Ignore any warnings regarding Rtools and if you are asked to install from sources which needs compilation, click ‘No’.

install.packages(c(“tidyr”))

install.packages(c(“stringr”))

install.packages(c(“dplyr”))

install.packages(c(“ggplot2”))

The code that is used in this class can be found at on github here

I recommend opening this script in R studio while the class is going so you can easily follow along.

What is dplyr?

From a dplyr vignette:

dplyr provides simple “verbs”, functions that correspond to the most common data manipulation tasks, to help you translate your thoughts into code.

It uses efficient backends, so you spend less time waiting for the computer.

From the tidyverse website:

  • mutate() adds new variables that are functions of existing variables
  • select() picks variables based on their names.
  • filter() picks cases based on their values.
  • summarise() reduces multiple values down to a single summary.
  • arrange() changes the ordering of the rows.

These all combine naturally with group_by() which allows you to perform any operation “by group”. You can learn more about them in vignette(“dplyr”). As well as these single-table verbs, dplyr also provides a variety of two-table verbs, which you can learn about in vignette(“two-table”).