R Fundamentals I
Vectorisation
Learning Objectives
- To understand vectorised operations in R.
- To understand the apply function.
Most of R’s functions are vectorised, meaning that the function will operate on all elements of a vector without needing to loop through and act on each element one at a time. This makes code more concise, easier to read, and less error-prone.
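As a minimal sketch, assuming a vector named x that holds the integers 1 to 4 (the name and values are inferred from the output, not given in the text), multiplying by a single number looks like this:

```r
# Assumed definition: x is the integer vector 1, 2, 3, 4
x <- 1:4

# One expression multiplies every element; no explicit loop is needed
x * 2
```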
[1] 2 4 6 8
The multiplication was applied to each element of the vector.
We can also add two vectors together:
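A sketch of such an addition, assuming x is 1:4 and a second vector y is 6:9 (both names and values are inferred from the result shown):

```r
# Assumed definitions, chosen to match the printed result
x <- 1:4
y <- 6:9

# Elements are paired by position: x[1] + y[1], x[2] + y[2], ...
x + y
```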
[1] 7 9 11 13
Each element of x was added to its corresponding element of y.
Vectorised operations work element-wise on matrices:
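For instance, assuming m is a 3-by-4 matrix filled column by column with 1 to 12 (consistent with how m is printed later in this section), multiplying it by -1 negates every cell:

```r
# Assumed definition of m: values 1..12, filled down the columns
m <- matrix(1:12, nrow = 3, ncol = 4)

# The scalar is applied to each cell of the matrix
m * -1
```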
     [,1] [,2] [,3] [,4]
[1,]   -1   -4   -7  -10
[2,]   -2   -5   -8  -11
[3,]   -3   -6   -9  -12
To combine a matrix with a vector, keep in mind that the vector's elements are recycled down each column in turn, so the element-wise combination happens by columns:
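A sketch that would print the three results below, assuming m is matrix(1:12, nrow = 3, ncol = 4) and the vector is stored under the hypothetical name v:

```r
# Assumed definitions; the name v is chosen for illustration
m <- matrix(1:12, nrow = 3, ncol = 4)
m

v <- 1:4
v

# v is matched against m[1,1], m[2,1], m[3,1], m[1,2], ... and reused
# from its start each time it runs out
m * v
```

Because v has four elements and each column has only three rows, the fourth element of v lands on the first cell of the second column, which is why that cell becomes 16.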
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
[1] 1 2 3 4
     [,1] [,2] [,3] [,4]
[1,]    1   16   21   20
[2,]    4    5   32   33
[3,]    9   12    9   48
To do it by rows, first create a matrix from the vector and then combine the two matrices:
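One way to sketch this, assuming the same m and a vector v equal to 1:4, and using the hypothetical name m2 for the helper matrix:

```r
m <- matrix(1:12, nrow = 3, ncol = 4)
v <- 1:4

# byrow = TRUE lays v out along each row, so every row of m2 is 1 2 3 4
m2 <- matrix(v, nrow = 3, ncol = 4, byrow = TRUE)
m2

# Both operands now have the same shape, so cells are paired one to one
m * m2
```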
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    1    2    3    4
[3,]    1    2    3    4
     [,1] [,2] [,3] [,4]
[1,]    1    8   21   40
[2,]    2   10   24   44
[3,]    3   12   27   48
Challenge 1
Given the following matrix:
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
Write down what you think will happen when you run:
m ^ -1
m * c(1, 0, -1)
m > c(0, 20)
m * c(1, 0, -1, 2)
Did you get the output you expected? If not, ask a helper!
Applying functions across rows/columns
What if we need an operation (average, sum etc.) across rows or across columns?
To support this, we can use the apply function.
apply allows us to repeat a function on all of the rows (MARGIN = 1) or columns (MARGIN = 2) of a matrix or data frame. For example, using the m matrix, instead of
[1] 5.5
[1] 6.5
[1] 7.5
we do
[1] 5.5 6.5 7.5
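The comparison above can be sketched as follows, assuming m is matrix(1:12, nrow = 3, ncol = 4) as its printed values suggest; the name row_means is chosen for illustration:

```r
m <- matrix(1:12, nrow = 3, ncol = 4)

# Row means computed one row at a time
mean(m[1, ])
mean(m[2, ])
mean(m[3, ])

# The same three values from a single call: MARGIN = 1 means "by row"
row_means <- apply(m, 1, mean)
row_means
```

For this particular case base R also offers rowMeans(m), but apply accepts any function, which makes it the more general tool.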
Also, we can do
[1] 1 4 7 10
[1] 6 15 24 33
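One pair of calls consistent with the two results above takes the minimum and the sum of each column. This is an assumption: other expressions, such as m[1, ], would print the same first line. The names col_min and col_sum are illustrative.

```r
m <- matrix(1:12, nrow = 3, ncol = 4)

# MARGIN = 2 means "by column"
col_min <- apply(m, 2, min)
col_sum <- apply(m, 2, sum)

col_min
col_sum
```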
Similarly, with our pierce subset dataset created in the previous session, we can get household totals:
hh2016 hh2020 hh2030 hh2040 hh2050
320054 335245 386446 438732 488373
and for the all-cities hh dataset:
hh2016 hh2020 hh2030 hh2040 hh2050
1581863 1657497 1912422 2172765 2419919
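The pierce and hh data are not reproduced here, so the following sketch uses a small made-up data frame (the name toy, the city labels, and all values are hypothetical) to show the shape of such a call:

```r
# Toy data with invented values; only the column-naming pattern follows the text
toy <- data.frame(
  city   = c("A", "B", "C"),
  hh2016 = c(100, 250, 50),
  hh2020 = c(110, 260, 55)
)

# Select only the numeric columns, then sum each column (MARGIN = 2)
totals <- apply(toy[, c("hh2016", "hh2020")], 2, sum)
totals
```

Selecting the numeric columns first matters because apply converts a data frame to a matrix, and a character column such as city would turn the whole matrix into text that sum cannot add.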
However, for a dataset with missing values:
  id   X  Y
1  a  NA  5
2  b 100  5
3  c 100  5
4  d 100 NA
5  e 100 NA
we get:
X Y
NA NA
because
[1] NA
but
[1] 3
Therefore
X Y
400 15
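The chain of results above can be reproduced with a sketch like the following. The data frame name d and the small vector c(1, 2, NA) are assumptions chosen to match the printed values; the key point is that extra arguments given to apply, here na.rm = TRUE, are passed on to the function being applied.

```r
# Data frame rebuilt from the printed table; the name d is hypothetical
d <- data.frame(
  id = c("a", "b", "c", "d", "e"),
  X  = c(NA, 100, 100, 100, 100),
  Y  = c(5, 5, 5, NA, NA)
)

# Any NA in a column makes its sum NA
apply(d[, c("X", "Y")], 2, sum)

# The same behaviour on a plain vector (assumed example values)
sum(c(1, 2, NA))                 # NA
sum(c(1, 2, NA), na.rm = TRUE)   # 3

# na.rm = TRUE is forwarded to sum, so missing values are dropped first
apply(d[, c("X", "Y")], 2, sum, na.rm = TRUE)
```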
Challenge solutions
Solution to challenge 1
Given the following matrix:
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
Write down what you think will happen when you run:
m ^ -1
          [,1]      [,2]      [,3]       [,4]
[1,] 1.0000000 0.2500000 0.1428571 0.10000000
[2,] 0.5000000 0.2000000 0.1250000 0.09090909
[3,] 0.3333333 0.1666667 0.1111111 0.08333333
m * c(1, 0, -1)
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    0    0    0    0
[3,]   -3   -6   -9  -12
m > c(0, 20)
      [,1]  [,2]  [,3]  [,4]
[1,]  TRUE FALSE  TRUE FALSE
[2,] FALSE  TRUE FALSE  TRUE
[3,]  TRUE FALSE  TRUE FALSE
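The fourth expression from the challenge, m * c(1, 0, -1, 2), can be checked the same way. In this sketch (the name res is illustrative) the vector has four elements rather than three, so recycling wraps across column boundaries; because 12 is a multiple of 4, R gives no warning:

```r
m <- matrix(1:12, nrow = 3, ncol = 4)

# The multipliers 1, 0, -1, 2 are reused down the columns:
# column 1 gets 1, 0, -1; column 2 gets 2, 1, 0; and so on
res <- m * c(1, 0, -1, 2)
res
# Rows of the result:
#   1   8  -7    0
#   0   5  16  -11
#  -3   0   9   24
```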