R Fundamentals I

Vectorisation

Learning Objectives

  • To understand vectorised operations in R.
  • To understand the apply function.

Most of R’s functions are vectorised, meaning that the function operates on all elements of a vector at once, without needing to loop through and act on each element one at a time. This makes code more concise, easier to read, and less error-prone.
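As a minimal sketch, assuming a vector x that holds the integers 1 to 4 (the values are inferred from the output shown next, not stated in the text), multiplying the whole vector by 2 might look like this:

```r
# Assumed input: 1:4 is inferred from the printed result
x <- 1:4

# One expression multiplies every element; no explicit loop is needed
doubled <- x * 2
print(doubled)
```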

[1] 2 4 6 8

The multiplication happened to each element of the vector.

We can also add two vectors together:
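A sketch of the addition, assuming x is 1 to 4 and y is 6 to 9 (both inferred from the result that follows):

```r
# Assumed inputs, chosen to agree with the printed sum
x <- 1:4
y <- 6:9

# Element-wise addition of two vectors of equal length
total <- x + y
print(total)
```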

[1]  7  9 11 13

Each element of x was added to its corresponding element of y: the first result is x[1] + y[1], the second is x[2] + y[2], and so on.

Vectorised operations work element-wise on matrices:
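For instance, assuming m is a 3-by-4 matrix filled column by column with 1 to 12 (which matches the matrices printed throughout this section), multiplying by -1 negates every entry:

```r
# Assumed definition of m: matrix() fills column by column by default
m <- matrix(1:12, nrow = 3, ncol = 4)

# The scalar -1 is applied to each element of the matrix
negated <- m * -1
print(negated)
```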

     [,1] [,2] [,3] [,4]
[1,]   -1   -4   -7  -10
[2,]   -2   -5   -8  -11
[3,]   -3   -6   -9  -12

To combine a matrix with a vector, keep in mind that the element-wise combination happens by columns:
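A sketch that should reproduce the three outputs below, assuming the matrix is called m and the vector v (the name v is an assumption). R recycles v down each column in turn, so v[4] meets m[1, 2], v[1] meets m[2, 2], and so on:

```r
m <- matrix(1:12, nrow = 3, ncol = 4)
v <- 1:4   # hypothetical name for the vector

print(m)
print(v)

# v is recycled in column-major order: 1,2,3 | 4,1,2 | 3,4,1 | 2,3,4
by_columns <- m * v
print(by_columns)
```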

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
[1] 1 2 3 4
     [,1] [,2] [,3] [,4]
[1,]    1   16   21   20
[2,]    4    5   32   33
[3,]    9   12    9   48

To do it by rows, first create a matrix from the vector and then combine the two matrices:
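One way to do this, again assuming the names m and v, is to fill a matrix of the same shape row by row with byrow = TRUE and then multiply the two matrices element-wise:

```r
m <- matrix(1:12, nrow = 3, ncol = 4)
v <- 1:4   # hypothetical name for the vector

# Repeat v along every row so it lines up with the columns of m
v_mat <- matrix(v, nrow = 3, ncol = 4, byrow = TRUE)
print(v_mat)

# Both operands now have the same shape, so column j is scaled by v[j]
by_rows <- m * v_mat
print(by_rows)
```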

     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    1    2    3    4
[3,]    1    2    3    4
     [,1] [,2] [,3] [,4]
[1,]    1    8   21   40
[2,]    2   10   24   44
[3,]    3   12   27   48

Challenge 1

Given the following matrix:

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

Write down what you think will happen when you run:

  1. m ^ -1
  2. m * c(1, 0, -1)
  3. m > c(0, 20)
  4. m * c(1, 0, -1, 2)

Did you get the output you expected? If not, ask a helper!

Applying functions across rows/columns

What if we need an operation (average, sum etc.) across rows or across columns?

[Figure: Operations Across Axes]

To support this, we can use the apply function.

apply allows us to repeat a function on all of the rows (MARGIN = 1) or columns (MARGIN = 2) of a matrix or data frame. For example, using the m matrix, instead of computing the mean of each row separately:
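The row-by-row approach could be sketched as follows, assuming m is the 3-by-4 matrix of 1 to 12 used earlier:

```r
m <- matrix(1:12, nrow = 3, ncol = 4)

# One call per row: workable for three rows, tedious for many
row_mean_1 <- mean(m[1, ])
row_mean_2 <- mean(m[2, ])
row_mean_3 <- mean(m[3, ])

print(row_mean_1)
print(row_mean_2)
print(row_mean_3)
```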

[1] 5.5
[1] 6.5
[1] 7.5

we can compute all the row means with a single call to apply:
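A sketch of that single call, under the same assumption about m:

```r
m <- matrix(1:12, nrow = 3, ncol = 4)

# MARGIN = 1 runs mean() once per row and collects the results in a vector
row_means <- apply(m, 1, mean)
print(row_means)
```

For the specific case of means and sums, base R also offers rowMeans, colMeans, rowSums, and colSums, which are usually faster than apply.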

[1] 5.5 6.5 7.5

We can also apply functions down the columns (MARGIN = 2):
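The two results that follow are consistent with the column minimums and the column sums of m. The exact calls are not given in the text, so treat this as one plausible sketch:

```r
m <- matrix(1:12, nrow = 3, ncol = 4)

# MARGIN = 2 runs the function once per column
col_min <- apply(m, 2, min)
col_sum <- apply(m, 2, sum)

print(col_min)
print(col_sum)
```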

[1]  1  4  7 10
[1]  6 15 24 33

Similarly, with our pierce subset dataset created in the previous session, we can get household totals:
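The pierce data frame is not reproduced in this section, so the sketch below uses a small made-up stand-in called pierce_demo. Only the hh2016 and hh2020 column names come from the output shown; the city column and all values are purely illustrative and will not match the real totals that follow. The pattern is to select the numeric household columns and sum each one:

```r
# Hypothetical stand-in for the pierce subset; values are invented
pierce_demo <- data.frame(
  city   = c("A", "B", "C"),
  hh2016 = c(10, 20, 30),
  hh2020 = c(11, 22, 33)
)

# Keep only the numeric household columns before calling apply,
# otherwise the character column would force everything to text
hh_cols <- c("hh2016", "hh2020")
hh_totals <- apply(pierce_demo[, hh_cols], 2, sum)
print(hh_totals)
```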

hh2016 hh2020 hh2030 hh2040 hh2050 
320054 335245 386446 438732 488373 

The same approach applied to the all-cities hh dataset gives:

 hh2016  hh2020  hh2030  hh2040  hh2050 
1581863 1657497 1912422 2172765 2419919 

However, for a dataset with missing values:
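Such a dataset could be built as below. The name df is an assumption; the values are copied from the table printed next:

```r
# Hypothetical name df; contents mirror the printed table
df <- data.frame(
  id = c("a", "b", "c", "d", "e"),
  X  = c(NA, 100, 100, 100, 100),
  Y  = c(5, 5, 5, NA, NA)
)
print(df)
```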

  id   X  Y
1  a  NA  5
2  b 100  5
3  c 100  5
4  d 100 NA
5  e 100 NA

summing the X and Y columns with apply gives:
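A sketch of the column sums, assuming the data frame is named df as a placeholder:

```r
df <- data.frame(
  id = c("a", "b", "c", "d", "e"),
  X  = c(NA, 100, 100, 100, 100),
  Y  = c(5, 5, 5, NA, NA)
)

# Sum each numeric column; any NA in a column makes its total NA
totals <- apply(df[, c("X", "Y")], 2, sum)
print(totals)
```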

 X  Y 
NA NA 

because sum returns NA whenever any of its inputs is missing:
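The original call is not shown; a small example with assumed inputs 1, 2, and NA illustrates the behaviour:

```r
# Assumed inputs, chosen only to illustrate how NA propagates
with_na <- sum(c(1, 2, NA))
print(with_na)
```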

[1] NA

but setting na.rm = TRUE drops the missing values before summing:
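With the same assumed inputs (1, 2, and NA, which agree with the result of 3 shown next), removing missing values first might look like this:

```r
# na.rm = TRUE tells sum() to ignore NA values
without_na <- sum(c(1, 2, NA), na.rm = TRUE)
print(without_na)
```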

[1] 3

Therefore, passing na.rm = TRUE through apply to sum gives the totals:
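Extra arguments given to apply after the function are passed on to that function, so na.rm = TRUE reaches sum. A sketch, again with the placeholder name df:

```r
df <- data.frame(
  id = c("a", "b", "c", "d", "e"),
  X  = c(NA, 100, 100, 100, 100),
  Y  = c(5, 5, 5, NA, NA)
)

# na.rm = TRUE is forwarded by apply() to each call of sum()
totals_no_na <- apply(df[, c("X", "Y")], 2, sum, na.rm = TRUE)
print(totals_no_na)
```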

  X   Y 
400  15 

Challenge solutions

Solution to challenge 1

Given the following matrix:

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

Write down what you think will happen when you run:

  1. m ^ -1
          [,1]      [,2]      [,3]       [,4]
[1,] 1.0000000 0.2500000 0.1428571 0.10000000
[2,] 0.5000000 0.2000000 0.1250000 0.09090909
[3,] 0.3333333 0.1666667 0.1111111 0.08333333
  2. m * c(1, 0, -1)
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    0    0    0    0
[3,]   -3   -6   -9  -12
  3. m > c(0, 20)
      [,1]  [,2]  [,3]  [,4]
[1,]  TRUE FALSE  TRUE FALSE
[2,] FALSE  TRUE FALSE  TRUE
[3,]  TRUE FALSE  TRUE FALSE
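The fourth expression, m * c(1, 0, -1, 2), can be checked in the same way. This is a sketch rather than part of the worked solution: the four-element vector is recycled down the columns of the twelve-element matrix, so it no longer lines up with the three rows and each column gets a different slice of the vector.

```r
m <- matrix(1:12, nrow = 3, ncol = 4)

# Recycled in column-major order: 1,0,-1 | 2,1,0 | -1,2,1 | 0,-1,2
result <- m * c(1, 0, -1, 2)
print(result)
```

Because 12 is a multiple of 4, R recycles the vector without a warning.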