R Fundamentals II
Functions
Learning Objectives
- Define a function that takes arguments.
- Return a value from a function.
- Test a function.
- Set default values for function arguments.
In this lesson, we’ll learn how to write a function so that we can repeat several operations with a single command.
Defining a function
Let’s open a new R script file and call it functions.R. We will write a function that computes the growth rate, as a percentage, from a starting value x to an ending value y:
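A minimal sketch of such a function, assuming the growth rate is the percentage change from x to y (an assumption that matches the results shown later in this lesson):

```r
growth.rate <- function(x, y) {
    # Percentage change from x to y (assumed definition)
    gr <- (y - x) / x * 100
    return(gr)
}
```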
We define growth.rate by assigning it to the output of function. The list of argument names is contained within parentheses. Next, the body of the function – the statements that are executed when it runs – is contained within curly braces ({}). The statements in the body are indented by four spaces. This makes the code easier to read but does not affect how the code operates.
When we call the function, the values we pass to it are assigned to those variables so that we can use them inside the function. Inside the function, we use a return statement to send a result back to whoever asked for it.
Let’s try running our function. Calling our own function is no different from calling any other function:
[1] 100
[1] 20
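For example, if growth.rate is assumed to return the percentage change from x to y, calls like these give the two results above. The input values are illustrative only and may differ from the ones originally used.

```r
# Assumed definition: percentage change from x to y
growth.rate <- function(x, y) {
    return((y - x) / x * 100)
}

growth.rate(1, 2)    # doubling is a 100% increase
growth.rate(10, 12)  # going from 10 to 12 is a 20% increase
```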
Our implementation of the growth.rate function supports vectorized inputs:
[1] 100   0 -20
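Vectorization comes for free here because R's arithmetic operators work element by element. A short sketch, again assuming the percentage-change definition and using made-up inputs:

```r
# Assumed definition: percentage change from x to y
growth.rate <- function(x, y) {
    return((y - x) / x * 100)
}

# Each element of the first vector is paired with the matching element of the second
growth.rate(c(1, 2, 5), c(2, 2, 4))

# A single starting value is recycled against every ending value
growth.rate(10, c(12, 15))
```

The first call returns 100, 0 and -20; the second returns 20 and 50.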
Combining functions
The real power of functions comes from mixing, matching and combining them into ever-larger chunks to get the effect we want.
Here we define a function that computes a growth rate for given columns of a dataset and adds it to the dataset as an additional column:
add.growth.rate <- function(dat, col1, col2, new.name = "GrowthRate") {
    newcol <- growth.rate(dat[, col1], dat[, col2])
    dat[[new.name]] <- round(newcol, 1)
    return(dat)
}
add.growth.rate(head(hh), "hh2016", "hh2050")
  city_id hh2016 hh2020 hh2030 hh2040 hh2050     city_name county_id
1       1   2705   2735   2836   2939   3037 Normandy Park        33
2       2  24886  26527  32059  37708  43071        Auburn        33
3       3  45021  45724  48094  50515  52813    King-Rural        33
4       4  10135  11122  14449  17846  21072        SeaTac        33
5       5  22527  23240  25643  28097  30427     Shoreline        33
6       6  16769  17481  19881  22332  24658    Renton PAA        33
  county_name GrowthRate
1        King       12.3
2        King       73.1
3        King       17.3
4        King      107.9
5        King       35.1
6        King       47.0
Passing a different value for new.name, here "GR16-50", changes the name of the added column:
  city_id hh2016 hh2020 hh2030 hh2040 hh2050     city_name county_id
1       1   2705   2735   2836   2939   3037 Normandy Park        33
2       2  24886  26527  32059  37708  43071        Auburn        33
3       3  45021  45724  48094  50515  52813    King-Rural        33
4       4  10135  11122  14449  17846  21072        SeaTac        33
5       5  22527  23240  25643  28097  30427     Shoreline        33
6       6  16769  17481  19881  22332  24658    Renton PAA        33
  county_name GR16-50
1        King    12.3
2        King    73.1
3        King    17.3
4        King   107.9
5        King    35.1
6        King    47.0
We’ve given the new.name argument a default value of "GrowthRate" using the = operator in the function definition. This means that the argument takes on that value unless the caller specifies otherwise.
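To see the default in action without the full hh dataset, here is a small sketch. The data frame hh.sample is a hypothetical stand-in built from three columns of the six rows printed above, and growth.rate is assumed to be the percentage change from x to y.

```r
# Assumed definition: percentage change from x to y
growth.rate <- function(x, y) {
    return((y - x) / x * 100)
}

add.growth.rate <- function(dat, col1, col2, new.name = "GrowthRate") {
    newcol <- growth.rate(dat[, col1], dat[, col2])
    dat[[new.name]] <- round(newcol, 1)
    return(dat)
}

# Hypothetical stand-in for head(hh), using values from the output above
hh.sample <- data.frame(
    city_name = c("Normandy Park", "Auburn", "King-Rural",
                  "SeaTac", "Shoreline", "Renton PAA"),
    hh2016 = c(2705, 24886, 45021, 10135, 22527, 16769),
    hh2050 = c(3037, 43071, 52813, 21072, 30427, 24658)
)

# new.name is omitted, so the added column is called "GrowthRate"
with.default <- add.growth.rate(hh.sample, "hh2016", "hh2050")
names(with.default)

# new.name is supplied, so the default is overridden
with.custom <- add.growth.rate(hh.sample, "hh2016", "hh2050", new.name = "GR16-50")
names(with.custom)
```

Naming the argument in the call (new.name = "GR16-50") is optional here, but it keeps the call readable as the number of arguments grows.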
Let’s write one more function that returns a subset of cities above a given threshold of growth.
gr.outliers <- function(dat, threshold = 50, grcol = "GrowthRate") {
    res <- dat[dat[[grcol]] > threshold, ]
    return(res)
}
hhgr <- add.growth.rate(hh, "hh2030", "hh2050")
gr.outliers(hhgr, 100)
    city_id hh2016 hh2020 hh2030 hh2040 hh2050         city_name county_id
71       72    191    350    887   1435   1955      Poulsbo PUTA        35
129     130     52     90    219    350    474      Stanwood UGA        61
132     133     61     89    182    277    368 Granite Falls UGA        61
NA       NA     NA     NA     NA     NA     NA              <NA>        NA
134     135     23     53    154    256    353        Sultan UGA        61
135     136      5    160    682   1216   1722      Woodway MUGA        61
136     137     36     52    106    161    213    Darrington UGA        61
155     164      1      2      4      7     10 South Prairie PAA        53
    county_name GrowthRate
71       Kitsap      120.4
129   Snohomish      116.4
132   Snohomish      102.2
NA         <NA>         NA
134   Snohomish      129.2
135   Snohomish      152.5
136   Snohomish      100.9
155      Pierce      150.0
(Don’t worry about the NAs in the output for now - we’ll fix it in the next section.)
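As a smaller illustration of how the threshold argument and its default behave, the sketch below applies gr.outliers to a hypothetical six-row data frame, gr.sample, whose growth rates are copied from the 2016 to 2050 output shown earlier. It contains no missing values, so no NA rows appear.

```r
gr.outliers <- function(dat, threshold = 50, grcol = "GrowthRate") {
    res <- dat[dat[[grcol]] > threshold, ]
    return(res)
}

# Hypothetical sample; growth rates taken from the earlier output
gr.sample <- data.frame(
    city_name = c("Normandy Park", "Auburn", "King-Rural",
                  "SeaTac", "Shoreline", "Renton PAA"),
    GrowthRate = c(12.3, 73.1, 17.3, 107.9, 35.1, 47.0)
)

# Default threshold of 50 keeps Auburn and SeaTac
above.default <- gr.outliers(gr.sample)
above.default

# A threshold of 100 keeps only SeaTac
above.100 <- gr.outliers(gr.sample, threshold = 100)
above.100
```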
If you’ve been writing these functions down in a separate R script (a good idea!), you can load them into your R session by using the source function:
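The sketch below is self-contained: it first writes a small stand-in for functions.R to a temporary directory so that it runs anywhere, then sources it. The one-function script and the percentage-change definition of growth.rate are assumptions for illustration. In practice you would simply call source("functions.R") from the directory that holds your script.

```r
# Write a one-function script to a temporary file (a stand-in for functions.R)
script <- file.path(tempdir(), "functions.R")
writeLines(c(
    "growth.rate <- function(x, y) {",
    "    return((y - x) / x * 100)",
    "}"
), script)

# Run the script; every function it defines becomes available in the session
source(script)

growth.rate(10, 12)
```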