Repeating things? Let’s functionalize it!
When we see the same code repeated more than once, a function is a great way to reduce the duplication. Even a function we call only once can be a nice way to break up a large, complicated process.
To define a function, here’s the basic skeleton:
my_function_name <- function() {
}
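As a small illustration of that skeleton, here is a sketch of a function with one parameter. The name `to_thousands` and its argument `x` are made up for this example and are not part of the CHAS workflow.

```r
# Hypothetical helper, for illustration only.
# x is a parameter: a placeholder for whatever value we pass in.
to_thousands <- function(x) {
  # the last expression evaluated is what the function returns
  x / 1000
}

# call it like any built-in function
to_thousands(21395)
#> [1] 21.395
```

The body between the curly braces runs every time we call the function, so the logic lives in one place.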
Here’s a CHAS table. Each CSV file will look similar to this:
# A tibble: 10 × 152
source sumlevel geoid name st cnty T9_est1 T9_est2 T9_est3
<chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 2015thru2… 050 0500… Auta… 01 001 21395 15680 12835
2 2015thru2… 050 0500… Bald… 01 003 80930 60895 54425
3 2015thru2… 050 0500… Barb… 01 005 9345 5690 3460
4 2015thru2… 050 0500… Bibb… 01 007 6890 5130 4330
5 2015thru2… 050 0500… Blou… 01 009 20845 16425 15090
6 2015thru2… 050 0500… Bull… 01 011 3520 2505 750
7 2015thru2… 050 0500… Butl… 01 013 6505 4550 2925
8 2015thru2… 050 0500… Calh… 01 015 44605 31255 25110
9 2015thru2… 050 0500… Cham… 01 017 13450 9070 5745
10 2015thru2… 050 0500… Cher… 01 019 10735 8305 7730
# ℹ 143 more variables: T9_est4 <dbl>, T9_est5 <dbl>, T9_est6 <dbl>,
# T9_est7 <dbl>, T9_est8 <dbl>, T9_est9 <dbl>, T9_est10 <dbl>,
# T9_est11 <dbl>, T9_est12 <dbl>, T9_est13 <dbl>, T9_est14 <dbl>,
# T9_est15 <dbl>, T9_est16 <dbl>, T9_est17 <dbl>, T9_est18 <dbl>,
# T9_est19 <dbl>, T9_est20 <dbl>, T9_est21 <dbl>, T9_est22 <dbl>,
# T9_est23 <dbl>, T9_est24 <dbl>, T9_est25 <dbl>, T9_est26 <dbl>,
# T9_est27 <dbl>, T9_est28 <dbl>, T9_est29 <dbl>, T9_est30 <dbl>, …
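Suppose we’d like to clean every CHAS table in the same manner. Let’s create a function that does the following: keeps only the rows for the state and counties we need, pivots the T columns into a long format, and splits each column name into a table name (the T and the numbers before the underscore), a type (the three characters after the underscore), and a sort order (the digits at the end).
# define the skeleton of our function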
# add table as a parameter
clean_table <- function(table) {
# fill it in!
}
Fill in the body, using the table argument as the data to clean:
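clean_table <- function(table) {
  table %>%
    # st and cnty are character columns, so compare against strings
    filter(st == '53' & cnty %in% c('033', '035', '053', '061')) %>%
    # stack every column whose name starts with T into header/value pairs
    pivot_longer(cols = str_subset(colnames(table), "^T.*"),
                 names_to = 'header',
                 values_to = 'value') %>%
    # split each header into its parts
    mutate(table = str_extract(header, "^T\\d*(?=_)"),
           type = str_extract(header, "(?<=_)\\w{3}"),
           sort = str_extract(header, "\\d+$"))
}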
# Regex used:
# table: "^T\\d*(?=_)" a T followed by digits at the start of the string, just before the _
# type: "(?<=_)\\w{3}" the 3 word characters right after the _
# sort: "\\d+$" the digits at the end of the string
t9 <- clean_table(file_01)
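To see what each regular expression picks out, here is a short sketch that runs the three patterns on a single column name, `T9_est1`, taken from the table printed earlier. The variable name `header` is only a stand-in for one value of the header column.

```r
library(stringr)

# one column name from the CHAS table shown earlier
header <- "T9_est1"

str_extract(header, "^T\\d*(?=_)")   # table
#> [1] "T9"
str_extract(header, "(?<=_)\\w{3}")  # type
#> [1] "est"
str_extract(header, "\\d+$")         # sort, still a string at this point
#> [1] "1"
```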
Try with other files
If we forgot a step in the cleaning process, we can always edit the function and re-run our script.
# Let's edit the sort line inside mutate() so that the sort column is converted from string to numeric
sort = as.numeric(str_extract(header, "\\d+$"))
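Put together, the edited function might look like the sketch below. The tibble `toy` and the list `tables` are made-up stand-ins for real CHAS files (the values are not real data), and the state code is compared as the string '53' because st is a character column.

```r
library(dplyr)
library(tidyr)
library(stringr)
library(purrr)

clean_table <- function(table) {
  table %>%
    filter(st == '53' & cnty %in% c('033', '035', '053', '061')) %>%
    pivot_longer(cols = str_subset(colnames(table), "^T.*"),
                 names_to = 'header',
                 values_to = 'value') %>%
    mutate(table = str_extract(header, "^T\\d*(?=_)"),
           type = str_extract(header, "(?<=_)\\w{3}"),
           # the edit: sort is now numeric instead of a string
           sort = as.numeric(str_extract(header, "\\d+$")))
}

# hypothetical rows standing in for a CHAS csv; not real estimates
toy <- tibble(
  st = c('53', '53', '01'),
  cnty = c('033', '061', '001'),
  T9_est1 = c(10, 20, 30),
  T9_est2 = c(1, 2, 3)
)

cleaned <- clean_table(toy)
class(cleaned$sort)
#> [1] "numeric"

# re-running on several tables picks up the edit everywhere at once
tables <- list(first = toy, second = toy)
cleaned_all <- map(tables, clean_table)
```

Because every table goes through the same function, the fix only has to be made once.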