Welcome to our second R Markdown lesson. This lesson is going to create an html output file that summarizes population data from the Office of Financial Management (OFM). This lesson builds off of your previous lessons in dplyr
and ggplot2
so hope you remember all that great stuff! I am also going to show you a few tricks to make this look more PSRC appropriate.
To start this lesson, first let’s clear out the environment and restart the R session. It is a good idea to clear out the environment and restart R when you switch projects. If you don’t, all your previous data will remain in memory as well as your libraries. It can lead to issues so it is best to start fresh - which also happens if you just close and reopen RStudio. To clear the environment and restart R do the following:
Clear Environment
Restart R
Now let’s load the file ofm_trend.Rmd
that you already downloaded into RStudio as our starting point. This RMD
document should look somewhat familiar to the one we just did with Bob. It has a yaml header
, a code chunk
and some inline text
. Lets take a look at the components quickly.
The header tells R that we want an HTML document that uses some custom formatting provided in the file styles.css
. We also want a table of contents that floats and only shows items down to Heading Level 2.
---
title: “OFM Population Estimates”
output:
html_document:
theme: cerulean
css: styles.css
toc: TRUE
toc_float: TRUE
toc_depth: 2
---
This chunk of code loads all the libraries that we will be using in the lesson and provides some basic inputs that are used throughout the script. All the libraries we are using today were used in the previous two R sessions, the only new one for today is knitr
.
There are a couple new concepts that we will touch on today but not spend a lot of time looking at - how to download an Excel file directly from a website and also how to use other files that might contain functions, lists, inputs etc. in your code without having to re-type them.
# This code "chunk" contains all the libraries/packages that are used in the overall script as well basic inputs that we might want to change
# Packages for Data Cleaning/Processing
library(tidyr)
library(dplyr)
library(stringr)
library(openxlsx)
# Packages for Table and Chart creation
library(ggplot2)
library(scales)
library(plotly)
# Packages used to create markdown document (needed to knit children)
library(knitr)
# Some basic variables that will be used by the script
psrc_counties <- c("King","Kitsap","Pierce","Snohomish")
ofm.url <- 'https://www.ofm.wa.gov/sites/default/files/public/dataresearch/pop/april1/ofm_april1_population_final.xlsx'
# Read in Functions and color palette
source(file.path(getwd() , 'functions.R'))
source(file.path(getwd() , 'psrc_palette.R'))
When we source
a file, we are effectively telling R to run that file and make all of its contents available in the current R session. In this example, we source
two files:
The functions.R
file has four custom functions that were written to create:
I write functions like this so that I can make my charts and figures look consistent without having to keep copying and pasting things. Remember the golden rule on functions - if you copy the same line more than twice, write a function instead! Let’s open funcitons.R
in RStudio so you can see what it looks like using:
I have also placed the code here as well:
# Useful Functions
create_line_chart <- function(w.data, w.x, w.y, w.color, w.ltype, w.lwidth, w.palette) {
y.max <- 1.25 * max(w.data[w.y])
x.breaks <- unique(w.data %>% pull(w.x))
g <- ggplotly(ggplot(data=w.data,
aes(x = get(eval(w.x)),
y = get(eval(w.y)),
color = get(eval(w.color)),
group=1,
text = paste0("<b>Year: </b>", get(eval(w.x)), "<br>","<b>Population: </b>", prettyNum(round(get(eval(w.y)), 0), big.mark = ","), "<br>"))) +
geom_line(linetype = w.ltype,
size = w.lwidth) +
scale_y_continuous(labels = label_comma(), limits = c(0,y.max)) +
scale_x_continuous(breaks= x.breaks) +
scale_color_manual(values = w.palette) +
labs(x = w.x, y = NULL) +
theme(plot.title = element_text(size = 10, face = 'bold'),
axis.text.x = element_text(angle = 0,
hjust = 1,
vjust = 1,
family = 'Comic Sans MS'),
axis.ticks.x = element_blank(),
axis.line.x = element_blank(),
axis.line.y = element_line(colour="#BBBDC0",size = 0.25),
panel.background = element_blank(),
panel.grid.major.y = element_line(colour="#BBBDC0",size = 0.25),
panel.grid.minor.y = element_line(colour="#BBBDC0",size = 0.25),
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank(),
text = element_text(family = "Segoe UI"),
legend.position = "bottom",
legend.title = element_blank()),
tooltip = c("text")) %>% layout(legend = list(orientation = "h", xanchor = "center", x = 0.5, y = -0.25), hovermode = "x")
return(g)
}
create_bar_chart <- function(w.data, w.x, w.y, w.color, w.bartype, w.transparent, w.palette, x.type) {
if (x.type == "discrete") {
g <- ggplotly(ggplot(data = w.data,
aes(x = get(eval(w.x)),
y = get(eval(w.y)),
fill = get(eval(w.color)),
group=1,
text = paste0("<b>", get(eval(w.color))," Population: </b>", prettyNum(round(get(eval(w.y)), 0), big.mark = ","), "<br>"))) +
geom_col(color = "black",
alpha = w.transparent,
position = w.bartype) +
scale_y_continuous(labels = label_comma()) +
labs(x = w.x, y = NULL) +
theme(plot.title = element_text(size = 10, face = 'bold'),
axis.text.x = element_text(angle = 0,
hjust = 1,
vjust = 1,
family = 'Comic Sans MS'),
axis.ticks.x = element_blank(),
axis.line = element_blank(),
panel.background = element_blank(),
panel.grid.major.y = element_line(colour="#BBBDC0",size = 0.25),
panel.grid.minor.y = element_line(colour="#BBBDC0",size = 0.25),
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank(),
text = element_text(family = "Segoe UI"),
legend.position = "bottom",
legend.title = element_blank()),
tooltip = c("text")) %>% layout(legend = list(orientation = "h", xanchor = "center", x = 0.5, y = -0.25))
} else {
x.breaks <- unique(w.data %>% pull(w.x))
g <- ggplotly(ggplot(data = w.data,
aes(x = get(eval(w.x)),
y = get(eval(w.y)),
fill = get(eval(w.color)),
group=1,
text = paste0("<b>", get(eval(w.color))," Population: </b>", prettyNum(round(get(eval(w.y)), 0), big.mark = ","), "<br>"))) +
geom_col(color = "black",
alpha = w.transparent,
position = w.bartype) +
scale_y_continuous(labels = label_comma()) +
scale_fill_manual(values = w.palette) +
scale_x_continuous(breaks= x.breaks) +
labs(x = w.x, y = NULL) +
theme(plot.title = element_text(size = 10, face = 'bold'),
axis.text.x = element_text(angle = 0,
hjust = 1,
vjust = 1,
family = 'Comic Sans MS'),
axis.ticks.x = element_blank(),
axis.line = element_blank(),
panel.background = element_blank(),
panel.grid.major.y = element_line(colour="#BBBDC0",size = 0.25),
panel.grid.minor.y = element_line(colour="#BBBDC0",size = 0.25),
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank(),
text = element_text(family = "Segoe UI"),
legend.position = "bottom",
legend.title = element_blank()),
tooltip = c("text")) %>% layout(legend = list(orientation = "h", xanchor = "center", x = 0.5, y = -0.25))
}
return(g)
}
create_facet_bar_chart <- function(w.data, w.x, w.y, w.color, w.vars, w.scales) {
g <- ggplotly(ggplot(data = w.data,
aes(x = reorder(get(eval(w.x)), -get(eval(w.y))),
y = get(eval(w.y)),
fill = get(eval(w.color)),
text = paste0("<b>", get(eval(w.color)), ": </b>", prettyNum(round(get(eval(w.y)), 0), big.mark = ","), "<br>"))) +
geom_col(
color = "black",
alpha = 1.0,
position = "dodge") +
labs(x = NULL, y = NULL) +
scale_y_continuous(labels = label_comma() ) +
theme(plot.title = element_text(size = 10, face = 'bold'),
axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
axis.line = element_blank(),
panel.background = element_blank(),
panel.grid.major.y = element_line(colour="#BBBDC0",size = 0.25),
panel.grid.minor.y = element_line(colour="#BBBDC0",size = 0.25),
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank(),
text = element_text(family = "Segoe UI"),
legend.position = "none",
legend.title = element_blank())+
facet_wrap(vars(get(eval(w.vars))), scales = w.scales) +
theme(legend.position = "none"),
tooltip = c("text"))
return(g)
}
return_population_estimate <- function(data, place, year) {
pop <- data %>% filter(Jurisdiction == place & Year == year) %>% select(Estimate) %>% as.numeric
return(pop)
}
palette.R
has information on what colors to use for different specific names and categories. It is a named vector where if you pass it the name (the value on the left side of the equal sign) it returns the HEX color (the value on the right side of the equal sign). In this code, Kitsap County
will always be colored with hex code #F4835E
.
psrc_colors <- c(
"King County" = "#AD5CAB",
"Incorporated King County" = "#C388C2",
"Unincorporated King County" = "#E3C9E3",
"Kitsap County" = "#F4835E",
"Incorporated Kitsap County" = "#F7A489",
"Unincorporated Kitsap County" = "#FBD6C9",
"Pierce County" = "#A9D46E",
"Incorporated Pierce County" = "#C0E095",
"Unincorporated Pierce County" = "#E2F1CF",
"Snohomish County" = "#40BDB8",
"Incorporated Snohomish County" = "#73CFCB",
"Unincorporated Snohomish County" = "#BFE9E7",
"Region Total" = "#91268F",
"Incorporated Region Total" = "#AD5CAB",
"Unincorporated Region Total" = "#E3C9E3",
"State Total" = "#8CC63E",
"Incorporated State Total" = "#A9D46E",
"Unincorporated State Total" = "#E2F1CF"
)
This is the first section of the document that renders text - a brief description of what the data is. It includes some text formatting, an inline image and a link to the data. No need to type this all out, it is already in the document and looks like this:
Now lets start processing the OFM data so that we can create the visuals we want in our document. The steps we are going to follow are:
Base R’s function download.file
can be used to copy a file from an url to a local copy on your computer. There are several parmeters you can use but the most common ones you need are:
download.file(url = ofm.url, destfile = "working.xlsx", quiet = TRUE, mode = "wb")
ofm.pop.file <- paste0(getwd(),"/working.xlsx")
ofm.pop <- as_tibble(read.xlsx(ofm.pop.file, detectDates = TRUE, skipEmptyRows = TRUE, startRow = 5, colNames = TRUE))
Let’s take a look at ofm.pop
in your environment. Looks like a wide format table with extra lines and other things that are not useful for plotting in ggplot2
. Let’s use dplyr
to clean things up.
First let’s trim it down to only include places in our region or for the state as a whole. This is done using filter
in dplyr
. We will only keep rows where the value in County
is in our list of counties for the region along with the State.
ofm.pop <- ofm.pop %>%
filter(County %in% c(psrc_counties,"State"))
Now let’s make it long form by transforming any column
with the word Population
in it and keeping the rest. We do this using pivot_longer
.
ofm.pop <- ofm.pop %>%
pivot_longer(cols=contains("Population"), names_to="Year", values_to="Estimate")
The column Line
is only useful to OFM and used in Excel, let’s get rid of it using an inverse
of select
. You get everything but a specific column by using -
before the column name.
ofm.pop <- ofm.pop %>%
select(-Line)
Getting closer! The Year
column from OFM is really long and we only want the integer that represents the actual year year so we can use it for plotting and summarizing. This is where stringr
is great - it makes pattern recognition and string replacement a snap! At the same time, let’s use it to get rid of the word (part)
in the Jurisdiction label - we want Bothell to be Bothell since we have the county identifier already.
ofm.pop <- ofm.pop %>%
mutate(Year = str_replace(Year, ".Population.*", ""), Jurisdiction = str_replace(Jurisdiction, " \\(part\\)", ""))
So close. When we read that file from OFM, things like the population came in as a character
which is no good for doing analysis. We can use mutate
to change the format of a column (or multiple columns) in dplyr
. If you use multiple columns, be sure to use the across
function.
ofm.pop <- ofm.pop %>%
mutate(across(c('Filter','Year','Estimate'), as.numeric))
Now that we stepped through all that, let’s go ahead and copy and paste the following code to to our ofm_data_prep
chunk so that we can move along. If you have questions about what the rest of the code is doing, come visit me during office hours and we can chat it over.
# This code chunk will download the OFM file from the web, put a temp copy on your hard drive, clean it up for analysis and then remove the temporary file afterwards.
download.file(ofm.url, "working.xlsx", quiet = TRUE, mode = "wb")
ofm.pop.file <- paste0(getwd(),"/working.xlsx")
ofm.pop <- as_tibble(read.xlsx(ofm.pop.file, detectDates = TRUE, skipEmptyRows = TRUE, startRow = 5, colNames = TRUE))
# Get rid of things we don't need, make it long form and clean it up for use in plot making
ofm.pop <- ofm.pop %>%
filter(County %in% c(psrc_counties,"State")) %>%
pivot_longer(cols=contains("Population"), names_to="Year", values_to="Estimate") %>%
select(-Line) %>%
mutate(Year = str_replace(Year, ".Population.*", ""), Jurisdiction = str_replace(Jurisdiction, " \\(part\\)", "")) %>%
mutate(across(c('Filter','Year','Estimate'), as.numeric))
# Create a Regional Summary by Filter Type and then Join to Full OFM tibble
region.pop <- ofm.pop %>%
filter(Filter <= 3) %>%
select(Filter,Year, Estimate) %>%
group_by(Filter,Year) %>%
summarize_all(sum) %>%
mutate(County = "Region") %>%
mutate(Jurisdiction = "Region Total") %>%
mutate(Jurisdiction = ifelse(Filter == 2, "Unincorporated Region Total", Jurisdiction)) %>%
mutate(Jurisdiction = ifelse(Filter == 3, "Incorporated Region Total", Jurisdiction))
# Add the regional results to the OFM full tibble
ofm.pop <- bind_rows(ofm.pop,region.pop)
# Add a column to place cities in population bins
ofm.pop <- ofm.pop %>%
mutate(PopRange = case_when(
Estimate < 5000 ~ 1,
Estimate >= 5000 & Estimate <10000 ~ 2,
Estimate >= 10000 & Estimate < 25000 ~ 3,
Estimate >= 25000 & Estimate < 50000 ~ 4,
Estimate >= 50000 & Estimate < 75000 ~ 5,
Estimate >= 75000 ~ 6) %>%
as.factor() %>%
structure(levels = c("less than 5k","5k to 10k","10k to 25k","25k to 50k", "50k to 75k", "more than 75k"))
)
# Remove the temporary Excel File that was downloaded and remove unnecessary data from memory
file.remove(ofm.pop.file)
rm(region.pop)
The inline text of our document is interesting if it pulls out some interesting facts from our data and displays them. We want the document to adapt to changes in the data, so we are trying to be as programmatic as possible. Since each year new data is released, let’s setup the document to get values programmatically from the data based on the first and last year in the data. This could all be done inline but I sometimes find it easier to store things in a variable first to use later.
We have a function written in functions
that takes a tibble, Jurisdiction name and a year and returns the population as a numeric value. Let’s run it for the State
and Region
for the first and last years data and store it in variables.
First let’s find the first
and last
year in the data we downloaded from OFM.
first.year <- min(ofm.pop %>% select(Year))
last.year<- max(ofm.pop %>% select(Year))
Next let’s use our return_population_estimate
function to store the Region
and Statewide
population estimates for our first
and last
data years.
first.year.pop.region <- return_population_estimate(ofm.pop, "Region Total", first.year)
last.year.pop.region <- return_population_estimate(ofm.pop, "Region Total", last.year)
first.year.pop.state <- return_population_estimate(ofm.pop, "State Total", first.year)
last.year.pop.state <- return_population_estimate(ofm.pop, "State Total", last.year)
And finally let’s calculate the regional share
of the statewide population in the first
and last
year of the data
first.year.region.share <- first.year.pop.region / first.year.pop.state
last.year.region.share <- last.year.pop.region / last.year.pop.state
Just in case you need the full region_state_data
chunk, here it is to copy and paste in to your code.
# Determine the first and last year of the data
first.year <- min(ofm.pop %>% select(Year))
last.year<- max(ofm.pop %>% select(Year))
# Store the state and region population data into variables using our custom function
first.year.pop.region <- return_population_estimate(ofm.pop, "Region Total", first.year)
last.year.pop.region <- return_population_estimate(ofm.pop, "Region Total", last.year)
first.year.pop.state <- return_population_estimate(ofm.pop, "State Total", first.year)
last.year.pop.state <- return_population_estimate(ofm.pop, "State Total", last.year)
# Store the regional share of the statewide population into variables
first.year.region.share <- first.year.pop.region / first.year.pop.state
last.year.region.share <- last.year.pop.region / last.year.pop.state
Now that we have calculated the values needed for our inline text, we can now create and display our line chart for the region and state using our create_line_chart
function.
First we need to filter our data to only include the state and regional values
working.data <- ofm.pop %>%
filter(Jurisdiction %in% c("Region Total","State Total"))
Now that we have filtered the data down, we can create the line chart
using our custom function.
total.region <- create_line_chart(w.data = working.data, w.x = "Year", w.y = "Estimate", w.color = "Jurisdiction", w.ltype = "solid", w.lwidth = 1.0, w.palette = psrc_colors)
Just in case you need the full region_state_plot
chunk, here it is to copy and paste in to your code
# Filter Data
working.data <- ofm.pop %>%
filter(Jurisdiction %in% c("Region Total","State Total"))
# Create Line Chart
total.region <- create_line_chart(w.data = working.data, w.x = "Year", w.y = "Estimate", w.color = "Jurisdiction", w.ltype = "solid", w.lwidth = 1.0, w.palette = psrc_colors)
rm(working.data)
total.region
Now we can pull out some interesting facts on population in and out of incorporated areas from our data and display them as a stacked bar. Like previously, we will setup the document to get values programmatically from the data based on the first and last year in the data.
Using our same return_population_estimate
function but this time we will do it for the incorporated and unincorporated areas.
First for incorporated areas
first.year.pop.region.inc <- return_population_estimate(ofm.pop, "Incorporated Region Total", first.year)
last.year.pop.region.inc <- return_population_estimate(ofm.pop, "Incorporated Region Total", last.year)
Next for unincorporated areas
first.year.pop.region.uninc <- return_population_estimate(ofm.pop, "Unincorporated Region Total", first.year)
last.year.pop.region.uninc <- return_population_estimate(ofm.pop, "Unincorporated Region Total", last.year)
And finally let’s calculate the incorporated share
first.year.region.inc.share <- first.year.pop.region.inc / first.year.pop.region
last.year.region.inc.share <- last.year.pop.region.inc / last.year.pop.region
Just in case you need the full incorporated_data
chunk, here it is to copy and paste in to your code.
# Calculate Incorporated Totals
first.year.pop.region.inc <- return_population_estimate(ofm.pop, "Incorporated Region Total", first.year)
last.year.pop.region.inc <- return_population_estimate(ofm.pop, "Incorporated Region Total", last.year)
# Calculate Unincorporated Totals
first.year.pop.region.uninc <- return_population_estimate(ofm.pop, "Unincorporated Region Total", first.year)
last.year.pop.region.uninc <- return_population_estimate(ofm.pop, "Unincorporated Region Total", last.year)
# Calculate the Share of Population in Incorporated Areas
first.year.region.inc.share <- first.year.pop.region.inc / first.year.pop.region
last.year.region.inc.share <- last.year.pop.region.inc / last.year.pop.region
Now that we have calculated the values needed for our inline text, we can now create and display our chart for the region incorporated and unincorporated areas using our create_bar_chart
function.
First we need to filter our data to only include the regional totals by incorporated or unincorporated values.
working.data <- ofm.pop %>%
filter(Jurisdiction %in% c("Incorporated Region Total", "Unincorporated Region Total"))
Now that we have filtered the data down, we can create the stacked bar chart
using our custom function.
incorporated.region <- create_bar_chart(w.data = working.data, w.x = "Year", w.y = "Estimate", w.color = "Jurisdiction", w.bartype = "stack", w.transparent = 1.0, w.palette = psrc_colors, x.type = "continuous")
Just in case you need the full incorporated_plot
chunk, here it is to copy and paste in to your code
working.data <- ofm.pop %>%
filter(Jurisdiction %in% c("Incorporated Region Total", "Unincorporated Region Total"))
incorporated.region <- create_bar_chart(w.data = working.data, w.x = "Year", w.y = "Estimate", w.color = "Jurisdiction", w.bartype = "stack", w.transparent = 1.0, w.palette = psrc_colors, x.type = "continuous")
rm(working.data)
incorporated.region
Now we can pull out some interesting facts on population in cities from our data and display them as a facet wrapped bar charts. Like previously, we will setup the document to get values programmatically from the data based on the first and last year in the data.
First we need to filter the data to a single year as there are 82 cities and towns in the region
working.data <- ofm.pop %>%
filter(Filter ==4 & Year == last.year) %>%
arrange(desc(Estimate))
# There are some places (Auburn, Bothell, etc.) that are in two counties so we should aggregate to get to one line per city
working.data <- working.data %>%
select(Jurisdiction, Estimate) %>%
group_by(Jurisdiction) %>%
summarize_all(sum) %>%
mutate(PopRange = case_when(
Estimate < 5000 ~ 1,
Estimate >= 5000 & Estimate <10000 ~ 2,
Estimate >= 10000 & Estimate < 25000 ~ 3,
Estimate >= 25000 & Estimate < 50000 ~ 4,
Estimate >= 50000 & Estimate < 75000 ~ 5,
Estimate >= 75000 ~ 6) %>%
as.factor() %>%
structure(levels = c("less than 5k","5k to 10k","10k to 25k","25k to 50k", "50k to 75k", "more than 75k"))
)
num.juris <- working.data %>% select(Jurisdiction) %>% count() %>% as.numeric
Now we can find the biggest city
biggest.city <- working.data %>% filter(Estimate == max(Estimate)) %>% select(Jurisdiction) %>% as.character
biggest.pop <- working.data %>% filter(Estimate == max(Estimate)) %>% select(Estimate) %>% as.numeric
And finally let’s calculate the smallest city
smallest.city <- working.data %>% filter(Estimate == min(Estimate)) %>% select(Jurisdiction) %>% as.character
smallest.pop <- working.data %>% filter(Estimate == min(Estimate)) %>% select(Estimate) %>% as.numeric
Just in case you need the full cities_data
chunk, here it is to copy and paste in to your code.
# Filter to Cities
working.data <- ofm.pop %>%
filter(Filter ==4 & Year == last.year) %>%
arrange(desc(Estimate))
# There are some places (Auburn, Bothell, etc.) that are in two counties so we should aggregate to get to one line per city
working.data <- working.data %>%
select(Jurisdiction, Estimate) %>%
group_by(Jurisdiction) %>%
summarize_all(sum) %>%
mutate(PopRange = case_when(
Estimate < 5000 ~ 1,
Estimate >= 5000 & Estimate <10000 ~ 2,
Estimate >= 10000 & Estimate < 25000 ~ 3,
Estimate >= 25000 & Estimate < 50000 ~ 4,
Estimate >= 50000 & Estimate < 75000 ~ 5,
Estimate >= 75000 ~ 6) %>%
as.factor() %>%
structure(levels = c("less than 5k","5k to 10k","10k to 25k","25k to 50k", "50k to 75k", "more than 75k"))
)
num.juris <- working.data %>% select(Jurisdiction) %>% count() %>% as.numeric
biggest.city <- working.data %>% filter(Estimate == max(Estimate)) %>% select(Jurisdiction) %>% as.character
biggest.pop <- working.data %>% filter(Estimate == max(Estimate)) %>% select(Estimate) %>% as.numeric
smallest.city <- working.data %>% filter(Estimate == min(Estimate)) %>% select(Jurisdiction) %>% as.character
smallest.pop <- working.data %>% filter(Estimate == min(Estimate)) %>% select(Estimate) %>% as.numeric
Now that we have calculated the values needed for our inline text, we can now create and display our chart for all cities using our create_facet_bar_chart
function. Since we already had to filter things on this data set, this chunk down’t require any further processing of the data.
region.cities <- create_facet_bar_chart(working.data, "Jurisdiction", "Estimate" , "Jurisdiction", "PopRange", "free")
rm(working.data)
region.cities
Just in case you need the full cities_plot
chunk, here it is to copy and paste in to your code
region.cities <- create_facet_bar_chart(working.data, "Jurisdiction", "Estimate" , "Jurisdiction", "PopRange", "free")
rm(working.data)
region.cities
What if we want to summarize the data for each County in the same way we just did for the region? We could simply copy and paste the code from the region another four times but that will make the document long, takes a lot of time but more importantly what if we make a small tweak to a code chunk? We would have to change it in multiple places. This is where the concept of a child
in knitr
comes in. Think of a child
as a function that creates your output. We can iterate over a list and run the child
code on each of those. Now if we want to add 2, 3, 6 or 10 counties to our summary we can do it with any adjustment to our list, not with new code!
Let’s start with writing our child
and then we can work on the list creation. A child
in markdown is a seperate file (think of our source
example from earlier). I provided a template for us to use in the class documents called ofm_trend_county_child.Rmd
. Go ahead and open that file in RStudio now.
We are going to structure our child to re-create the data we saw for the region:
First let’s get some county level summary data for our text using that same return_population_estimate
function we used before.
first.year.pop.county <- return_population_estimate(ofm.pop, paste0(current.county, " County"), first.year)
last.year.pop.county <- return_population_estimate(ofm.pop, paste0(current.county, " County"), last.year)
Now let’s create our County Line Chart by filtering
to our current county and running the create_line_chart
function. I decided to change the line type
in this to be twodash
.
working.data <- ofm.pop %>%
filter(Jurisdiction == paste0(current.county, " County"))
total.county <- create_line_chart(working.data, "Year", "Estimate", "Jurisdiction", "twodash", 1.5, psrc_colors)
Now let’s create our stacked bar chart for incorporated/unincorporated areas.
working.data <- ofm.pop %>%
filter(Jurisdiction %in% c(paste0("Incorporated ",current.county, " County"), paste0("Unincorporated ",current.county, " County")))
incorporation.county <- create_bar_chart(working.data, "Year", "Estimate", "Jurisdiction", "stack", 1.0, psrc_colors, "continuous")
And finally let’s create the fact charts for city population in our specific county.
working.data <- ofm.pop %>%
filter(Filter ==4 & County == current.county & Year == last.year) %>% arrange(desc(Estimate))
county.cities <- create_facet_bar_chart(working.data, "Jurisdiction", "Estimate" , "Jurisdiction", "PopRange", "free")
We now have all the pieces to work with to create our county specific figures. Here is all thee code in one place to copy and paste if needed
# Population Summary Data for use in markdown text
first.year.pop.county <- return_population_estimate(ofm.pop, paste0(current.county, " County"), first.year)
last.year.pop.county <- return_population_estimate(ofm.pop, paste0(current.county, " County"), last.year)
# Create Line Chart for Total County Population
working.data <- ofm.pop %>%
filter(Jurisdiction == paste0(current.county, " County"))
total.county <- create_line_chart(working.data, "Year", "Estimate", "Jurisdiction", "twodash", 1.5, psrc_colors)
# Create Stacked Bar Chart for County Population by Incorporated/Unincorporated Population
working.data <- ofm.pop %>%
filter(Jurisdiction %in% c(paste0("Incorporated ",current.county, " County"), paste0("Unincorporated ",current.county, " County")))
incorporation.county <- create_bar_chart(working.data, "Year", "Estimate", "Jurisdiction", "stack", 1.0, psrc_colors, "continuous")
# Create Bar Chart for City Population by County
working.data <- ofm.pop %>%
filter(Filter ==4 & County == current.county & Year == last.year) %>%
arrange(desc(Estimate))
county.cities <- create_facet_bar_chart(working.data, "Jurisdiction", "Estimate" , "Jurisdiction", "PopRange", "free")
Now that we have our child
created and saved, let’s create the list to run to create the county by county summary. knitr
will run the child we document we call for every item in our list. It basically returns a list of our output that we store in a variable out
to show. Here is the code we will run to finish off our document
out <- NULL
for (current.county in psrc_counties) {
out <- c(out, knit_child("ofm_trend_county_child.Rmd"))
}
Try out the functionality of our approach. Let’s add Thurston County
to our region and re-run things to see what happens. To do this, adjust psrc_counties
to include Thurston COunty and then knit
. Here is the code change:
psrc_counties <- c("King","Kitsap","Pierce","Snohomish", "Thurston")
This is the end, beautiful friend. This is the end, my only friend the end. I hope you have enjoyed it and it inspires you to try and create many more R Markdown documents! If you ever want help, just let me know.
My office hours are:
Craig’ R Office Hours: every Wednesday from 12pm to 1pm
Make an appointment ahead of time or simply drop me a Teams chat and we can work together on any and all things R!