Welcome to our second R Markdown lesson. This lesson is going to create an html output file that summarizes population data from the Office of Financial Management (OFM). This lesson builds off of your previous lessons in dplyr and ggplot2 so hope you remember all that great stuff! I am also going to show you a few tricks to make this look more PSRC appropriate.

OFM Trend

To start this lesson, first let’s clear out the environment and restart the R session. It is a good idea to clear out the environment and restart R when you switch projects. If you don’t, all your previous data will remain in memory as well as your libraries. It can lead to issues so it is best to start fresh - which also happens if you just close and reopen RStudio. To clear the environment and restart R do the following:

Clear Environment

Restart R

Now let’s load the file ofm_trend.Rmd that you already downloaded into RStudio as our starting point. This RMD document should look somewhat familiar to the one we just did with Bob. It has a yaml header, a code chunkand some inline text. Lets take a look at the components quickly.

YAML Header

The header tells R that we want an HTML document that uses some custom formatting provided in the file styles.css. We also want a table of contents that floats and only shows items down to Heading Level 2.

---
title: “OFM Population Estimates”
output:
html_document:
theme: cerulean
css: styles.css
toc: TRUE
toc_float: TRUE
toc_depth: 2
---

Code Chunk for Initial Setup

This chunk of code loads all the libraries that we will be using in the lesson and provides some basic inputs that are used throughout the script. All the libraries we are using today were used in the previous two R sessions, the only new one for today is knitr.

There are a couple new concepts that we will touch on today but not spend a lot of time looking at - how to download an Excel file directly from a website and also how to use other files that might contain functions, lists, inputs etc. in your code without having to re-type them.

# This code "chunk" contains all the libraries/packages that are used in the overall script as well basic inputs that we might want to change

# Packages for Data Cleaning/Processing
library(tidyr)
library(dplyr)
library(stringr)
library(openxlsx)

# Packages for Table and Chart creation
library(ggplot2)
library(scales)
library(plotly)

# Packages used to create markdown document (needed to knit children)
library(knitr)

# Some basic variables that will be used by the script
psrc_counties <- c("King","Kitsap","Pierce","Snohomish")

ofm.url <- 'https://www.ofm.wa.gov/sites/default/files/public/dataresearch/pop/april1/ofm_april1_population_final.xlsx'

# Read in Functions and color palette
source(file.path(getwd() , 'functions.R'))
source(file.path(getwd() , 'psrc_palette.R'))

New Content - sourcing files

When we sourcea file, we are effectively telling R to run that file and make all of its contents available in the current R session. In this example, we source two files:

functions.R
psrc_palette.R

Functions

The functions.R file has four custom functions that were written to create:

A line chart
Bar Chart
Facet Wrapped Bar Chart
Return an estimated value from a tibble

I write functions like this so that I can make my charts and figures look consistent without having to keep copying and pasting things. Remember the golden rule on functions - if you copy the same line more than twice, write a function instead! Let’s open funcitons.R in RStudio so you can see what it looks like using:

I have also placed the code here as well:

# Useful Functions

create_line_chart <- function(w.data, w.x, w.y, w.color, w.ltype, w.lwidth, w.palette) {
  
  y.max <- 1.25 * max(w.data[w.y])
  x.breaks <- unique(w.data %>% pull(w.x))
  
  g <-  ggplotly(ggplot(data=w.data, 
                        aes(x = get(eval(w.x)), 
                            y = get(eval(w.y)), 
                            color = get(eval(w.color)), 
                            group=1, 
                            text = paste0("<b>Year: </b>",  get(eval(w.x)), "<br>","<b>Population: </b>", prettyNum(round(get(eval(w.y)), 0), big.mark = ","), "<br>")))  + 
                   geom_line(linetype = w.ltype,
                             size = w.lwidth) +
                   scale_y_continuous(labels = label_comma(), limits = c(0,y.max)) +
                   scale_x_continuous(breaks= x.breaks) +
                   scale_color_manual(values = w.palette) +
                   labs(x = w.x, y = NULL) +
                   theme(plot.title = element_text(size = 10, face = 'bold'),
                         axis.text.x = element_text(angle = 0,
                                                    hjust = 1, 
                                                    vjust = 1,
                                                    family = 'Comic Sans MS'),
                         axis.ticks.x = element_blank(),
                         axis.line.x = element_blank(),
                         axis.line.y = element_line(colour="#BBBDC0",size = 0.25),
                         panel.background = element_blank(),
                         panel.grid.major.y = element_line(colour="#BBBDC0",size = 0.25),
                         panel.grid.minor.y = element_line(colour="#BBBDC0",size = 0.25),
                         panel.grid.major.x = element_blank(),
                         panel.grid.minor.x = element_blank(),
                         text = element_text(family = "Segoe UI"),
                         legend.position = "bottom",
                         legend.title = element_blank()),
                 tooltip = c("text")) %>% layout(legend = list(orientation = "h", xanchor = "center", x = 0.5, y = -0.25), hovermode = "x")
  
  return(g)
  
}

create_bar_chart <- function(w.data, w.x, w.y, w.color, w.bartype, w.transparent, w.palette, x.type) {
  
  if (x.type == "discrete") {
  
    g <-  ggplotly(ggplot(data = w.data,
                          aes(x = get(eval(w.x)), 
                              y = get(eval(w.y)), 
                              fill = get(eval(w.color)), 
                              group=1, 
                              text = paste0("<b>", get(eval(w.color))," Population: </b>", prettyNum(round(get(eval(w.y)), 0), big.mark = ","), "<br>"))) +
                    geom_col(color = "black",
                              alpha = w.transparent,
                              position = w.bartype) +
                    scale_y_continuous(labels = label_comma()) +
                    labs(x = w.x, y = NULL) +
                    theme(plot.title = element_text(size = 10, face = 'bold'),
                         axis.text.x = element_text(angle = 0,
                                                    hjust = 1, 
                                                    vjust = 1,
                                                    family = 'Comic Sans MS'),
                         axis.ticks.x = element_blank(),
                         axis.line = element_blank(),
                         panel.background = element_blank(),
                         panel.grid.major.y = element_line(colour="#BBBDC0",size = 0.25),
                         panel.grid.minor.y = element_line(colour="#BBBDC0",size = 0.25),
                         panel.grid.major.x = element_blank(),
                         panel.grid.minor.x = element_blank(),
                         text = element_text(family = "Segoe UI"),
                         legend.position = "bottom",
                         legend.title = element_blank()),
                  tooltip = c("text")) %>% layout(legend = list(orientation = "h", xanchor = "center", x = 0.5, y = -0.25))
    
  } else {
    x.breaks <- unique(w.data %>% pull(w.x))
    g <-  ggplotly(ggplot(data = w.data,
                          aes(x = get(eval(w.x)), 
                              y = get(eval(w.y)), 
                              fill = get(eval(w.color)), 
                              group=1, 
                              text = paste0("<b>", get(eval(w.color))," Population: </b>", prettyNum(round(get(eval(w.y)), 0), big.mark = ","), "<br>"))) +
                     geom_col(color = "black",
                              alpha = w.transparent,
                              position = w.bartype) +
                     scale_y_continuous(labels = label_comma()) +
                     scale_fill_manual(values = w.palette) +
                     scale_x_continuous(breaks= x.breaks) +
                     labs(x = w.x, y = NULL) +
                     theme(plot.title = element_text(size = 10, face = 'bold'),
                           axis.text.x = element_text(angle = 0,
                                                      hjust = 1, 
                                                      vjust = 1,
                                                      family = 'Comic Sans MS'),
                           axis.ticks.x = element_blank(),
                           axis.line = element_blank(),
                           panel.background = element_blank(),
                           panel.grid.major.y = element_line(colour="#BBBDC0",size = 0.25),
                           panel.grid.minor.y = element_line(colour="#BBBDC0",size = 0.25),
                           panel.grid.major.x = element_blank(),
                           panel.grid.minor.x = element_blank(),
                           text = element_text(family = "Segoe UI"),
                           legend.position = "bottom",
                           legend.title = element_blank()),
                   tooltip = c("text")) %>% layout(legend = list(orientation = "h", xanchor = "center", x = 0.5, y = -0.25))    
    
    
  }
  
  return(g)
  
}

create_facet_bar_chart <- function(w.data, w.x, w.y, w.color, w.vars, w.scales) {
  
  g <-  ggplotly(ggplot(data = w.data,
                        aes(x = reorder(get(eval(w.x)), -get(eval(w.y))), 
                            y = get(eval(w.y)), 
                            fill = get(eval(w.color)),
                            text = paste0("<b>", get(eval(w.color)), ": </b>", prettyNum(round(get(eval(w.y)), 0), big.mark = ","), "<br>"))) +
                             geom_col(
                               color = "black",
                               alpha = 1.0,
                               position = "dodge") +
                             labs(x = NULL, y = NULL) +
                             scale_y_continuous(labels = label_comma() ) +
                             theme(plot.title = element_text(size = 10, face = 'bold'),
                                   axis.text.x = element_blank(),
                                   axis.ticks.x = element_blank(),
                                   axis.line = element_blank(),
                                   panel.background = element_blank(),
                                   panel.grid.major.y = element_line(colour="#BBBDC0",size = 0.25),
                                   panel.grid.minor.y = element_line(colour="#BBBDC0",size = 0.25),
                                   panel.grid.major.x = element_blank(),
                                   panel.grid.minor.x = element_blank(),
                                   text = element_text(family = "Segoe UI"),
                                   legend.position = "none",
                                   legend.title = element_blank())+
                             facet_wrap(vars(get(eval(w.vars))), scales = w.scales) +
                             theme(legend.position = "none"),
                 tooltip = c("text"))
    
  return(g)
  
}

  
return_population_estimate <- function(data, place, year) {
  
  pop <- data %>% filter(Jurisdiction == place & Year == year) %>% select(Estimate) %>% as.numeric
  return(pop)
  
}

Palette

palette.R has information on what colors to use for different specific names and categories. It is a named vector where if you pass it the name (the value on the left side of the equal sign) it returns the HEX color (the value on the right side of the equal sign). In this code, Kitsap County will always be colored with hex code #F4835E.

psrc_colors <- c(
  "King County" = "#AD5CAB",
  "Incorporated King County" = "#C388C2",
  "Unincorporated King County" = "#E3C9E3",
  "Kitsap County" = "#F4835E",
  "Incorporated Kitsap County" = "#F7A489",
  "Unincorporated Kitsap County" = "#FBD6C9",
  "Pierce County" = "#A9D46E",
  "Incorporated Pierce County" = "#C0E095",
  "Unincorporated Pierce County" = "#E2F1CF",
  "Snohomish County" = "#40BDB8",
  "Incorporated Snohomish County" = "#73CFCB",
  "Unincorporated Snohomish County" = "#BFE9E7",
  "Region Total" = "#91268F",
  "Incorporated Region Total" = "#AD5CAB",
  "Unincorporated Region Total" = "#E3C9E3",
  "State Total" = "#8CC63E",
  "Incorporated State Total" = "#A9D46E",
  "Unincorporated State Total" = "#E2F1CF"
)

Inline Text

This is the first section of the document that renders text - a brief description of what the data is. It includes some text formatting, an inline image and a link to the data. No need to type this all out, it is already in the document and looks like this:

OFM Data Prep

Now lets start processing the OFM data so that we can create the visuals we want in our document. The steps we are going to follow are:

Download the data.
Clean, format and reshape the data
Add regional data

Download from a URL

Base R’s function download.file can be used to copy a file from an url to a local copy on your computer. There are several parmeters you can use but the most common ones you need are:

url (link to file to download)
destfile (name of file to save to on your computer)
quiet (TRUE turns off any warnings, progress bar, etc.)
mode (use of ‘wb’ is most commmon on windows machines as it is a binary file type)

download.file(url = ofm.url, destfile = "working.xlsx", quiet = TRUE, mode = "wb")
ofm.pop.file <- paste0(getwd(),"/working.xlsx")
ofm.pop <- as_tibble(read.xlsx(ofm.pop.file, detectDates = TRUE, skipEmptyRows = TRUE, startRow = 5, colNames = TRUE))

Let’s take a look at ofm.pop in your environment. Looks like a wide format table with extra lines and other things that are not useful for plotting in ggplot2. Let’s use dplyr to clean things up.

Cleanup with dplyr

First let’s trim it down to only include places in our region or for the state as a whole. This is done using filter in dplyr. We will only keep rows where the value in County is in our list of counties for the region along with the State.

ofm.pop <- ofm.pop %>%
  filter(County %in%  c(psrc_counties,"State"))

Now let’s make it long form by transforming any column with the word Population in it and keeping the rest. We do this using pivot_longer.

ofm.pop <- ofm.pop %>%
  pivot_longer(cols=contains("Population"), names_to="Year", values_to="Estimate")

The column Line is only useful to OFM and used in Excel, let’s get rid of it using an inverse of select. You get everything but a specific column by using - before the column name.

ofm.pop <- ofm.pop %>%
  select(-Line)

Getting closer! The Year column from OFM is really long and we only want the integer that represents the actual year year so we can use it for plotting and summarizing. This is where stringr is great - it makes pattern recognition and string replacement a snap! At the same time, let’s use it to get rid of the word (part) in the Jurisdiction label - we want Bothell to be Bothell since we have the county identifier already.

ofm.pop <- ofm.pop %>%
  mutate(Year = str_replace(Year, ".Population.*", ""), Jurisdiction = str_replace(Jurisdiction, " \\(part\\)", ""))

So close. When we read that file from OFM, things like the population came in as a character which is no good for doing analysis. We can use mutate to change the format of a column (or multiple columns) in dplyr. If you use multiple columns, be sure to use the across function.

ofm.pop <- ofm.pop %>%
  mutate(across(c('Filter','Year','Estimate'), as.numeric))

Final ofm_data_prep working chunk

Now that we stepped through all that, let’s go ahead and copy and paste the following code to to our ofm_data_prep chunk so that we can move along. If you have questions about what the rest of the code is doing, come visit me during office hours and we can chat it over.

# This code chunk will download the OFM file from the web, put a temp copy on your hard drive, clean it up for analysis and then remove the temporary file afterwards.

download.file(ofm.url, "working.xlsx", quiet = TRUE, mode = "wb")
ofm.pop.file <- paste0(getwd(),"/working.xlsx")

ofm.pop <- as_tibble(read.xlsx(ofm.pop.file, detectDates = TRUE, skipEmptyRows = TRUE, startRow = 5, colNames = TRUE))

# Get rid of things we don't need, make it long form and clean it up for use in plot making
ofm.pop <- ofm.pop %>%
  filter(County %in%  c(psrc_counties,"State")) %>%
  pivot_longer(cols=contains("Population"), names_to="Year", values_to="Estimate") %>%
  select(-Line) %>%
  mutate(Year = str_replace(Year, ".Population.*", ""), Jurisdiction = str_replace(Jurisdiction, " \\(part\\)", "")) %>%
  mutate(across(c('Filter','Year','Estimate'), as.numeric))

# Create a Regional Summary by Filter Type and then Join to Full OFM tibble
region.pop <- ofm.pop %>%
  filter(Filter <= 3) %>%
  select(Filter,Year, Estimate) %>%
  group_by(Filter,Year) %>%
  summarize_all(sum) %>%
  mutate(County = "Region") %>%
  mutate(Jurisdiction = "Region Total") %>%
  mutate(Jurisdiction = ifelse(Filter == 2, "Unincorporated Region Total", Jurisdiction)) %>%
  mutate(Jurisdiction = ifelse(Filter == 3, "Incorporated Region Total", Jurisdiction))

# Add the regional results to the OFM full tibble
ofm.pop <- bind_rows(ofm.pop,region.pop)

# Add a column to place cities in population bins
ofm.pop <- ofm.pop %>%
  mutate(PopRange = case_when(
        Estimate < 5000 ~ 1,
        Estimate >= 5000 & Estimate <10000 ~ 2,
        Estimate >= 10000 & Estimate < 25000 ~ 3,
        Estimate >= 25000 & Estimate < 50000 ~ 4,
        Estimate >= 50000 & Estimate < 75000 ~ 5,
        Estimate >= 75000 ~ 6) %>%
  as.factor() %>%
    structure(levels = c("less than 5k","5k to 10k","10k to 25k","25k to 50k", "50k to 75k", "more than 75k"))
  )

# Remove the temporary Excel File that was downloaded and remove unnecessary data from memory
file.remove(ofm.pop.file)
rm(region.pop)

Region and State

The inline text of our document is interesting if it pulls out some interesting facts from our data and displays them. We want the document to adapt to changes in the data, so we are trying to be as programmatic as possible. Since each year new data is released, let’s setup the document to get values programmatically from the data based on the first and last year in the data. This could all be done inline but I sometimes find it easier to store things in a variable first to use later.

Data Calculations

We have a function written in functions that takes a tibble, Jurisdiction name and a year and returns the population as a numeric value. Let’s run it for the State and Region for the first and last years data and store it in variables.

First let’s find the first and last year in the data we downloaded from OFM.

first.year <- min(ofm.pop %>% select(Year))
last.year<- max(ofm.pop %>% select(Year))

Next let’s use our return_population_estimate function to store the Region and Statewide population estimates for our first and last data years.

first.year.pop.region <- return_population_estimate(ofm.pop, "Region Total", first.year)
last.year.pop.region <- return_population_estimate(ofm.pop, "Region Total", last.year)

first.year.pop.state <- return_population_estimate(ofm.pop, "State Total", first.year)
last.year.pop.state <- return_population_estimate(ofm.pop, "State Total", last.year)

And finally let’s calculate the regional share of the statewide population in the first and last year of the data

first.year.region.share <- first.year.pop.region / first.year.pop.state
last.year.region.share <- last.year.pop.region / last.year.pop.state

Final region_state_data working chunk

Just in case you need the full region_state_data chunk, here it is to copy and paste in to your code.

# Determine the first and last year of the data
first.year <- min(ofm.pop %>% select(Year))
last.year<- max(ofm.pop %>% select(Year))

# Store the state and region population data into variables using our custom function
first.year.pop.region <- return_population_estimate(ofm.pop, "Region Total", first.year)
last.year.pop.region <- return_population_estimate(ofm.pop, "Region Total", last.year)

first.year.pop.state <- return_population_estimate(ofm.pop, "State Total", first.year)
last.year.pop.state <- return_population_estimate(ofm.pop, "State Total", last.year)

# Store the regional share of the statewide population into variables
first.year.region.share <- first.year.pop.region / first.year.pop.state
last.year.region.share <- last.year.pop.region / last.year.pop.state

Line Chart of Data

Now that we have calculated the values needed for our inline text, we can now create and display our line chart for the region and state using our create_line_chart function.

First we need to filter our data to only include the state and regional values

working.data <- ofm.pop %>%
  filter(Jurisdiction %in% c("Region Total","State Total"))

Now that we have filtered the data down, we can create the line chart using our custom function.

total.region <- create_line_chart(w.data = working.data, w.x = "Year", w.y = "Estimate", w.color = "Jurisdiction", w.ltype = "solid", w.lwidth = 1.0, w.palette = psrc_colors)

Final region_state_plot working chunk

Just in case you need the full region_state_plot chunk, here it is to copy and paste in to your code

# Filter Data
working.data <- ofm.pop %>%
  filter(Jurisdiction %in% c("Region Total","State Total"))

# Create Line Chart
total.region <- create_line_chart(w.data = working.data, w.x = "Year", w.y = "Estimate", w.color = "Jurisdiction", w.ltype = "solid", w.lwidth = 1.0, w.palette = psrc_colors)

rm(working.data)

total.region

Incorporated Area

Now we can pull out some interesting facts on population in and out of incorporated areas from our data and display them as a stacked bar. Like previously, we will setup the document to get values programmatically from the data based on the first and last year in the data.

Data Calculations

Using our same return_population_estimate function but this time we will do it for the incorporated and unincorporated areas.

First for incorporated areas

first.year.pop.region.inc <- return_population_estimate(ofm.pop, "Incorporated Region Total", first.year)
last.year.pop.region.inc <- return_population_estimate(ofm.pop, "Incorporated Region Total", last.year)

Next for unincorporated areas

first.year.pop.region.uninc <- return_population_estimate(ofm.pop, "Unincorporated Region Total", first.year)
last.year.pop.region.uninc <- return_population_estimate(ofm.pop, "Unincorporated Region Total", last.year)

And finally let’s calculate the incorporated share

first.year.region.inc.share <- first.year.pop.region.inc / first.year.pop.region
last.year.region.inc.share <- last.year.pop.region.inc / last.year.pop.region

Final incorporated_data working chunk

Just in case you need the full incorporated_data chunk, here it is to copy and paste in to your code.

# Calculate Incorporated Totals
first.year.pop.region.inc <- return_population_estimate(ofm.pop, "Incorporated Region Total", first.year)
last.year.pop.region.inc <- return_population_estimate(ofm.pop, "Incorporated Region Total", last.year)

# Calculate Unincorporated Totals
first.year.pop.region.uninc <- return_population_estimate(ofm.pop, "Unincorporated Region Total", first.year)
last.year.pop.region.uninc <- return_population_estimate(ofm.pop, "Unincorporated Region Total", last.year)

# Calculate the Share of Population in Incorporated Areas
first.year.region.inc.share <- first.year.pop.region.inc / first.year.pop.region
last.year.region.inc.share <- last.year.pop.region.inc / last.year.pop.region

Stacked Bar Chart of Data

Now that we have calculated the values needed for our inline text, we can now create and display our chart for the region incorporated and unincorporated areas using our create_bar_chart function.

First we need to filter our data to only include the regional totals by incorporated or unincorporated values.

working.data <- ofm.pop %>%
  filter(Jurisdiction %in% c("Incorporated Region Total", "Unincorporated Region Total"))

Now that we have filtered the data down, we can create the stacked bar chart using our custom function.

incorporated.region <- create_bar_chart(w.data = working.data, w.x = "Year", w.y = "Estimate", w.color = "Jurisdiction", w.bartype = "stack", w.transparent = 1.0, w.palette = psrc_colors, x.type = "continuous")

Final incorporated_plot working chunk

Just in case you need the full incorporated_plot chunk, here it is to copy and paste in to your code

working.data <- ofm.pop %>%
  filter(Jurisdiction %in% c("Incorporated Region Total", "Unincorporated Region Total"))

incorporated.region <- create_bar_chart(w.data = working.data, w.x = "Year", w.y = "Estimate", w.color = "Jurisdiction", w.bartype = "stack", w.transparent = 1.0, w.palette = psrc_colors, x.type = "continuous")

rm(working.data)

incorporated.region

Cities

Now we can pull out some interesting facts on population in cities from our data and display them as a facet wrapped bar charts. Like previously, we will setup the document to get values programmatically from the data based on the first and last year in the data.

Data Calculations

First we need to filter the data to a single year as there are 82 cities and towns in the region

working.data <- ofm.pop %>%
  filter(Filter ==4 & Year == last.year) %>%
  arrange(desc(Estimate))

# There are some places (Auburn, Bothell, etc.) that are in two counties so we should aggregate to get to one line per city
working.data <- working.data %>%
  select(Jurisdiction, Estimate) %>%
  group_by(Jurisdiction) %>%
  summarize_all(sum) %>%
  mutate(PopRange = case_when(
    Estimate < 5000 ~ 1,
    Estimate >= 5000 & Estimate <10000 ~ 2,
    Estimate >= 10000 & Estimate < 25000 ~ 3,
    Estimate >= 25000 & Estimate < 50000 ~ 4,
    Estimate >= 50000 & Estimate < 75000 ~ 5,
    Estimate >= 75000 ~ 6) %>%
  as.factor() %>%
    structure(levels = c("less than 5k","5k to 10k","10k to 25k","25k to 50k", "50k to 75k", "more than 75k"))
  )

num.juris <- working.data %>% select(Jurisdiction) %>% count() %>% as.numeric

Now we can find the biggest city

biggest.city <- working.data %>% filter(Estimate == max(Estimate)) %>% select(Jurisdiction) %>% as.character
biggest.pop <- working.data %>% filter(Estimate == max(Estimate)) %>% select(Estimate) %>% as.numeric

And finally let’s calculate the smallest city

smallest.city <- working.data %>% filter(Estimate == min(Estimate)) %>% select(Jurisdiction) %>% as.character
smallest.pop <- working.data %>% filter(Estimate == min(Estimate)) %>% select(Estimate) %>% as.numeric

Final cities_data working chunk

Just in case you need the full cities_data chunk, here it is to copy and paste in to your code.

# Filter to Cities
working.data <- ofm.pop %>%
  filter(Filter ==4 & Year == last.year) %>%
  arrange(desc(Estimate))

# There are some places (Auburn, Bothell, etc.) that are in two counties so we should aggregate to get to one line per city
working.data <- working.data %>%
  select(Jurisdiction, Estimate) %>%
  group_by(Jurisdiction) %>%
  summarize_all(sum) %>%
  mutate(PopRange = case_when(
    Estimate < 5000 ~ 1,
    Estimate >= 5000 & Estimate <10000 ~ 2,
    Estimate >= 10000 & Estimate < 25000 ~ 3,
    Estimate >= 25000 & Estimate < 50000 ~ 4,
    Estimate >= 50000 & Estimate < 75000 ~ 5,
    Estimate >= 75000 ~ 6) %>%
  as.factor() %>%
    structure(levels = c("less than 5k","5k to 10k","10k to 25k","25k to 50k", "50k to 75k", "more than 75k"))
  )

num.juris <- working.data %>% select(Jurisdiction) %>% count() %>% as.numeric
biggest.city <- working.data %>% filter(Estimate == max(Estimate)) %>% select(Jurisdiction) %>% as.character
biggest.pop <- working.data %>% filter(Estimate == max(Estimate)) %>% select(Estimate) %>% as.numeric

smallest.city <- working.data %>% filter(Estimate == min(Estimate)) %>% select(Jurisdiction) %>% as.character
smallest.pop <- working.data %>% filter(Estimate == min(Estimate)) %>% select(Estimate) %>% as.numeric

Facet Wrap Bar Charts of Data

Now that we have calculated the values needed for our inline text, we can now create and display our chart for all cities using our create_facet_bar_chart function. Since we already had to filter things on this data set, this chunk down’t require any further processing of the data.

region.cities <- create_facet_bar_chart(working.data, "Jurisdiction", "Estimate" , "Jurisdiction", "PopRange", "free")

rm(working.data)

region.cities

Final cities_plot working chunk

Just in case you need the full cities_plot chunk, here it is to copy and paste in to your code

region.cities <- create_facet_bar_chart(working.data, "Jurisdiction", "Estimate" , "Jurisdiction", "PopRange", "free")

rm(working.data)

region.cities

County Child

What if we want to summarize the data for each County in the same way we just did for the region? We could simply copy and paste the code from the region another four times but that will make the document long, takes a lot of time but more importantly what if we make a small tweak to a code chunk? We would have to change it in multiple places. This is where the concept of a child in knitr comes in. Think of a child as a function that creates your output. We can iterate over a list and run the child code on each of those. Now if we want to add 2, 3, 6 or 10 counties to our summary we can do it with any adjustment to our list, not with new code!

Let’s start with writing our child and then we can work on the list creation. A child in markdown is a seperate file (think of our source example from earlier). I provided a template for us to use in the class documents called ofm_trend_county_child.Rmd. Go ahead and open that file in RStudio now.

Data Calculations

We are going to structure our child to re-create the data we saw for the region:

County Total
Population in Incorporated Areas
Population of Cities by County

First let’s get some county level summary data for our text using that same return_population_estimate function we used before.

first.year.pop.county <- return_population_estimate(ofm.pop, paste0(current.county, " County"), first.year)
last.year.pop.county <- return_population_estimate(ofm.pop, paste0(current.county, " County"), last.year)

Now let’s create our County Line Chart by filtering to our current county and running the create_line_chart function. I decided to change the line type in this to be twodash.

working.data <- ofm.pop %>%
  filter(Jurisdiction == paste0(current.county, " County"))

total.county <- create_line_chart(working.data, "Year", "Estimate", "Jurisdiction", "twodash", 1.5, psrc_colors)

Now let’s create our stacked bar chart for incorporated/unincorporated areas.

working.data <- ofm.pop %>%
  filter(Jurisdiction %in% c(paste0("Incorporated ",current.county, " County"), paste0("Unincorporated ",current.county, " County")))

incorporation.county <- create_bar_chart(working.data, "Year", "Estimate", "Jurisdiction", "stack", 1.0, psrc_colors, "continuous")

And finally let’s create the fact charts for city population in our specific county.

working.data <- ofm.pop %>%
  filter(Filter ==4 & County == current.county & Year == last.year) %>% arrange(desc(Estimate))

county.cities <- create_facet_bar_chart(working.data, "Jurisdiction", "Estimate" , "Jurisdiction", "PopRange", "free")

Final ofm_trend_county_child working chunk

We now have all the pieces to work with to create our county specific figures. Here is all thee code in one place to copy and paste if needed

# Population Summary Data for use in markdown text
first.year.pop.county <- return_population_estimate(ofm.pop, paste0(current.county, " County"), first.year)
last.year.pop.county <- return_population_estimate(ofm.pop, paste0(current.county, " County"), last.year)

# Create Line Chart for Total County Population
working.data <- ofm.pop %>%
  filter(Jurisdiction == paste0(current.county, " County"))

total.county <- create_line_chart(working.data, "Year", "Estimate", "Jurisdiction", "twodash", 1.5, psrc_colors)

# Create Stacked Bar Chart for County Population by Incorporated/Unincorporated Population
working.data <- ofm.pop %>%
  filter(Jurisdiction %in% c(paste0("Incorporated ",current.county, " County"), paste0("Unincorporated ",current.county, " County")))

incorporation.county <- create_bar_chart(working.data, "Year", "Estimate", "Jurisdiction", "stack", 1.0, psrc_colors, "continuous")

# Create Bar Chart for City Population by County
working.data <- ofm.pop %>%
  filter(Filter ==4 & County == current.county & Year == last.year) %>%
  arrange(desc(Estimate))

county.cities <- create_facet_bar_chart(working.data, "Jurisdiction", "Estimate" , "Jurisdiction", "PopRange", "free")

Knit our County List

Now that we have our child created and saved, let’s create the list to run to create the county by county summary. knitr will run the child we document we call for every item in our list. It basically returns a list of our output that we store in a variable out to show. Here is the code we will run to finish off our document

out <- NULL

for (current.county in psrc_counties) {
  out <- c(out, knit_child("ofm_trend_county_child.Rmd"))
}

Add Thurston County

Try out the functionality of our approach. Let’s add Thurston County to our region and re-run things to see what happens. To do this, adjust psrc_counties to include Thurston COunty and then knit. Here is the code change:

psrc_counties <- c("King","Kitsap","Pierce","Snohomish", "Thurston")

The End

This is the end, beautiful friend. This is the end, my only friend the end. I hope you have enjoyed it and it inspires you to try and create many more R Markdown documents! If you ever want help, just let me know.

My office hours are:

Craig’ R Office Hours: every Wednesday from 12pm to 1pm

Make an appointment ahead of time or simply drop me a Teams chat and we can work together on any and all things R!

Lesson #2: OFM Population Estimates

OFM Trend

YAML Header

Code Chunk for Initial Setup

New Content - sourcing files

Functions

Palette

Inline Text

OFM Data Prep

Download from a URL

Cleanup with dplyr

Final ofm_data_prep working chunk

Region and State

Data Calculations

Final region_state_data working chunk

Line Chart of Data

Final region_state_plot working chunk

Incorporated Area

Data Calculations

Final incorporated_data working chunk

Stacked Bar Chart of Data

Final incorporated_plot working chunk

Cities

Data Calculations

Final cities_data working chunk

Facet Wrap Bar Charts of Data

Final cities_plot working chunk

County Child

Data Calculations

Final ofm_trend_county_child working chunk

Knit our County List

Add Thurston County

The End