psrcctpp basics
Why CTPP data?
Although it is derived from the American Community Survey (ACS), CTPP includes data items that are not published in the ACS release, and at geographic scales substantially smaller than the ACS Public Use Microdata Sample (PUMS). As with other ACS products, it is consistent across scales, supports longitudinal analysis, and includes margins of error. In particular, it is the most authoritative nationwide Census Bureau source for transportation flow data.
The primary drawback to CTPP data is the approximately 3-year lag to develop it. Because it is restricted to non-overlapping 5-year spans, the data may trail the end of the first year in a reported span by nearly seven years. Events such as the transportation impacts accompanying the 2020 COVID-19 pandemic therefore leave a long signature in the data (i.e. the next dataset without it would arrive around 2029).
Identify and retrieve CTPP data
Determine which table to request
In order to request data, as with the ACS, one must know the CTPP table code (or variable codes). To assist in identifying this, the package has a search function, ctpp_tblsearch().
It has two required parameters:
- prefix - CTPP uses a pattern in table names: the first character denotes whether a table is originally reported data (“A”; confidential data is suppressed) or perturbed (“B”; some noise infusion instead of suppression), and the second indicates either residence geography (“1”), workplace geography (“2”), or flows (“3”, i.e. both residence and workplace geography). For example, a table for original workplace geography uses the prefix “A2”. The function argument can also use a question mark in place of either character (or both) if you’re not particular about it, e.g. “?3” would show all relevant flow tables, regardless of whether they are original or perturbed data.
- regex - This is the regular expression search term to find in the table description. It is not case-sensitive.
- year (optional) - specifies which survey; defaults to the latest (currently 2012-16)
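The prefix convention can be illustrated with a short base-R sketch. This is not the package's implementation: the helper match_prefix() is hypothetical, and the name "A100000" is a made-up residence-table name added only for contrast (the other names come from the search output in this vignette). It simply shows how a “?” wildcard could be translated into a regular expression over the first two characters of a table name.

```r
# Hypothetical helper (not part of psrcctpp): turn a CTPP prefix pattern
# such as "?2" into a regular expression and filter table names with it.
match_prefix <- function(prefix, table_names) {
  # "?" stands for "any single character", which is "." in a regex
  pattern <- paste0("^", gsub("?", ".", prefix, fixed = TRUE))
  table_names[grepl(pattern, table_names)]
}

tbls <- c("A202105", "B206200", "B203208",  # names from the search output
          "A100000")                        # hypothetical residence-table name

a2_only <- match_prefix("A2", tbls)  # original-data workplace tables only
any_two <- match_prefix("?2", tbls)  # workplace tables, original or perturbed
any_any <- match_prefix("??", tbls)  # no restriction on either character

print(a2_only)
print(any_two)
```

In practice ctpp_tblsearch() handles this matching for you; the sketch is only meant to make the two-character convention concrete.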
In our example, we’re looking for a workplace-geography table showing mode, aka “means of transportation” in Census parlance.
shhh <- suppressPackageStartupMessages
shhh(library(psrcctpp))
shhh(library(magrittr))
shhh(library(dplyr))
shhh(library(stringr))
ctpp_tblsearch("?2", "means of transp.") %>%
  mutate(desc=str_sub(description, 1L, 50L)) %>% # abbreviate to fit in frame
  select(name, desc) %>% head()
##       name                                               desc
##     <char>                                             <char>
## 1: A202105                 Workers by means of transportation
## 2: B206200   Aggregate travel time by means of transportation
## 3: B203208 Median household income by means of transportation
## 4: B203207 Workers by household size and by means of transpor
## 5: B206202        Mean travel time by means of transportation
## 6: B202200 Workers by minority status and by means of transpo
In our example, the first table (A202105) is the one we’re after. Note that even if your attribute of interest is listed secondarily in a table description, the table will typically include subtotals by that dimension alone.
Get the data
Once we’ve identified the table code, we can now use the get_psrc_ctpp() function to retrieve the data. Its parameters are:
- scale -
  - for residence or workplace tables, either “county”, “place”, or “tract”
  - for flow tables, either “county-county”, “place-place”, “place-county”, “county-place”, or “tract-tract”
  - “block group” scale will be available starting with the 2017-21 survey.
- table_code - identifier (string) for the table you want (you can also request individual variables)
- dyear - aka data year, the last year of the CTPP span (e.g. 2016 for 2012-16)
By default, the resulting dataframe is restricted to geographies within the Central Puget Sound region (or flows with one end within the region). To specify a narrower set, or geographies outside the region, you can pass a character vector of the desired FIPS codes via the optional geoids argument.
Notice that all results will include both the res_geoid, res_label fields and the work_geoid, work_label fields, but the work fields will be NA for residence tables and the residence fields will be NA for workplace tables. Categorical fields are left as character rather than factor datatype. Suppressed values in original data (“A”) tables are also coded NA.
x <- get_psrc_ctpp("tract", "A202105", 2016) # get data
mutate(x, work_label=str_sub(work_label, 7L, 14L), # abbreviate to fit in frame
          category  =str_sub(category, 1L, 15L)) %>%
  select(5:8) %>% head()
## Key: <work_label>
##    work_label       category estimate estimate_moe
##        <char>         <char>    <num>        <num>
## 1:    Tract 1        Bicycle       NA           NA
## 2:    Tract 1 Bus or trolley      205           96
## 3:    Tract 1 Car, truck, or      900          244
## 4:    Tract 1 Car, truck, or       70           54
## 5:    Tract 1 Car, truck, or       NA           NA
## 6:    Tract 1 Car, truck, or       NA           NA
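Because suppressed cells arrive as NA, it can be worth checking how many there are before aggregating. The following is a minimal base-R sketch, not package functionality: x_demo is a hand-built stand-in that mirrors the first three rows of the output above, so it runs without retrieving any CTPP data.

```r
# Hand-built stand-in for a few rows of the result (values copied from the
# output above); a real object would come from get_psrc_ctpp().
x_demo <- data.frame(
  work_label   = "Tract 1",
  category     = c("Bicycle", "Bus or trolley", "Car, truck, or"),
  estimate     = c(NA, 205, 900),
  estimate_moe = c(NA, 96, 244)
)

suppressed   <- is.na(x_demo$estimate)            # TRUE where a value is missing
n_suppressed <- sum(suppressed)                   # count of missing cells
naive_total  <- sum(x_demo$estimate)              # NA: any missing value propagates
known_total  <- sum(x_demo$estimate, na.rm = TRUE) # total of published values only

print(x_demo[suppressed, c("work_label", "category")])
```

Note that known_total is a lower bound rather than a true total, since the suppressed cells are not necessarily zero.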
Aggregate CTPP data using custom variables
CTPP data tables include totals as well as category breakdowns, but if you wish to define either a custom category or a custom geography (as an aggregate of a smaller geography, e.g. tract or block group), you can create your own grouping variable, and then summarize using that variable in the psrc_ctpp_sum() function, as follows.
psrc_ctpp_sum() results include both estimates and corresponding margins of error. The incl_na=FALSE option can be used to remove irrelevant categories without having to create a separate filtered data object first (NA values for relevant categories are still preserved).
You can utilize the convenience function ctpp_shares() to append the share and share MOE to any CTPP dataset as long as it contains category totals (as is typically the case).
x %<>% mutate(
  custom_geo=case_when(
    str_sub(work_geoid,4L,11L) %in% paste0("3302380",3:4) ~ "Downtown Bellevue",
    str_sub(work_geoid,4L,11L) %in% paste0("61040",c(4,7:8),"00") ~ "Downtown Everett",
    TRUE ~ NA_character_),
  category=case_when(
    grepl(" carpool$", category) ~"Carpool",
    grepl("Drove alone", category) ~"Drove Alone",
    grepl("Streetcar|Bus|Subway|Ferry|Rail", category) ~"Transit",
    grepl("Bicycle|Walked", category) ~"Bike/Ped",
    !is.na(category) ~category))
rs <- psrc_ctpp_sum(x, group_vars="custom_geo", incl_na=FALSE) %>% ctpp_shares()
head(rs[,2:7])
## # A tibble: 6 × 6
##   custom_geo        category     estimate estimate_moe   share share_moe
##   <chr>             <chr>           <dbl>        <dbl>   <dbl>     <dbl>
## 1 Downtown Bellevue Bike/Ped         2300         451.  0.0643    0.0123
## 2 Downtown Bellevue Carpool          3960           NA   0.111        NA
## 3 Downtown Bellevue Drove Alone     23440        1217.   0.655    0.0207
## 4 Downtown Bellevue Motorcycle        110         61.4 0.00307   0.00171
## 5 Downtown Bellevue Other method      235         98.8 0.00657   0.00275
## 6 Downtown Bellevue Taxicab             0           NA       0        NA
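For readers curious where aggregated margins of error can come from, here is a minimal base-R sketch of the standard Census Bureau approximation formulas for derived ACS estimates: the MOE of a sum is the square root of the sum of squared component MOEs, and the MOE of a proportion p = X/Y is sqrt(MOE_X^2 - p^2 * MOE_Y^2) / Y, falling back to the ratio form (plus instead of minus) when the quantity under the root is negative. Whether psrc_ctpp_sum() and ctpp_shares() use exactly these formulas is an assumption; the function names and input numbers below are hypothetical.

```r
# Hypothetical helpers illustrating the Census Bureau approximation formulas;
# these are NOT the psrcctpp functions.
moe_sum_sketch <- function(moe) {
  sqrt(sum(moe^2))  # MOE of a sum of estimates
}

moe_share_sketch <- function(x, moe_x, total, moe_total) {
  p <- x / total
  radicand <- moe_x^2 - p^2 * moe_total^2
  # If the proportion formula yields a negative radicand, use the ratio form
  if (radicand < 0) radicand <- moe_x^2 + p^2 * moe_total^2
  sqrt(radicand) / total
}

# Made-up inputs: two tracts combined into one custom geography
tract_est <- c(300, 400)
tract_moe <- c(30, 40)

grp_est   <- sum(tract_est)            # 700
grp_moe   <- moe_sum_sketch(tract_moe) # sqrt(30^2 + 40^2) = 50
share     <- grp_est / 1400            # 0.5, against a hypothetical total of 1400
share_moe <- moe_share_sketch(grp_est, grp_moe, total = 1400, moe_total = 60)

print(c(estimate = grp_est, moe = grp_moe, share = share, share_moe = share_moe))
```

These approximations ignore covariance between the component estimates, so they become less reliable as more small geographies are combined.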