
Why CTPP data?

Although it is derived from the American Community Survey (ACS), CTPP includes data items that are not published in the standard ACS release, and at geographic scales substantially smaller than those available from the ACS Public Use Microdata Sample (PUMS). As with other ACS products, it is consistent across scales, supports longitudinal analysis, and includes margins of error. It is the most authoritative nationwide Census Bureau source, for transportation flow data in particular.

The primary drawback to CTPP data is the roughly 3-year lag needed to develop it. Because it is restricted to non-overlapping 5-year spans, the data may be released nearly seven years after the end of the first year in a reported span. As a result, events such as the transportation impacts of the 2020 COVID-19 pandemic leave a long signature in the data (i.e. the next dataset without those impacts would arrive around 2029).
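The arithmetic behind those dates can be sketched in a few lines of base R. The helper names below (`span_end()`, `release_year()`) are hypothetical and only illustrate the timeline; the 3-year lag is the approximate figure quoted above, not an official release schedule.

```r
# Assumptions taken from the text above: 5-year spans, roughly 3-year lag
span_len  <- 5
lag_years <- 3

# Hypothetical helpers (not part of psrcctpp)
span_end     <- function(first_year) first_year + span_len - 1
release_year <- function(first_year) span_end(first_year) + lag_years

# Years between the end of the first survey year and the release (2012-16 span)
first_year_gap <- release_year(2012) - 2012
first_year_gap                    # 7

# Non-overlapping spans starting in 2012: 2012, 2017, 2022, 2027
span_starts <- seq(2012, by = span_len, length.out = 4)

# First span that begins after 2020, and its approximate release year
next_clean_start   <- span_starts[span_starts > 2020][1]
next_clean_release <- release_year(next_clean_start)
next_clean_release                # 2029
```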

Identify and retrieve CTPP data

Determine which table to request

To request data, as with the ACS, you must know the CTPP table code (or variable codes). To help identify it, the package has a search function, ctpp_tblsearch(). It takes two required parameters and one optional parameter:

  • prefix - CTPP table names follow a pattern. The first character denotes whether a table holds originally reported data (“A”; confidential values are suppressed) or perturbed data (“B”; noise is infused instead of suppressing values). The second character indicates residence geography (“1”), workplace geography (“2”), or flows (“3”, i.e. both residence and workplace geography). For example, a table of original data by workplace geography uses the prefix “A2”. A question mark can stand in for either character (or both) if you don’t want to restrict the search on it; for example, “?3” returns all matching flow tables, whether they hold original or perturbed data.

  • regex - the regular expression to search for in the table description. It is not case-sensitive.

  • year (optional) - which survey to search; defaults to the latest (currently 2012-16)
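To make the matching rules concrete, here is a minimal base R sketch of how a prefix with a question-mark wildcard and a case-insensitive description search could be combined. The helpers `prefix_to_regex()` and `search_tables()` are hypothetical and are not the package's implementation; the sample rows reuse table names and descriptions that appear in the search output in this article.

```r
# Hypothetical helper: turn a prefix such as "?2" into an anchored regex,
# treating "?" as "any single character"
prefix_to_regex <- function(prefix) {
  paste0("^", gsub("?", ".", prefix, fixed = TRUE))
}

# A tiny stand-in for the table catalog
tbls <- data.frame(
  name = c("A202105", "B206200", "B206202"),
  description = c("Workers by means of transportation",
                  "Aggregate travel time by means of transportation",
                  "Mean travel time by means of transportation")
)

# Hypothetical search: the prefix must match the table name AND the regex
# must match the description, ignoring case
search_tables <- function(prefix, regex, catalog = tbls) {
  hit <- grepl(prefix_to_regex(prefix), catalog$name) &
         grepl(regex, catalog$description, ignore.case = TRUE)
  catalog$name[hit]
}

original_only <- search_tables("A2", "means of transp.")  # original workplace tables
travel_time   <- search_tables("?2", "TRAVEL TIME")       # either A or B, any case
flow_tables   <- search_tables("?3", "means of transp.")  # no flow tables in this sample
```

In this sample, the first call keeps only the “A” table, the second keeps both travel-time tables, and the third returns nothing because no name has “3” as its second character.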

In our example, we’re looking for a workplace-geography table showing mode, aka “means of transportation” in Census parlance.

shhh <- suppressPackageStartupMessages
shhh(library(psrcctpp))
shhh(library(magrittr))
shhh(library(dplyr))
shhh(library(stringr))

ctpp_tblsearch("?2", "means of transp.") %>%
    mutate(desc=str_sub(description, 1L, 50L)) %>%  # abbreviate to fit in frame
    select(name, desc) %>% head()
##       name                                               desc
##     <char>                                             <char>
## 1: A202105                 Workers by means of transportation
## 2: B206200   Aggregate travel time by means of transportation
## 3: B203208 Median household income by means of transportation
## 4: B203207 Workers by household size and by means of transpor
## 5: B206202        Mean travel time by means of transportation
## 6: B202200 Workers by minority status and by means of transpo

In our example, the first table (A202105) is the one we’re after. Note that even if your attribute of interest is listed second in a table description, the table will typically include subtotals by that dimension alone.

Get the data

Once we’ve identified the table code, we can use the get_psrc_ctpp() function to retrieve the data. Its parameters are:

  • scale - for residence or workplace tables, either “county”, “place”, or “tract”; for flow tables, either “county-county”, “place-place”, “place-county”, “county-place”, or “tract-tract”. The “block group” scale will be available starting with the 2017-21 survey.
  • table_code - identifier (string) for the table you want (you can also request individual variables)
  • dyear - the data year, i.e. the last year of the CTPP span (e.g. 2016 for 2012-16)
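The relationship between these parameters can be sketched with two small base R helpers. Both `allowed_scales()` and `span_label()` are hypothetical names used for illustration only; they encode the rules listed above and are not functions exported by the package.

```r
# Hypothetical helper: which scale values suit a given table code,
# based on the second character of the table name
allowed_scales <- function(table_code) {
  geo_type <- substr(table_code, 2, 2)
  if (geo_type %in% c("1", "2")) {
    # residence ("1") or workplace ("2") tables
    c("county", "place", "tract")
  } else if (geo_type == "3") {
    # flow tables pair a residence geography with a workplace geography
    c("county-county", "place-place", "place-county",
      "county-place", "tract-tract")
  } else {
    stop("Unrecognized table code: ", table_code)
  }
}

# Hypothetical helper: label the 5-year span that ends in dyear
span_label <- function(dyear) {
  dyear <- as.integer(dyear)
  sprintf("%d-%02d", dyear - 4L, dyear %% 100L)
}

workplace_scales <- allowed_scales("A202105")
survey_span      <- span_label(2016)   # "2012-16"
```

A check like this before the request can catch a mismatch early, such as asking for “tract-tract” on a workplace table.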

By default, the resulting dataframe is restricted to geographies within the Central Puget Sound region (or flows with one end within the region). To specify a narrower set, or geographies outside the region, provide a character vector of the desired FIPS codes using the optional geoids argument.

Note that all results include both the residence fields (res_geoid, res_label) and the workplace fields (work_geoid, work_label), but the workplace fields will be NA for residence tables and the residence fields will be NA for workplace tables. Categorical fields are left as character rather than factor datatype. Suppressed values in original data (“A”) tables are also coded NA.

x <- get_psrc_ctpp("tract", "A202105", 2016)          # get data

mutate(x, work_label=str_sub(work_label, 7L, 14L),    # abbreviate to fit in frame
          category  =str_sub(category, 1L, 15L)) %>%
  select(5:8) %>% head()                   
## Key: <work_label>
##    work_label        category estimate estimate_moe
##        <char>          <char>    <num>        <num>
## 1:    Tract 1         Bicycle       NA           NA
## 2:    Tract 1 Bus or trolley       205           96
## 3:    Tract 1 Car, truck, or       900          244
## 4:    Tract 1 Car, truck, or        70           54
## 5:    Tract 1 Car, truck, or        NA           NA
## 6:    Tract 1 Car, truck, or        NA           NA
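The NA rows above are suppressed cells, so they need explicit handling before any arithmetic. The following base R sketch rebuilds the six rows shown (with the abbreviated labels) as a plain data frame; it is a stand-in for illustration, not output from the package.

```r
# Stand-in for the six Tract 1 rows printed above
tract1 <- data.frame(
  category     = c("Bicycle", "Bus or trolley", "Car, truck, or",
                   "Car, truck, or", "Car, truck, or", "Car, truck, or"),
  estimate     = c(NA, 205, 900, 70, NA, NA),
  estimate_moe = c(NA, 96, 244, 54, NA, NA)
)

# Count suppressed cells and keep only the published ones
n_suppressed <- sum(is.na(tract1$estimate))
published    <- tract1[!is.na(tract1$estimate), ]

# A plain sum propagates NA; na.rm = TRUE sums only the published cells
naive_total   <- sum(tract1$estimate)
partial_total <- sum(tract1$estimate, na.rm = TRUE)
```

Here `naive_total` is NA, while `partial_total` covers only the three published cells and therefore understates the true total whenever a suppressed cell is non-zero.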

Aggregate CTPP data using custom variables

CTPP data tables include totals as well as category breakdowns, but if you wish to define either a custom category or a custom geography (as an aggregate of a smaller geography, e.g. tract or block group), you can create your own grouping variable, and then summarize using that variable in the psrc_ctpp_sum() function, as follows.

psrc_ctpp_sum() results include both estimates and corresponding margins of error. The incl_na=FALSE option can be used to remove irrelevant categories without having to create a separate filtered data object first (NA values for relevant categories are still preserved).

You can use the convenience function ctpp_shares() to append the share and share MOE to any CTPP dataset, as long as it contains category totals (as is typically the case).

x %<>% mutate(
  # custom geography: assign selected workplace tracts to a named area
  custom_geo=case_when(
     str_sub(work_geoid,4L,11L) %in% paste0("3302380",3:4)         ~ "Downtown Bellevue",
     str_sub(work_geoid,4L,11L) %in% paste0("61040",c(4,7:8),"00") ~ "Downtown Everett",
     TRUE                                                          ~ NA_character_),
  # custom categories: collapse detailed modes into broader groups
  category=case_when(
              grepl(" carpool$",                       category) ~"Carpool",
              grepl("Drove alone",                     category) ~"Drove Alone",
              grepl("Streetcar|Bus|Subway|Ferry|Rail", category) ~"Transit",
              grepl("Bicycle|Walked",                  category) ~"Bike/Ped",
              !is.na(category) ~category))                       # keep other labels as-is
# summarize by the custom geography, dropping tracts outside it, then add shares
rs <- psrc_ctpp_sum(x, group_vars="custom_geo", incl_na=FALSE) %>% ctpp_shares()
head(rs[,2:7])
## # A tibble: 6 × 6
##   custom_geo        category     estimate estimate_moe   share share_moe
##   <chr>             <chr>           <dbl>        <dbl>   <dbl>     <dbl>
## 1 Downtown Bellevue Bike/Ped         2300        451.  0.0643    0.0123 
## 2 Downtown Bellevue Carpool          3960         NA   0.111    NA      
## 3 Downtown Bellevue Drove Alone     23440       1217.  0.655     0.0207 
## 4 Downtown Bellevue Motorcycle        110         61.4 0.00307   0.00171
## 5 Downtown Bellevue Other method      235         98.8 0.00657   0.00275
## 6 Downtown Bellevue Taxicab             0         NA   0        NA
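For intuition about the estimate_moe and share_moe columns, the sketch below applies the standard approximation formulas from Census Bureau guidance for ACS-derived estimates: the MOE of a sum is the square root of the summed squared MOEs, and the MOE of a proportion is derived from the numerator and total MOEs. Whether psrc_ctpp_sum() and ctpp_shares() use exactly these formulas is an assumption; `moe_sum()` and `moe_share()` are hypothetical helpers, and the inputs are the three published Tract 1 values from the earlier output.

```r
# Hypothetical helper: MOE of a sum of estimates
moe_sum <- function(moe) sqrt(sum(moe^2))

# Hypothetical helper: MOE of a share (est / total)
moe_share <- function(est, moe, total, total_moe) {
  p <- est / total
  radicand <- moe^2 - p^2 * total_moe^2
  # If the value under the root is negative, fall back to the ratio
  # formula, which adds the two terms instead of subtracting
  if (radicand < 0) radicand <- moe^2 + p^2 * total_moe^2
  sqrt(radicand) / total
}

# Published Tract 1 estimates and MOEs from the earlier output
est <- c(205, 900, 70)
moe <- c(96, 244, 54)

# For illustration only, treat the sum of these cells as the total
total     <- sum(est)
total_moe <- moe_sum(moe)

bus_share     <- est[1] / total
bus_share_moe <- moe_share(est[1], moe[1], total, total_moe)
```

Because squared MOEs are added under the root, the MOE of an aggregate grows more slowly than the aggregate itself, which is why combined geographies and categories tend to have proportionally tighter margins than their components.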