
Every Census estimate is linked to a geography. However, not every geography is linked to a Census estimate.

Since the Census Bureau cannot publish data at every conceivable scale, data users often approximate a custom boundary by choosing a set of disaggregate Census geographies (e.g. block, block group, or tract) that closely align with it. This approach is often adequate for the purpose.

Geographic conversion via a granular proxy variable

For more geographic precision, we in effect disaggregate the estimate to a granular scale (specifically, the parcel) by assuming the estimate is directly proportional either to a variable reported at that scale (housing units) or to one of several population measures we can apportion to that scale (e.g. OFM total population).
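The proportional logic described above can be sketched with toy data. This is a minimal base R illustration, not the psrccensus implementation: the parcel table, column names, and all values are made up, and the real package reads stored splits rather than computing them this way.

```r
# Toy data: every value here is invented for illustration.
# One row per parcel, with its source block group, the target geography
# it falls in, and the proxy variable (here, housing units).
parcels <- data.frame(
  parcel_id     = 1:5,
  block_group   = c("A", "A", "A", "B", "B"),
  planning_geog = c("center", "center", "outside", "center", "outside"),
  housing_units = c(10, 30, 60, 20, 20)
)

# Census estimates published at the block group scale
estimates <- data.frame(
  block_group = c("A", "B"),
  estimate    = c(200, 80)
)

# Each parcel's share of its block group's proxy total
bg_total      <- ave(parcels$housing_units, parcels$block_group, FUN = sum)
parcels$share <- parcels$housing_units / bg_total

# Apportion each block group estimate to its parcels in proportion to that share
parcels <- merge(parcels, estimates, by = "block_group")
parcels$parcel_estimate <- parcels$estimate * parcels$share

# Re-aggregate the parcel-level values to the target geography
converted <- aggregate(parcel_estimate ~ planning_geog, data = parcels, FUN = sum)
converted
```

Because the shares within each block group sum to one, the converted estimates add up to the original block group total; the method only redistributes the estimate and never changes its sum.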

With this in mind, PSRC has developed demographic ratios to convert estimates from Census geographies to its own planning geographies. Each planning geography has its own function, such as census_to_rgc() and census_to_rgs(), both used in the examples below.

As long as you’re using the default proxy variable (total population), you only need a single argument: the name of the dataframe (as returned by get_acs_recs() or get_decennial_recs()) you wish to convert.

library(psrccensus)
#> Have you updated tidycensus since the release date of the data you'll use?
#> devtools::install_github('walkerke/tidycensus')
library(magrittr)

x <- get_acs_recs(geography = 'block group',
                  table.names = 'B03002',
                  years = 2021,
                  acs.type = 'acs5') %>% 
  dplyr::mutate(label=stringr::str_replace_all(label,"(^Estimate!!|Total:!!)",""))

x %>% dplyr::filter(variable=="B03002_012") %>% .[,c(1,5:8)] %>% head()
#> # A tibble: 6 × 5
#>   GEOID        variable   estimate   moe label              
#>   <chr>        <chr>         <dbl> <dbl> <chr>              
#> 1 530330001011 B03002_012       28    33 Hispanic or Latino:
#> 2 530330001012 B03002_012      188   176 Hispanic or Latino:
#> 3 530330001013 B03002_012       48    50 Hispanic or Latino:
#> 4 530330001021 B03002_012        0    13 Hispanic or Latino:
#> 5 530330001022 B03002_012      181   206 Hispanic or Latino:
#> 6 530330001023 B03002_012      110   151 Hispanic or Latino:

rgc_race <- census_to_rgc(x)

rgc_race %>% dplyr::filter(variable=="B03002_012") %>% .[,c(1,3,7:8)] %>% head()
#>                      planning_geog   variable  year estimate
#>                             <char>     <char> <num>    <num>
#> 1:   not in regional growth center B03002_012  2021   415884
#> 2:               Seattle Northgate B03002_012  2021      719
#> 3:    Seattle University Community B03002_012  2021     1921
#> 4:                  Seattle Uptown B03002_012  2021      649
#> 5: Seattle First Hill/Capitol Hill B03002_012  2021     3318
#> 6:        Seattle South Lake Union B03002_012  2021      380

Selecting the correct proxy

The key assumption behind this method is the direct relationship between the granular proxy variable and the Census variable of interest: the stronger this relationship, the more defensible the result. Before splitting a geography, consider which of these metrics is most relevant to your Census estimate of interest:

  • total_pop, i.e. total population (the default)
  • household_pop
  • group_quarters_pop
  • housing_units
  • occupied_housing_units

You can specify these using the wgt argument:


y <- get_acs_recs(geography = 'tract',
                  table.names = 'B26001',
                  years = 2019,
                  acs.type = 'acs5') %>% 
  dplyr::mutate(label=stringr::str_replace_all(label,"(^Estimate!!|Total:!!)",""))

y %>% .[,c(1,5:8)] %>% head()
#> # A tibble: 6 × 5
#>   GEOID       estimate   moe label  concept                  
#>   <chr>          <dbl> <dbl> <chr>  <chr>                    
#> 1 53033000100       38    24 Total: GROUP QUARTERS POPULATION
#> 2 53033000200       87   150 Total: GROUP QUARTERS POPULATION
#> 3 53033000300       16     5 Total: GROUP QUARTERS POPULATION
#> 4 53033000401       89    32 Total: GROUP QUARTERS POPULATION
#> 5 53033000402      239   151 Total: GROUP QUARTERS POPULATION
#> 6 53033000500       21    20 Total: GROUP QUARTERS POPULATION

# Result is by regional geography (rgs) and holds total group quarters population
rgs_gq <- census_to_rgs(y, wgt="group_quarters_pop")

rgs_gq %>% .[,c(1,3,4,7,8)]
#>    planning_geog   variable  label  year estimate
#>           <char>     <char> <char> <num>    <num>
#> 1:         Rural B26001_001 Total:  2019     4848
#> 2:         Metro B26001_001 Total:  2019    38944
#> 3:           HCT B26001_001 Total:  2019     8117
#> 4:          Core B26001_001 Total:  2019    11580
#> 5:            UU B26001_001 Total:  2019     7224
#> 6:   CitiesTowns B26001_001 Total:  2019     4945

If your dataframe combines tables that would most appropriately use two different proxy metrics (for example, one linked to total population and another linked to housing units), you might divide the dataframe, or alter your code to retrieve the tables separately in the first place, before applying the geographic conversion.
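One way to divide a combined dataframe is to split it on the table ID embedded in the variable name. The sketch below uses a toy stand-in for a combined result (the values are made up, and pairing B03002 with the housing units table B25001 is only an example); the conversion calls are left as comments because they need PSRC data access.

```r
# Toy stand-in for a combined psrccensus result; values are illustrative only.
combined <- data.frame(
  GEOID    = c("53033000100", "53033000100", "53033000200", "53033000200"),
  variable = c("B03002_012", "B25001_001", "B03002_012", "B25001_001"),
  estimate = c(28, 150, 188, 420),
  stringsAsFactors = FALSE
)

# The table ID is the part of the variable name before the underscore
by_table <- split(combined, sub("_.*$", "", combined$variable))

pop_linked     <- by_table[["B03002"]]  # best matched to a population proxy
housing_linked <- by_table[["B25001"]]  # best matched to a housing proxy

# With real results, each piece would then be converted with its own proxy, e.g.:
# rgs_pop     <- census_to_rgs(pop_linked)                           # default total_pop
# rgs_housing <- census_to_rgs(housing_linked, wgt = "housing_units")
```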

Further geographies

The set of planning geographies is not strictly limited to the five listed above; with approval, the set can be expanded (see the documentation - PSRC VPN required to view).

To convert census estimates to custom geographies that don’t have stored splits, use the census_to_customgeo() function. It performs an analogous conversion using the same parcel-level proxy variables, but requires two additional arguments: an sf package object (i.e. your custom geography/geometry) and the name of the variable in that object that labels the geography. This function is much slower than the PSRC planning geography functions above because it must retrieve parcel-level data from Elmer and create the custom splits rather than just read stored ones. Depending on your download speed and the complexity of the geography, it’ll take a minute or two to run (still, not shabby for this level of detail).

Error margins & limitations

Currently, split-derived calculations are limited to estimates only; medians cannot be determined via this method. Analysts can calculate shares by converting estimates of both the numerator and denominator to the appropriate geography.
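As a minimal sketch of the share calculation, assume the numerator and denominator have already been converted to the planning geography and sit in one table. The geography names and estimates below are made up; only the column names mirror the converted output shown earlier.

```r
# Toy converted output; estimates are invented, not real results.
converted <- data.frame(
  planning_geog = c("Center A", "Center A", "Center B", "Center B"),
  variable      = c("B03002_001", "B03002_012", "B03002_001", "B03002_012"),
  estimate      = c(10000, 1500, 4000, 1000)
)

# Denominator: table total; numerator: the category of interest
denominator <- converted[converted$variable == "B03002_001", c("planning_geog", "estimate")]
numerator   <- converted[converted$variable == "B03002_012", c("planning_geog", "estimate")]

# Pair them by geography and divide
shares <- merge(numerator, denominator, by = "planning_geog",
                suffixes = c("_num", "_den"))
shares$share <- shares$estimate_num / shares$estimate_den
shares
```

Both estimates should be converted with the same proxy variable so that the numerator and denominator are split consistently.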

It may be harder to justify the assumption that Margins of Error (MOE) are directly proportional to the split weight than it is for the estimates themselves. Even so, the same weight is applied identically to both estimate and MOE, which seemed the best choice of the available options.
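To illustrate what applying the weight identically means, the sketch below scales one source estimate and its MOE by the same split weights. The estimate and MOE are taken from the first block group shown earlier; the weights are hypothetical. How pieces from different source geographies are combined into a final MOE is not covered here and is not shown.

```r
# Estimate and MOE for a single source geography
source_estimate <- 28
source_moe      <- 33

# Hypothetical split weights for the pieces of that geography (they sum to 1)
piece        <- c("inside", "outside")
split_weight <- c(0.25, 0.75)

# The same weight scales both the estimate and the MOE
portion <- data.frame(
  piece    = piece,
  estimate = source_estimate * split_weight,
  moe      = source_moe * split_weight
)
portion
```

One consequence of this choice is that each piece keeps the same MOE-to-estimate ratio as its source geography.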