Census estimates for non-census geographies
geographic-conversions.Rmd
Every Census estimate is linked to a geography. However, not every geography is linked to a Census estimate.
Since the Census Bureau cannot publish data at every conceivable scale, data users often approximate a custom boundary by choosing a set of disaggregate Census geographies (e.g. block, block group, or tract) that closely align with it. This approach is very often adequate to the purpose.
Geographic conversion via a granular proxy variable
For more geographic precision, we in effect disaggregate the estimate to a granular scale–specifically, parcel–by assuming the estimate is in direct proportion either to a variable reported at that scale (housing units), or to one of several population measures we can apportion to that scale (e.g. OFM total population).
With this in mind, PSRC has developed demographic ratios to convert estimates from Census geographies to its own planning geographies. Each planning geography has its own function:
- Regional Growth Strategy classes -
census_to_rgs()
- Regional Growth Centers -
census_to_rgc()
- Manufacturing-Industrial Centers -
census_to_mic()
- Traffic Analysis Zones -
census_to_taz()
- High Capacity Transit Station areas -
census_to_hct()
- 2020 Jurisdictional boundaries -
census_to_juris()
As long as you’re using the default proxy variable (total
population), you only need a single argument: the name of the dataframe
(as returned by get_acs_recs()
or
get_decennial_recs()
) you wish to convert.
library(psrccensus)
#> Have you updated tidycensus since the release date of the data you'll use?
#> devtools::install_github('walkerke/tidycensus')
library(magrittr)
x <- get_acs_recs(geography = 'block group',
table.names = 'B03002',
years = 2021,
acs.type = 'acs5') %>%
dplyr::mutate(label=stringr::str_replace_all(label,"(^Estimate!!|Total:!!)",""))
x %>% dplyr::filter(variable=="B03002_012") %>% .[,c(1,5:8)] %>% head()
#> # A tibble: 6 × 5
#> GEOID variable estimate moe label
#> <chr> <chr> <dbl> <dbl> <chr>
#> 1 530330001011 B03002_012 28 33 Hispanic or Latino:
#> 2 530330001012 B03002_012 188 176 Hispanic or Latino:
#> 3 530330001013 B03002_012 48 50 Hispanic or Latino:
#> 4 530330001021 B03002_012 0 13 Hispanic or Latino:
#> 5 530330001022 B03002_012 181 206 Hispanic or Latino:
#> 6 530330001023 B03002_012 110 151 Hispanic or Latino:
rgc_race <- census_to_rgc(x)
rgc_race %>% dplyr::filter(variable=="B03002_012") %>% .[,c(1,3,7:8)] %>% head()
#> planning_geog variable year estimate
#> <char> <char> <num> <num>
#> 1: not in regional growth center B03002_012 2021 415884
#> 2: Seattle Northgate B03002_012 2021 719
#> 3: Seattle University Community B03002_012 2021 1921
#> 4: Seattle Uptown B03002_012 2021 649
#> 5: Seattle First Hill/Capitol Hill B03002_012 2021 3318
#> 6: Seattle South Lake Union B03002_012 2021 380
Selecting the correct proxy
The key assumption behind this method is the direct relationship between the granular proxy variable and the Census variable of interest: the stronger this relationship, the more defensible the result. Before splitting a geography, consider which of these metrics is most relevant to your Census estimate of interest:
-
total_pop
–i.e. total population (the default) household_pop
group_quarters_pop
housing_units
occupied_housing_units
You can specify these using the wgt
argument:
y <- get_acs_recs(geography = 'tract',
table.names = 'B26001',
years = 2019,
acs.type = 'acs5') %>%
dplyr::mutate(label=stringr::str_replace_all(label,"(^Estimate!!|Total:!!)",""))
y %>% .[,c(1,5:8)] %>% head()
#> # A tibble: 6 × 5
#> GEOID estimate moe label concept
#> <chr> <dbl> <dbl> <chr> <chr>
#> 1 53033000100 38 24 Total: GROUP QUARTERS POPULATION
#> 2 53033000200 87 150 Total: GROUP QUARTERS POPULATION
#> 3 53033000300 16 5 Total: GROUP QUARTERS POPULATION
#> 4 53033000401 89 32 Total: GROUP QUARTERS POPULATION
#> 5 53033000402 239 151 Total: GROUP QUARTERS POPULATION
#> 6 53033000500 21 20 Total: GROUP QUARTERS POPULATION
rgc_gq_age <- census_to_rgs(y, wgt="group_quarters_pop")
rgc_gq_age %>% .[,c(1,3,4,7,8)]
#> planning_geog variable label year estimate
#> <char> <char> <char> <num> <num>
#> 1: Rural B26001_001 Total: 2019 4848
#> 2: Metro B26001_001 Total: 2019 38944
#> 3: HCT B26001_001 Total: 2019 8117
#> 4: Core B26001_001 Total: 2019 11580
#> 5: UU B26001_001 Total: 2019 7224
#> 6: CitiesTowns B26001_001 Total: 2019 4945
If your dataframe combines tables that would most appropriately apply two different proxy metrics–for example, one linked to total population, and another linked to housing units–you might divide the dataframe (or alter your code to retrieve separate tables originally) before applying the geographic conversion.
Further geographies
The set of planning geographies is not strictly limited to the five listed above; with approval, the set can be expanded (see the documentation - PSRC VPN required to view).
To convert census estimates to custom geographies that don’t have
stored splits, use the census_to_customgeo()
function. This
performs an analogous conversion using the same parcel-level proxy
variables, but requires two additional arguments: an sf
package object (i.e. your custom geography/geometry), and the variable
name in that file that labels the geography. This function is much
slower than the PSRC planning geography functions listed above because
it must get parcel-level data from Elmer and create the custom splits
rather than just read them. Depending on your download speed and the
complexity of the geography, it’ll take a minute or two to run (still,
not shabby for this level of detail).
Error margins & limitations
Currently, split-derived calculations are limited to estimates only; medians cannot be determined via this method. Analysts can calculate shares by converting estimates of both the numerator and denominator to the appropriate geography.
Although it may be harder to assume Margins of Error (MOE) are directly proportional to the split weight, the assumption is applied identically to both estimate and MOE, which seemed the best choice of the available options.