2019 ACS: Subject Tables and Data Profile Tables in R using tidycensus

Hey all - I am working on a project for my university. I am looking at Census Tract level data for a municipality, and leveraging 2019 ACS data in R using the tidycensus package.

Some examples of the metrics I want to analyze are Unemployment Rate, Percent White, and Poverty Rate. Because the ACS data has raw counts, I am trying to calculate the percentages myself. For example, for Unemployment Rate, I've divided B23025_005 (unemployed people) by B23025_003 (civilian labor force).
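The division described above can be sketched roughly as follows. This is only an illustration: the helper name `add_unemployment_rate` is made up, and it assumes a wide-format data frame with the estimate columns `B23025_005E` and `B23025_003E`, which is the shape `get_acs(..., output = "wide")` returns.

```r
# Hypothetical helper (not part of tidycensus): adds an unemployment rate
# column, in percent, to a wide-format ACS data frame.
add_unemployment_rate <- function(acs_wide) {
  unemployed  <- acs_wide$B23025_005E  # estimate: unemployed
  labor_force <- acs_wide$B23025_003E  # estimate: labor force denominator
  # Tracts with an empty denominator get NA instead of NaN from 0 / 0
  acs_wide$unemployment_rate <- ifelse(
    labor_force > 0,
    100 * unemployed / labor_force,
    NA_real_
  )
  acs_wide
}

# Usage (needs a Census API key; state and county are placeholders):
# tracts <- tidycensus::get_acs(
#   geography = "tract",
#   variables = c("B23025_005", "B23025_003"),
#   state = "XX", county = "Your County",
#   year = 2019, survey = "acs5", output = "wide"
# )
# tracts <- add_unemployment_rate(tracts)
```

The same pattern would apply to the other metrics, with a different numerator and denominator for each.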

This is a lengthy process though, and I am wondering if Subject Tables and Data Profile Tables would work better? Based on their descriptions, these tables already contain percentages. However, I am struggling to find documentation of the variable code names and what they represent. I've used this code in R, but it only shows variable definitions for Base Tables and Collapsed Tables:

load_variables(2019, "acs5", cache = TRUE)

Question: Does anyone have advice on leveraging Subject Tables and Data Profile Tables in R? Is there a good resource for variable code definitions? 

Thank you very much for your insight!! 

  • You can get both subject and data profile tables using tidycensus. If you want to explore which variables are available, you need to set the dataset argument in load_variables to "acs5/profile" or "acs5/subject". Here for example you can see variables related to unemployment in the data profile tables:

    library(tidycensus)
    library(tidyverse)

    load_variables(year = 2019, dataset = "acs5/profile") %>%
      filter(str_detect(label, "Unemploy"))

    For subject tables set dataset to "acs5/subject".
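
    Once you have found a code, you can pass it to get_acs() like any detailed-table variable. A minimal sketch, assuming an API key is already installed with census_api_key(); the wrapper name and the state and county arguments are placeholders. I believe DP03_0009P is the 2019 unemployment rate (percent), but please confirm it against the load_variables() output above before relying on it.

    ```r
    library(tidycensus)

    # Hypothetical wrapper: pulls one data profile variable for every tract
    # in a county. Assumes census_api_key() has already been set up.
    get_tract_profile_var <- function(state, county, variable = "DP03_0009P") {
      get_acs(
        geography = "tract",
        variables = variable,  # assumed to be the unemployment rate (percent)
        state = state,
        county = county,
        year = 2019,
        survey = "acs5"
      )
    }

    # Example call (placeholder geography):
    # get_tract_profile_var("IL", "Cook")
    ```

    The result should have one row per tract, with the value in the estimate column and its margin of error in moe.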

    Hope this helps!
    Matt

  • Hi, George. Tacking this on to Matt H's reply, `tidycensus::load_variables()` will give you only the combined labels, which can be difficult to sift through to find what you're looking for. For example, S0501_C01_088 has the label "Estimate!!Total!!EARNINGS IN THE PAST 12 MONTHS (IN 2019 INFLATION-ADJUSTED DOLLARS) FOR FULL-TIME, YEAR-ROUND WORKERS!!Population 16 years and over with earnings!!Median earnings (dollars) for full-time, year-round workers:!!Female".

    I have an easier time seeing the hierarchy in the tables when I split up the label into separate parts:

    library(tidycensus)
    library(tidyverse)

    # Split the "!!"-delimited label into one column per level
    load_variables(year = 2019, dataset = "acs5/subject") %>%
      separate(label, into = paste0("label", 1:9), sep = "!!", fill = "right", remove = FALSE)

    That will break apart the label components and give you new variables: label1 would be "Estimate", label2 would be "Total", ..., and label6 would be "Female". Then it's easier to filter on different dimensions of each table.
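
    As a rough illustration of that filtering step, here is the same split applied to a one-row stand-in for the load_variables() output, built from the S0501_C01_088 label quoted above. The object names are made up for the example.

    ```r
    library(dplyr)
    library(tidyr)

    # One-row stand-in for load_variables() output, using the label quoted above
    vars <- tibble(
      name = "S0501_C01_088",
      label = paste(
        "Estimate", "Total",
        "EARNINGS IN THE PAST 12 MONTHS (IN 2019 INFLATION-ADJUSTED DOLLARS) FOR FULL-TIME, YEAR-ROUND WORKERS",
        "Population 16 years and over with earnings",
        "Median earnings (dollars) for full-time, year-round workers:",
        "Female",
        sep = "!!"
      )
    )

    split_vars <- vars %>%
      separate(label, into = paste0("label", 1:9), sep = "!!",
               fill = "right", remove = FALSE)

    # Filter on individual levels of the hierarchy
    female_medians <- split_vars %>%
      filter(label6 == "Female", startsWith(label5, "Median earnings"))
    ```

    Unused columns (label7 through label9 here) are filled with NA because of fill = "right".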

    Something else that may be helpful: A given variable name in one year may represent something completely different in another year (unlike in the detailed tables, which are often replaced by tables with a different name when categories change substantially). And even when the variable names are the same from year to year, the format of the labels sometimes changes. I do like the subject tables for getting more accurate margins of error for certain fields (rather than using the formulas for calculating them ourselves), but it can be hard to grab them over time.
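
    One way to spot those year-to-year changes is to join two variable lists on name and look for labels that differ. A minimal sketch; changed_labels is a made-up helper, and it assumes both inputs have the name and label columns that load_variables() returns.

    ```r
    library(dplyr)

    # Hypothetical helper: given two load_variables() results, return the
    # variable names present in both whose labels differ.
    changed_labels <- function(vars_old, vars_new) {
      inner_join(
        select(vars_old, name, label_old = label),
        select(vars_new, name, label_new = label),
        by = "name"
      ) %>%
        filter(label_old != label_new)
    }

    # Usage (downloads the variable lists from the Census API):
    # v2018 <- tidycensus::load_variables(2018, "acs5/subject")
    # v2019 <- tidycensus::load_variables(2019, "acs5/subject")
    # changed_labels(v2018, v2019)
    ```

    This flags pure formatting differences as well as real changes in meaning, so treat the output as a list to review by hand rather than a final answer.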

    Good luck!

    --Matt

  • Thank you Matt & Matt! This is super, super helpful.

    Do you (or others) have thoughts on whether the profile or subject tables are preferable for analyzing percentages and rates? Or should I build my own metrics using the Base tables?

    I guess they ultimately should yield the same (or very similar) results.