Finding ACS variable names for tidycensus in R

Hello!

I am working on a research project that involves collecting 28 variables from ACS data. I have been working with tidycensus and have my R code working in RStudio, but I cannot find a comprehensive list of variable names to pass to get_acs(..., variables = XYZ). When I use load_variables(), I feel like I am either not getting the full list or the variables are too granular. For example, I need the percentage of people with a bachelor's degree or higher at the census tract level, but the ACS5 variables break it down too far (age, gender, etc.). I found variable names for a couple of rows in my table, but not for all of the ones I need. I also checked whether the online ACS data tables list the variable name next to the variable description, and couldn't find it.

My ask is: does anyone know the ACS variable names for these, and/or where I can find them on the census website? I'm looking for 2021 ACS 5-year estimate data.

Indicator_Calculation ACS_Name
Percent labor force employed DP03_0004E
Percent employed Armed Forces  DP03_0006E
GINI coefficient of income equality B19083_001E
Population who are foreign-born and moved to US in previous 5 years per 1000 population
Percent population born in a state that still resides in that state
Percent voter participation in a Presidential election (Precinct level data)
Percent labor force employed
Percent population not employed in farming, fishing, forestry, extractive, and tourism industries
Average sales volume divided by number of businesses 
Percent employed Armed Forces 
Percent housing units that are not mobile homes
Vacant rental units per 1000 population
Hospital beds per 1000 population
Percent housing units built before 1970 or after 2000 
Number of hotels/motels per 1000 population
Number of public schools per 1000 population
Number of actual connections per 1000 households 
Percent population with broadband internet subscription 
Absolute value of population change base year/current year
Percent population with college education or more
Percent population between 18 to 65 years old 
Percent households with at least one vehicle
Percent households with telephone service available
Percent population that are proficient English speakers
Percent population without a sensory, physical, or mental disability
Percent of the population (under 65) with health insurance coverage
GINI coefficient of income equality
Percent of absolute difference between male median annual earnings and female median annual earnings divided by annual income 

Thanks very much!

-A stressed PhD student

  • Very good question. Field descriptions are often terrible, and you also need to know the universe. I don't have the answer, but I have wanted to build a mapping of all the field names. There are pieces of the puzzle here and there but no comprehensive data dictionary. I still load a description file for every single table I want to pull. I pull some of that data myself (although at the county level, and with no R, just the API). For example: https://api.census.gov/data/2021/acs/acs5?get=B29002_001E,B29002_002E,B29002_003E,B29002_004E,B29002_005E,B29002_006E,B29002_007E,B29002_008E,NAME&for=county:*

    pulls all the education levels for all counties. I am not an expert here, but I am fairly sure that asking for foreign born and moved at the tract level will give you null values or placeholder values such as -999999999 or -666666666, as different tables respond differently (R may handle this much better). So I think that understanding what data is available before you tie yourself to a specific variable is critical. I usually spend time on this over the weekend, and I have a new grandchild keeping me busy. Again, I'm not an expert, but I could give you the roughly 100 ACS field names (with table and field name) for everything I pull, which might cover around half of your list if that would help.
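
A minimal R sketch of guarding against those placeholder values before doing any arithmetic on raw API output. The list of codes is an assumption based on the Census Bureau's published annotation values, so check it against the notes for the tables you pull; `acs_sentinels` and `clean_acs` are hypothetical names, and the estimates are made up.

```r
# Assumed list of ACS placeholder ("annotation") codes; verify for your tables.
acs_sentinels <- c(-999999999, -888888888, -666666666,
                   -555555555, -333333333, -222222222)

# Replace any placeholder code in a numeric vector with NA so it cannot
# leak into sums, ratios, or per-1000 rates.
clean_acs <- function(x) {
  x[x %in% acs_sentinels] <- NA
  x
}

# Hypothetical tract estimates: two usable values and two placeholders.
est <- c(1520, -666666666, 87, -999999999)
cleaned <- clean_acs(est)
print(cleaned)
```

Turning the codes into NA makes missing data visible downstream, since any sum or ratio that touches a missing tract also becomes NA instead of a large negative number.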

  • I generally consult the Table Shells here: https://www.census.gov/programs-surveys/acs/technical-documentation/table-shells.2022.html
    A few caveats:
    1. These only contain "Detailed Tables" (those starting with a B or C), which are generally only counts, with a few averages and medians. They don't contain percentages, so you'd have to calculate them yourself. However, the hierarchical nature of the document should make those calculations simple.

    2. It looks like Census changed the format of this table to be a pipe-delimited text document, making it a little harder to work with. I'm amused by the "Indent" column, which is in some ways easier to work with than the actual indentations of previous versions (this is for the aforementioned hierarchy information), but might be a little confusing for a newer user. This is all to say, you might use the 2021 Excel formatted version instead of the 2022 version.
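
As an illustration of calculating a percentage from Detailed Table counts, here is a rough R sketch for "bachelor's degree or higher" using table B15003 (educational attainment, population 25 and over). It assumes B15003_001 is the total and B15003_022 through B15003_025 are bachelor's, master's, professional, and doctorate counts, which should be confirmed against the table shell. The column names assume wide-format output with an `E` suffix; the tract IDs and counts are hypothetical, as is the function name.

```r
# Percent of the 25+ population with a bachelor's degree or higher.
# Assumes wide-format columns named B15003_001E, B15003_022E, ... (E = estimate).
pct_bachelors_plus <- function(df) {
  numerator <- df$B15003_022E + df$B15003_023E +
    df$B15003_024E + df$B15003_025E
  100 * numerator / df$B15003_001E
}

# Hypothetical counts for two tracts (not real data).
tracts <- data.frame(
  GEOID       = c("tract_A", "tract_B"),
  B15003_001E = c(2000, 1500),  # total population 25 and over
  B15003_022E = c(400, 150),    # bachelor's degree
  B15003_023E = c(200, 60),     # master's degree
  B15003_024E = c(50, 15),      # professional school degree
  B15003_025E = c(50, 0)        # doctorate degree
)

tracts$pct_ba_plus <- pct_bachelors_plus(tracts)
print(tracts[, c("GEOID", "pct_ba_plus")])
```

With tidycensus, the same columns should be obtainable from `get_acs()` with `output = "wide"`. The margin of error of a derived percentage needs separate handling; tidycensus provides `moe_prop()` for that case.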

  • In R, you can get the variable listings through {censusapi}, which lets you access any of the API endpoints (rather than only those pre-coded in {tidycensus} for ACS), and then filter the results with your keywords.

  • Dear Stas,

    I have some R code that uses censusapi functions to download a table "group". From that, the code produces an r x 2 matrix (columns Est and MoE), and then I attach the row labels from the metadata (also downloaded using censusapi). This seems to be closer to sliced bread than what tidycensus does.

    Dave
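
A rough sketch of what such a reshaping step could look like, not the poster's actual code. It assumes the downloaded group arrives as one row with columns such as `B01001_001E` (estimate) and `B01001_001M` (margin of error), and that the metadata has `name` and `label` columns. `group_to_matrix` is a hypothetical helper, the labels are simplified, and the numbers are the Massachusetts totals quoted elsewhere in this thread.

```r
# Turn one row of a "group" download into an r x 2 matrix (Est, MoE)
# with row labels looked up in the metadata.
group_to_matrix <- function(row, meta) {
  # Estimate columns end in three digits plus "E"; this skips annotation
  # columns such as "..._001EA" and text columns such as NAME.
  est_cols <- grep("_[0-9]{3}E$", names(row), value = TRUE)
  moe_cols <- sub("E$", "M", est_cols)
  out <- cbind(Est = as.numeric(unlist(row[1, est_cols])),
               MoE = as.numeric(unlist(row[1, moe_cols])))
  rownames(out) <- meta$label[match(est_cols, meta$name)]
  out
}

# Offline stand-ins for the two downloads (simplified labels).
raw <- data.frame(NAME = "Massachusetts",
                  B01001_001E = 6991852, B01001_001M = 0,
                  B01001_002E = 3413174, B01001_002M = 606,
                  B01001_026E = 3578678, B01001_026M = 606)
meta <- data.frame(name  = c("B01001_001E", "B01001_002E", "B01001_026E"),
                   label = c("Total", "Male", "Female"))

m <- group_to_matrix(raw, meta)
print(m)

# With the censusapi package and an API key, the real inputs might come from:
# raw  <- censusapi::getCensus(name = "acs/acs5", vintage = 2021,
#                              vars = "group(B01001)", region = "state:25")
# meta <- censusapi::listCensusMetadata(name = "acs/acs5", vintage = 2021,
#                                       type = "variables", group = "B01001")
```

Real metadata labels are longer (they carry the full hierarchy), so some string cleanup would likely be needed before using them as row names.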

  • Always lots of great info in the forums. As there are many questions around field names, descriptions, etc. and R vs. API, is it possible for someone here (Mark, Bernie, David?) to produce a simple doc that highlights these sources and related issues, so that everyone can benefit and use it as a resource? It could go in the ACS Resources tab.

  • The issue with having too "granular" a breakdown of the rows in a table is that (if you are using the API) you are stuck with it. For example, table B01001 is broken down by age and sex. The "primary" variable is sex, so there are male and female totals. But age is the "secondary" variable, so there are no rows that give the total for, say, "Under 5 years." You have to add the Male : Under 5 years and Female : Under 5 years rows.

    I have a nice R function that downloads the table "group" and attaches the labels from the metadata. I then have a function that takes linear combinations of rows in a table (columns Est and MoE). The R functions are based on the "censusapi" R package.

    (State of MA FIPS 25)

                                   Est   MoE
    Male : 30 to 34 years       245596   318
    Male : 25 to 29 years       250358   394
    Male : 40 to 44 years       208058  2998
    Male : 35 to 39 years       224342  3052
    Male : 50 to 54 years       235548   361
    Male : 45 to 49 years       215042   441
    Male : 62 to 64 years       130319  2394
    Male : 55 to 59 years       245573  2551
    Male : 60 and 61 years       93025  1827
    Male : 65 and 66 years       76963  1670
    Male : 70 to 74 years       138031  2202
    Male : 67 to 69 years       103679  2068
    Male : 80 to 84 years        52395  1488
    Male : 75 to 79 years        82445  1474
    Female                     3578678   606
    Male : 85 years and over     54306  1625
    Female : 5 to 9 years       182121  2636
    Female : 10 to 14 years     197419  2634
    Female : Under 5 years      174546   323
    Female : 15 to 17 years     124120   298
    Female : 20 years            50778  1502
    Female : 18 and 19 years    108312   396
    Female : 22 to 24 years     143258  2017
    Female : 21 years            50936  1754
    Female : 30 to 34 years     246823   272
    Female : 25 to 29 years     247846   458
    Female : 40 to 44 years     212330  2952
    Female : 35 to 39 years     228666  2951
    Female : 45 to 49 years     223825   285
    Female : 50 to 54 years     245015   346
    Female : 60 and 61 years     96092  2253
    Female : 55 to 59 years     254311  3020
    Female : 65 and 66 years     84787  1807
    Female : 62 to 64 years     144068  2620
    Female : 70 to 74 years     163566  2257
    Female : 67 to 69 years     115894  2341
    Female : 80 to 84 years      72738  2001
    Female : 75 to 79 years     106706  2139
    Female : 85 years and over  104521  2267
    Male                       3413174   606
    Total                      6991852     0
    Male : 5 to 9 years         189716  2794
    Male : Under 5 years        183397   357
    Male : 15 to 17 years       129924   299
    Male : 18 and 19 years      104293   383
    Male : 10 to 14 years       208481  2761
    Male : 21 years              50414  1662
    Male : 20 years              49917  1494
    Male : 22 to 24 years       141352  2048

    From these building blocks you can easily write an R function to sum the Male and Female blocks to get the age categories.
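
A minimal sketch of such a function, using the two "Under 5 years" rows from the table above. It assumes the usual approximation for the margin of error of a sum (the square root of the sum of squared MoEs), which ignores any covariance between the rows; `combine_rows` is a hypothetical name, not the function described in this post.

```r
# Sum the estimates of the chosen rows and combine their margins of error
# as the root sum of squares (approximation for a sum of estimates).
combine_rows <- function(tab, rows) {
  c(Est = sum(tab[rows, "Est"]),
    MoE = sqrt(sum(tab[rows, "MoE"]^2)))
}

# The two building blocks, copied from the Massachusetts table above.
tab <- rbind(
  "Male : Under 5 years"   = c(Est = 183397, MoE = 357),
  "Female : Under 5 years" = c(Est = 174546, MoE = 323)
)

under5 <- combine_rows(tab, c("Male : Under 5 years",
                              "Female : Under 5 years"))
print(round(under5))
```

For these rows the combined estimate is 357943 with a margin of error of roughly 481.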

  • Since you're using tidycensus you must be using R. If you're also using RStudio you can do something like:

    library(tidycensus)

    ACSlist <- load_variables(2021, "acs5")

    View(ACSlist)

    which will display ALL the variables in the API in the very nice interactive, DT-powered table viewer built into RStudio. You can also search the list using dplyr etc. It's a long list, but it's both sortable and searchable.
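
A small sketch of searching that list programmatically. It uses base R so it runs without extra packages, and the same filter could be written with `dplyr::filter()`. It assumes the `name`, `label`, and `concept` columns that `load_variables()` returns; `search_vars` is a hypothetical helper and the three rows are an abbreviated stand-in for the real list.

```r
# Keep variables whose concept or label matches a keyword (case-insensitive).
search_vars <- function(vars, keyword) {
  hit <- grepl(keyword, vars$concept, ignore.case = TRUE) |
    grepl(keyword, vars$label, ignore.case = TRUE)
  vars[hit, , drop = FALSE]
}

# Abbreviated stand-in for the output of load_variables(2021, "acs5").
vars <- data.frame(
  name    = c("B15003_001", "B15003_022", "B19083_001"),
  label   = c("Estimate!!Total:",
              "Estimate!!Total:!!Bachelor's degree",
              "Estimate!!Gini Index"),
  concept = c("EDUCATIONAL ATTAINMENT FOR THE POPULATION 25 YEARS AND OVER",
              "EDUCATIONAL ATTAINMENT FOR THE POPULATION 25 YEARS AND OVER",
              "GINI INDEX OF INCOME INEQUALITY")
)

hits <- search_vars(vars, "educational attainment")
print(hits$name)

# With tidycensus loaded, the real lists would come from, for example:
# vars <- tidycensus::load_variables(2021, "acs5")          # Detailed Tables (B/C)
# vars <- tidycensus::load_variables(2021, "acs5/profile")  # Data Profiles (DP)
# vars <- tidycensus::load_variables(2021, "acs5/subject")  # Subject Tables (S)
```

The DP-style names in the original question (such as DP03_0004E) belong to the Data Profile tables, so they would most likely appear under "acs5/profile" rather than in the plain "acs5" list.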