Hello!
I am working on a research project that involves collecting 28 variables from ACS data. I have been working with tidycensus and have figured out the R code to get what I want in RStudio, but I cannot find an extensive list of variable names to pass to the get_acs(..., variables = XYZ) function. When I use the load_variables function, I feel like I am either not getting the full list or the variable is too granular. For example, I need to know what percentage of people have a bachelor's degree or higher at the census tract level, but the ACS5 variables break it down too far (age, gender, etc.). I found a couple for my table, but not all of the ones I need. I have looked to see if the online ACS data tables list the names next to the variable descriptions, but couldn't find them.
My ask is: does anyone know the ACS variable names for these, and/or where I can find them on the Census website? I'm looking for 2021 ACS 5-year estimate data.
Thanks very much!
-A stressed PhD student
Very good question. Field descriptions are often terrible, and you also need to know the universe. I don't have the answer, but I have wanted to build a mapping of all the field names. There are pieces of the puzzle here and there but no comprehensive data dictionary, so I still load a description file for every single table I want to pull. I pull some of that data myself (although at the county level, and with the API directly rather than R). For example: https://api.census.gov/data/2021/acs/acs5?get=B29002_001E,B29002_002E,B29002_003E,B29002_004E,B29002_005E,B29002_006E,B29002_007E,B29002_008E,NAME&for=county:*
pulls all the educational attainment levels for all counties (note that B29002 covers the citizen voting-age population). I am not an expert here, but I am fairly sure that asking for foreign-born and moved at the tract level will give you null values or codes such as -999999999 or -666666666, as different tables respond differently (R may handle this much better). So I think that understanding what data is available before you tie yourself to a specific variable is critical. I could give you the table and field names for the roughly 100 ACS fields I pull, which might cover around half of your list, if that will help. I usually spend time on this over the weekend, and I have a new grandchild keeping me busy.
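For anyone hitting those placeholder codes in R, here is a minimal sketch of one way to turn them into NA before doing any arithmetic. The list of codes is an assumption based on the Census notes on estimate and annotation values, and `clean_acs_sentinels` and the example data frame are hypothetical names with invented numbers. It only touches numeric columns, so values returned as text would need converting first.

```r
# Placeholder ("jam") values that flag a missing or suppressed estimate.
# This list is an assumption; check it against the notes for your table.
acs_sentinels <- c(-999999999, -888888888, -666666666,
                   -555555555, -333333333, -222222222)

# Replace sentinel codes with NA in every numeric column of a data frame.
clean_acs_sentinels <- function(df, sentinels = acs_sentinels) {
  for (col in names(df)) {
    if (is.numeric(df[[col]])) {
      df[[col]][df[[col]] %in% sentinels] <- NA
    }
  }
  df
}

# Invented rows, for illustration only
example <- data.frame(
  NAME = c("Tract A", "Tract B"),
  estimate = c(1200, -666666666),
  stringsAsFactors = FALSE
)
cleaned <- clean_acs_sentinels(example)
```

After this, `cleaned$estimate` holds 1200 and NA, so sums and percentages will not be silently distorted by a large negative code.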
I generally consult the Table Shells here: https://www.census.gov/programs-surveys/acs/technical-documentation/table-shells.2022.html
A few caveats:
1. These only contain "Detailed Tables" (those starting with a B or C), which are generally only counts, with a few averages and medians. They don't contain percentages, so you'd have to calculate them yourself. However, the hierarchical nature of the document should make those calculations simple.
2. It looks like Census changed the format of this table to be a pipe-delimited text document, making it a little harder to work with. I'm amused by the "Indent" column, which is in some ways easier to work with than the actual indentations of previous versions (this is for the aforementioned hierarchy information), but might be a little confusing for a newer user. This is all to say, you might use the 2021 Excel formatted version instead of the 2022 version.
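As a rough sketch of "calculate them yourself" for the original bachelor's-or-higher question: the example below assumes table B15003 (educational attainment for the population 25 and over), with _001 as the total and _022 through _025 as bachelor's, master's, professional, and doctorate counts. Please verify those codes against the table shell before relying on them. `pct_bachelors_plus` is a hypothetical helper, and the counts in `toy` are invented.

```r
# Share of adults 25+ with a bachelor's degree or higher, from B15003 counts.
# Expects the wide layout tidycensus returns with output = "wide"
# (columns such as B15003_001E). The variable codes are assumptions to verify.
pct_bachelors_plus <- function(df) {
  degree_cols <- c("B15003_022E", "B15003_023E", "B15003_024E", "B15003_025E")
  numerator <- rowSums(df[, degree_cols, drop = FALSE])
  denominator <- df[["B15003_001E"]]
  # Avoid dividing by zero in unpopulated tracts
  ifelse(denominator > 0, 100 * numerator / denominator, NA_real_)
}

# Real data would come from something like this (needs a Census API key):
# library(tidycensus)
# tracts <- get_acs(geography = "tract", state = "MA", year = 2021,
#                   survey = "acs5", output = "wide",
#                   variables = c("B15003_001", "B15003_022", "B15003_023",
#                                 "B15003_024", "B15003_025"))

# Invented counts, for illustration only
toy <- data.frame(
  GEOID = c("tract_1", "tract_2"),
  B15003_001E = c(1000, 0),
  B15003_022E = c(200, 0),
  B15003_023E = c(100, 0),
  B15003_024E = c(30, 0),
  B15003_025E = c(20, 0),
  stringsAsFactors = FALSE
)
toy$pct_ba_plus <- pct_bachelors_plus(toy)
```

This only handles the estimates. A margin of error for the percentage needs the Census formula for derived proportions, which is left out here.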
In R, you can get the variable listings through {censusapi}, which lets you access any of the API endpoints (rather than only those pre-coded in {tidycensus} for ACS), and then filter the results with your keywords.
Dear Stas,
I have some R code that uses censusapi functions to download a table "group." From that, the code produces an r x 2 matrix (columns Est and MoE), and then I attach the row labels from the metadata (also downloaded using censusapi). This seems to be closer to sliced bread than what tidycensus does.
Dave
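For readers who want to try this approach, here is a minimal sketch of the reshaping step. It is not Dave's actual code. `group_to_matrix` is a hypothetical helper, and the commented-out download calls reflect my understanding of the censusapi interface (`getCensus` with a `group(...)` request and `listCensusMetadata` with a `group` argument), so treat them as assumptions. The stand-in inputs reuse the Massachusetts totals quoted in this thread.

```r
# Turn one row of a group download into an r x 2 matrix (Est, MoE) with
# row labels taken from the metadata.
#   wide_row: one-row data frame with columns like B01001_001E / B01001_001M
#   meta:     data frame with `name` and `label` columns
group_to_matrix <- function(wide_row, meta) {
  # Estimate columns end in E; annotation columns (ending in EA) are skipped
  est_cols <- sort(grep("_[0-9]{3}E$", names(wide_row), value = TRUE))
  moe_cols <- sub("E$", "M", est_cols)
  stopifnot(all(moe_cols %in% names(wide_row)))
  out <- cbind(
    Est = as.numeric(unlist(wide_row[1, est_cols])),
    MoE = as.numeric(unlist(wide_row[1, moe_cols]))
  )
  rownames(out) <- meta$label[match(est_cols, meta$name)]
  out
}

# Real inputs would come from something like this (needs a Census API key):
# library(censusapi)
# wide_row <- getCensus(name = "acs/acs5", vintage = 2021,
#                       vars = "group(B01001)", region = "state:25")
# meta <- listCensusMetadata(name = "acs/acs5", vintage = 2021,
#                            type = "variables", group = "B01001")

# Small stand-in inputs, for illustration only
wide_row <- data.frame(
  state = "25",
  B01001_001E = 6991852, B01001_001M = 0,
  B01001_002E = 3413174, B01001_002M = 606,
  stringsAsFactors = FALSE
)
meta <- data.frame(
  name = c("B01001_002E", "B01001_001E"),
  label = c("Estimate!!Total:!!Male:", "Estimate!!Total:"),
  stringsAsFactors = FALSE
)
m <- group_to_matrix(wide_row, meta)
```

Matching labels by variable name rather than by position matters because the metadata rows do not necessarily come back in table order.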
Always lots of great info in the forums. As there are many questions around field names, descriptions, etc., and R vs. the API, is it possible for someone here (Mark, Bernie, David?) to produce a simple doc that highlights these sources and related issues, so that everyone can benefit and use it as a resource? Put it in the ACS Resources tab.
The issue with having too "granular" a breakdown of the rows in a table is that, if you are using the API, you are stuck with it. For example, table B01001 is broken down by age and sex. The "primary" variable is sex, so there are male and female totals. But age is the "secondary" variable, so there are no rows that give the total for, say, "Under 5 years." You have to add the Male : Under 5 years and Female : Under 5 years rows.
I have a nice R function that downloads the table "group" and attaches the labels from the metadata. I then have a function that takes linear combinations of rows in a table (columns Est and MoE). The R function is based on the "censusapi" R package.
(State of MA FIPS 25)
                                Est   MoE
Male : 30 to 34 years        245596   318
Male : 25 to 29 years        250358   394
Male : 40 to 44 years        208058  2998
Male : 35 to 39 years        224342  3052
Male : 50 to 54 years        235548   361
Male : 45 to 49 years        215042   441
Male : 62 to 64 years        130319  2394
Male : 55 to 59 years        245573  2551
Male : 60 and 61 years        93025  1827
Male : 65 and 66 years        76963  1670
Male : 70 to 74 years        138031  2202
Male : 67 to 69 years        103679  2068
Male : 80 to 84 years         52395  1488
Male : 75 to 79 years         82445  1474
Female                      3578678   606
Male : 85 years and over      54306  1625
Female : 5 to 9 years        182121  2636
Female : 10 to 14 years      197419  2634
Female : Under 5 years       174546   323
Female : 15 to 17 years      124120   298
Female : 20 years             50778  1502
Female : 18 and 19 years     108312   396
Female : 22 to 24 years      143258  2017
Female : 21 years             50936  1754
Female : 30 to 34 years      246823   272
Female : 25 to 29 years      247846   458
Female : 40 to 44 years      212330  2952
Female : 35 to 39 years      228666  2951
Female : 45 to 49 years      223825   285
Female : 50 to 54 years      245015   346
Female : 60 and 61 years      96092  2253
Female : 55 to 59 years      254311  3020
Female : 65 and 66 years      84787  1807
Female : 62 to 64 years      144068  2620
Female : 70 to 74 years      163566  2257
Female : 67 to 69 years      115894  2341
Female : 80 to 84 years       72738  2001
Female : 75 to 79 years      106706  2139
Female : 85 years and over   104521  2267
Male                        3413174   606
Total                       6991852     0
Male : 5 to 9 years          189716  2794
Male : Under 5 years         183397   357
Male : 15 to 17 years        129924   299
Male : 18 and 19 years       104293   383
Male : 10 to 14 years        208481  2761
Male : 21 years               50414  1662
Male : 20 years               49917  1494
Male : 22 to 24 years        141352  2048
From these building blocks you can easily write an R function to sum the Male and Female blocks to get the age categories.
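A minimal sketch of such a function, using the two "Under 5 years" rows from the Massachusetts table in this thread. `sum_rows` is a hypothetical name, and the margin of error uses the usual root-sum-of-squares approximation for a sum of estimates, which is an approximation rather than an exact result.

```r
# Sum selected rows of an Est/MoE matrix. The combined margin of error uses
# the root-sum-of-squares approximation for a sum of estimates.
sum_rows <- function(tab, rows) {
  c(Est = sum(tab[rows, "Est"]),
    MoE = sqrt(sum(tab[rows, "MoE"]^2)))
}

# Two rows copied from the Massachusetts B01001 table in this thread
tab <- rbind(
  "Male : Under 5 years"   = c(Est = 183397, MoE = 357),
  "Female : Under 5 years" = c(Est = 174546, MoE = 323)
)

under5 <- sum_rows(tab, c("Male : Under 5 years", "Female : Under 5 years"))
```

Here the combined estimate is 183397 + 174546 = 357943. The same function works for any set of row labels, so each age band only needs its pair of Male and Female labels.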
Since you're using tidycensus you must be using R. If you're also using RStudio you can do something like:
library(tidycensus)
ACSlist <- load_variables(2021, "acs5")
View(ACSlist)
which will display ALL the variables in the API in the very nice, interactive, DT-powered table viewer built into RStudio. You can also search the list using dplyr, etc. It's a long list, but it's both sortable and searchable.
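If scrolling the viewer gets tedious, a keyword search is one option. The sketch below uses base R and assumes the `name`, `label`, and `concept` columns that load_variables() returns. `find_vars` is a hypothetical helper, and the small `vars` data frame is a hand-typed stand-in so the example runs without an API call.

```r
# Keep variables whose label or concept matches a keyword (case-insensitive).
# `vars` needs `name`, `label`, and `concept` columns, which is the layout
# load_variables() returns.
find_vars <- function(vars, keyword) {
  hit <- grepl(keyword, vars$label, ignore.case = TRUE) |
    grepl(keyword, vars$concept, ignore.case = TRUE)
  vars[hit, , drop = FALSE]
}

# Hand-typed stand-in for load_variables(2021, "acs5"), for illustration only
vars <- data.frame(
  name = c("B15003_001", "B15003_022", "B01001_001"),
  label = c("Estimate!!Total:",
            "Estimate!!Total:!!Bachelor's degree",
            "Estimate!!Total:"),
  concept = c("EDUCATIONAL ATTAINMENT FOR THE POPULATION 25 YEARS AND OVER",
              "EDUCATIONAL ATTAINMENT FOR THE POPULATION 25 YEARS AND OVER",
              "SEX BY AGE"),
  stringsAsFactors = FALSE
)

bachelors <- find_vars(vars, "bachelor")
attainment <- find_vars(vars, "educational attainment")
```

With the real list, something like find_vars(ACSlist, "educational attainment") narrows the candidates down to a handful of tables to inspect.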