Using ACS for Complex Data Dives


I'm trying to determine the best way to do more complex queries using ACS. It may be that my better answer is PUMS or another method, in which case, please let me know! My current director is leaving, and she's not familiar with the changes in the system. So, I reached out to Census, and they suggested asking here or trying PUMS.

I'm looking for data, especially local data (ACS will sort by place to the cities I'm looking for just fine), but I want to use more nested variables. Looking up places for one or two characteristics works fine, but I feel like I'm just taking shots in the dark at how to use the search to get, for example,

   % of total households that are female-led, foreign-born, below the poverty threshold, with children under 5, in (selected cities/places)

     (then the same for citizen vs non-citizen, and the same for national averages)

These comparisons are useful for determining need among local populations for our non-profit. It's difficult to get comparisons this specific or relevant now, though we apparently used to. I'm willing to learn PUMS, but after looking at the manual, I want to be sure there isn't a more direct route I'm missing. Thank you for reading - any advice is greatly appreciated.


  • I'm going to give my advice -- for anyone who is new (or sort of new) to digging into ACS.
    First piece of advice: Orient yourself by reviewing a list of all the published summary tables in ACS (there are over 1,200 distinct tables).  Here:  Look for the "Table List" link in the middle of the page.
    The portal is *not* good at providing a big-picture orientation to ACS.  Instead, I recommend the "Table List."
    Second thing to know: Most of those published tables are simple tabulations of one ACS question, or crosstabulations of two ACS questions.  There are a few crosstabs that utilize 3 ACS questions. If you're wanting to crosstab four different attributes -- as you say in your email -- sorry, you probably won't find it in the published summary tables.
    So, yes, you will have to resort to building up your own crosstabs from PUMS. There are two places to obtain the PUMS data.
    Option A: Large batch download, from Census's so-called "FTP" site.  The batch download quickly gets you every variable.
    Option B: If you don't want every variable, if you want to pick and choose, then instead use IPUMS, here
    (I also recommend IPUMS to people who do not have stats or database software; their website has an online tool that lets the visitor build the desired crosstabs through the web interface.)
    Final point: ACS is a sample-based survey (~13% of households). There will be statistical inference error for any crosstabbed estimates you assemble.  The error and uncertainty will escalate the thinner you try to slice the data.  You gave the example of wanting to quantify households that are female-led, foreign-born, poor, and with young children. That's a tall order -- be mindful of the statistical inference error. 
    Good luck!
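
For anyone wondering what "building up your own crosstabs" looks like in practice, here is a minimal sketch in plain Python. It assumes you have already reduced the microdata to one record per household with a survey weight and four yes/no flags. Every field name and every number below is made up for illustration; these are not real PUMS variable names or real estimates, and a real analysis would also need the survey's replicate weights to get margins of error.

```python
# Toy household records. All field names are hypothetical placeholders,
# NOT real PUMS variables, and the weights are invented.
households = [
    {"weight": 120, "female_led": True,  "foreign_born": True,
     "below_poverty": True,  "child_under_5": True},
    {"weight": 80,  "female_led": True,  "foreign_born": False,
     "below_poverty": True,  "child_under_5": False},
    {"weight": 200, "female_led": False, "foreign_born": True,
     "below_poverty": False, "child_under_5": True},
    {"weight": 100, "female_led": False, "foreign_born": False,
     "below_poverty": False, "child_under_5": False},
]

# The four attributes being crossed (hypothetical flag names).
FLAGS = ("female_led", "foreign_born", "below_poverty", "child_under_5")


def weighted_share(records, flags):
    """Share of weighted households for which every flag is true."""
    total = sum(r["weight"] for r in records)
    if total == 0:
        return 0.0
    matching = sum(r["weight"] for r in records if all(r[f] for f in flags))
    return matching / total


share = weighted_share(households, FLAGS)
print(f"{share:.1%} of weighted households match all four flags")
```

The same function can be run on a subset of records (one city, or citizens versus non-citizens) to get the comparison shares, keeping in mind that each extra filter thins the sample further.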
  • Thank you for the fast response! I may investigate IPUMS for application with our community needs assessments, as I am going to be the lone researcher for this. I really appreciate knowing the soft "three crosstable" limit, though I wish I weren't just guessing at what crosstables are available.

    I don't have separate database or statistics software, but our state happens to have a reference system which interprets ACS data tables, and I use that as often as possible.

    Thanks again for the IPUMS suggestion - learning a lot this week.

  • The problem with PUMS data (as distinct from IPUMS) is that the smallest geography available is the PUMA, which is a statistical area of about 100,000 people -- which may not be as local as you have in mind.

    Also, in terms of knowing what you can get from the detailed tables, it's always a work in progress, but we have pretty good coverage of the primary topical areas at including lists of tables which link to pages describing the specific columns available. Maybe it would be useful.

  • You are looking for a very rare population. My quick run on the PUMS data is: nationwide, your population is 0.33% of households. Not exactly minuscule (about 400K households nationwide), but hard to locate. Within PUMS, as mentioned, the smallest geography is the public use microdata area, with a population of 100K (give or take) and a sample of 1,000 to 2,000 people (give or take), or 300 to 800 households; meaning, you will have 0 to 4 households of interest per PUMA, with standard errors that are correspondingly awful. You'd be able to condense your data a bit for large cities (e.g. Chicago is like 17 PUMAs), and get better sample sizes per city, but you probably already know that your population will likely be found in those large cities. You won't be able to find "surprises" in that your target population seems to have a hot spot in Boise, ID, or Rochester, NY, because cities of that size are not large enough to support reliable inference.
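
The arithmetic behind that per-PUMA estimate can be sketched roughly as follows. This assumes the 0.33% national share applies uniformly to every PUMA and treats the sample as a simple random sample of households. Neither assumption really holds (the population is concentrated in some areas, and ACS uses a complex design with replicate weights), so read the output as an order-of-magnitude illustration only. The function names are mine, not from any Census tool.

```python
import math

# National share of households in the target group, from the post above.
SHARE = 0.0033


def expected_matches(n_households, share):
    """Expected number of sampled households that fall in the target group."""
    return n_households * share


def srs_standard_error(n_households, share):
    """Standard error of an estimated share under simple random sampling.

    Assumption: simple random sampling. The real ACS design is more
    complex, so actual standard errors will differ.
    """
    return math.sqrt(share * (1 - share) / n_households)


# 300 and 800 are the low and high household sample sizes per PUMA
# quoted in the post.
for n in (300, 800):
    count = expected_matches(n, SHARE)
    se = srs_standard_error(n, SHARE)
    print(f"n={n}: expected matches {count:.1f}, "
          f"SE {se:.4f}, SE relative to share {se / SHARE:.2f}")
```

With only about 1 to 3 expected matching households per PUMA, the standard error is on the same order as the share itself, which is why single-PUMA estimates for a group this rare are so unstable.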

  • Stas - thanks for your response! The population for my example is very specific, but considering your point that it's heavily concentrated in urban centers (and the sample is ~13%), it has a lot of meaning to community service organizations and health centers in those areas of concentration, where that 0.33% national average will not be relevant.

    Because underreporting can also lead to underrepresentation, we sometimes also struggle to really identify what "best data available" means before using it as a starting point. To that end, I really appreciate the definitions from you and Joe regarding the PUMA as the geographic unit, because that will help me narrow my research on topics similar to the sample (i.e., I will only look at urban centers at least close to that 100,000 number, if I want to find it in the ACS).

    Thanks again.

  • Given where this conversation has led, I'd recommend again that you try using IPUMS USA for this inquiry, especially because IPUMS has already determined which cities can be identified from PUMAs, within a 10% population match threshold. See the documentation for the IPUMS CITY variable, which includes this listing of identifiable cities in each ACS 1-year sample. On that page, if you select the "Case-count view" option, you'll be able to see the exact sample size for each city in each 1-year sample. (Note: The 5-year sample sizes are simply the sum of sample sizes for each individual 1-year sample in that range.)

    To obtain larger sample sizes, you might pool even more years... say, all 8 years in the 2012-2019 range, which would give a sample size about 8 times as large as any single year. That would obviously come with a significant loss in temporal precision, and it still may not give you samples large enough to make strong inferences about very small population groups in specific cities.
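
As a rough rule of thumb for how much pooling helps: if each year contributed an independent sample of the same size and the true share stayed constant over the period (both are simplifying assumptions, and the helper name below is hypothetical), the standard error would shrink with the square root of the number of pooled years.

```python
import math


def pooled_se_ratio(years):
    """Standard error of a pooled estimate relative to a single-year one.

    Assumes equal-sized, independent yearly samples and a share that
    does not change over time (simplifying assumptions).
    """
    if years < 1:
        raise ValueError("years must be at least 1")
    return 1 / math.sqrt(years)


for years in (1, 5, 8):
    print(f"{years} pooled year(s): SE is about "
          f"{pooled_se_ratio(years):.2f}x the single-year SE")
```

So eight times the sample buys a standard error only about a third the size of a single year's, not one eighth, which is part of why pooling may still fall short for very small groups.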

  • I just wanted to follow up with this in case anyone else is looking for more local data with complex dives - as the first post here by Todd explains, you really want to know what the table keywords are, and you can get a working knowledge of that by trying a few different searches and checking out the suggested searches on your left tab when you first start learning.

    After that, you can easily search for, for example, (insurance status by ethnicity [town][state]).  The third suggested result on the left for my test search reads "selected characteristics..." and is specific to my test location. You can click on the left menu, on "map," to significantly narrow your geographical results, even down to Census Tract (or many other options) in the maps tab under "geographies."

    Thanks again all.