Educational attainment by race/ethnicity and county - can't get totals and subtotals to match

Good afternoon,

I am trying to use the latest ACS 5-year data to document race and gender by educational attainment at the county level for the service region of our community college. I used to use FactFinder for this, but I have a question about the results I am getting now and would like some help. I've read through the related user guides for researchers, looked through the FAQs, and done a few searches of the forum. I've read through the product list and have used certain tables a lot in the past. I'm sorry if this has been covered before; if so, please just point me to the threads.

I am able to select the counties of interest without a problem. I have determined which tables contain educational attainment data. Since I need all of my counties of interest, I am using the 5-year data so that all counties are represented in the sample. I know that using just the 2018 1-year data does not provide me with all of the counties I need.

When I pick table B15003, I am able to select data on attainment level and get totals for US and counties.

When I pick table B15002, I am able to select data on attainment level by gender and get totals for US and counties.

Fortunately, the US and county totals for these two tables for the ACS 5 Yr match.

The problem I have is that I want breakouts of attainment level by race, for which I am using the race-iterated versions of B15002 (the C15002 series). I know there are separate tables for each race/ethnicity. I am aware of the two-question format (first Hispanic origin, then race) and how the two are combined. I am able to access county-level data for White (C15002A), Black (C15002B), American Indian (C15002C), Asian (C15002D), Native Hawaiian PI (C15002E), Other race (C15002F), 2 or more (C15002G), White non-Hispanic (C15002H), and Hispanic/Latino (C15002I).

My assumption is that I can only use one of these white tables, probably white non-Hispanic, since I want to also use Hispanic.

The problem is that the sum of these tables does not equal what I get by attainment level and gender with B15003 and B15002. 

I thought I would be able to do it this way, since I am always using the ACS 2018 5-year data, the same counties, and the same (in some cases collapsed) attainment levels. But even without looking at attainment, the grand totals are different. The totals for B15002 and B15003 are the same. But when I use Excel to total all of the race/ethnicity breakouts (excluding White alone but including White non-Hispanic), I get a higher number than in B15003 and B15002.
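A minimal sketch of why that sum overshoots, using made-up counts (not real ACS estimates). It assumes, as the thread later confirms, that the seven race-alone iterations (A through G, including Two or more races) together cover the whole 25+ total, while H (White alone, not Hispanic) is a subset of A and I (Hispanic, any race) cuts across all seven. Dropping A and adding H and I therefore counts every Hispanic who is not White alone twice. The function name `naive_sum_excess` is hypothetical.

```python
"""Why summing C15002B-G + H + I overshoots the B15002/B15003 total.

All counts below are hypothetical, for illustration only.
"""

# Race-alone iterations keyed by table suffix; together they cover everyone 25+.
RACE_ALONE = {
    "A": 8000,  # White alone
    "B": 900,   # Black alone
    "C": 50,    # American Indian and Alaska Native alone
    "D": 300,   # Asian alone
    "E": 10,    # Native Hawaiian and Other Pacific Islander alone
    "F": 400,   # Some other race alone
    "G": 340,   # Two or more races
}
WHITE_NOT_HISPANIC = 7700  # H: a subset of A
HISPANIC_ANY_RACE = 900    # I: overlaps A through G


def naive_sum_excess(race_alone, white_nh, hispanic):
    """Return (total, naive_sum, excess) for the 'all but A, plus H and I' sum."""
    total = sum(race_alone.values())
    naive_sum = total - race_alone["A"] + white_nh + hispanic
    excess = naive_sum - total
    return total, naive_sum, excess


if __name__ == "__main__":
    total, naive, excess = naive_sum_excess(RACE_ALONE, WHITE_NOT_HISPANIC, HISPANIC_ANY_RACE)
    white_hispanic = RACE_ALONE["A"] - WHITE_NOT_HISPANIC
    print(f"total={total} naive_sum={naive} excess={excess}")  # total=10000 naive_sum=10600 excess=600
    # The excess is exactly the Hispanics who are not White alone.
    print(HISPANIC_ANY_RACE - white_hispanic)  # 600
```

In other words, the gap between the race/ethnicity sum and the B15002/B15003 total should equal the Hispanic count minus the White Hispanic count (White alone minus White non-Hispanic).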

I am used to using US Dept of Ed NCES IPEDS data on attainment by race/ethnicity, which incorporates a single set of labels for race/ethnicity that includes non-Hispanic versions of each race code.  I do not see those provided for these census tables, except for white non-Hispanic. 

So I am not sure how to proceed. Any and all help with this is much appreciated. I thought I had this worked out in FactFinder, but I realize it used the same table structures, so I am not sure where the problem lies. I could have posted this to the user community, but I thought my question must be something everyone else already knows.

Thank you very much for your assistance,


John Milam, Ph.D.
Professor and Director of Institutional Effectiveness
Coordinator, LFCC IRB
Lord Fairfax Community College
173 Skirmisher Lane
Middletown, VA 22645-1745
(540) 868-7249

  • You might have to go to microdata for that (attainment for the non-Hispanic components of races other than White), but unless you need to match NCES, I think it makes more sense to include the Hispanic portions of all races other than White, since non-Hispanic White is the definition of the non-minority population. Some people really are both Black and Hispanic, Asian and Hispanic, etc.; that doesn't make them less Black or Asian. Your numbers won't add up to 100%, but that's not a reason to take out part of a race of people. IMHO.

  • Sorry, the new features on this forum don't let me just edit without a whole new notice being sent out by email.

    What I meant to say is basically what Tim said, only he said it much better than I did:

    Could it be that you are not taking into account that Hispanic can be of ANY RACE and not just White? You have White, non-Hispanic, but you do not have values for the non-Hispanic versions of the other races.

  • Thank you both for taking time to help me with this.  As you note, there is a variable for Total White alone, not Hispanic or Latino (S1501_C01_031E), but this does not exist for the others (such as Total American Indian or Alaska Native alone).  

    I am using these data in hiring, comparing percentages against the relevant pool (IPEDS Completions for faculty or ACS 5-year for staff), so I need a total. If I include Hispanic as a category, as is done with IPEDS, then the total of the race/ethnicity categories is higher than the total estimate.

    Since IPEDS does not present the data in the two-question format, I was hoping to use the same values in ACS. While I can get White NH, I can't get the equivalent for the others without microdata. I am reluctant to do that because I want community colleges across the system to be able to use this approach to help ensure better equity in hiring. I also want to update the data annually to account for county-level changes, given shifting demographics.

    S1501 gives me totals for high school or higher and for bachelor's or higher, for both genders and nine race/ethnicity categories (both White alone and White NH). However, the race totals summed across counties are higher than the totals by gender.

    Since I need race/ethnicity, should I exclude White alone and use the other 8 values and the total? Unfortunately, the total for gender won't match that for race/ethnicity.

    Also, S1501 only gives me attainment with high school and above or bachelor's and above.  Some of our searches require the associate's, for which I have to use C15002.  

    What do you recommend I use to get a single total that can be used for percentages by gender and race/ethnicity, covering multiple levels of attainment, Hispanic/Latinx, White non-Hispanic, and other clean categories? Is there any way to use standard table structures and avoid microdata for this, given the need for annual updates and for use as an effective practice by other colleges in the system?

    Thanks very much,

  • One of the challenges with microdata will be that the PUMA boundaries may not align with your county boundaries -- and they will change following the 2020 Census (not clear on that timeline, exactly).

    One approach to ensure consistent totals would be to use the residual to identify an "all other races, non-Hispanic" category. Thus, for race/ethnicity you would have:

    * Total

    * White, Non-Hispanic

    * All other races, Non-Hispanic (calculated by subtracting White, Non-Hispanic and Hispanic from your relevant totals)

    * Hispanic

    This may not give you the nuance that you want, particularly if you are serving both large Asian and Black populations that colleges may wish to identify in detail.

    Alternatively, you could also look at the race by ethnicity table to see the distribution of other racial groups within Hispanic ethnicity (B03002: Hispanic or Latino Origin by Race). If, for example, you see that there are very few Asian Hispanics and Black Hispanics, you could treat the Asian and Black educational attainment as de facto non-Hispanic classifications. Then use the residual approach described above to identify "all other, non-Hispanic" (subtracting all non-Hispanic race groups and Hispanic from the total). It's not perfect, but it might get you closer to what you want. (With clear documentation of the assumption, of course!)
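A sketch of the two steps described in this reply: screen race groups by their Hispanic share in B03002, then compute the residual. The 5% cutoff, the group labels, the function names, and all counts are hypothetical. The cutoff in particular is a judgment call that should be documented. With no extra groups selected, the function reduces to the three-category scheme (White NH, All other NH, Hispanic).

```python
"""Residual 'all other, non-Hispanic' category, optionally after treating
low-Hispanic-share race groups as de facto non-Hispanic.

All counts and the 5% cutoff are hypothetical.
"""

# B03002-style counts for one county: race -> (not Hispanic, Hispanic).
B03002_COUNTS = {
    "Black alone": (1180, 20),
    "Asian alone": (396, 4),
    "Some other race alone": (50, 450),
    "Two or more races": (400, 100),
}


def mostly_non_hispanic(b03002, cutoff=0.05):
    """Return race groups whose Hispanic share is below `cutoff`."""
    groups = []
    for race, (not_hisp, hisp) in b03002.items():
        base = not_hisp + hisp
        if base and hisp / base < cutoff:
            groups.append(race)
    return groups


def residual_categories(total, white_nh, hispanic, treated_as_nh=None):
    """Build categories that sum to `total`.

    `treated_as_nh` maps extra race groups (counts taken from the
    attainment tables) that are ASSUMED to be essentially non-Hispanic.
    """
    treated_as_nh = treated_as_nh or {}
    cats = {"White, Non-Hispanic": white_nh, "Hispanic": hispanic}
    cats.update({f"{race}, Non-Hispanic (assumed)": n for race, n in treated_as_nh.items()})
    other = total - sum(cats.values())
    if other < 0:
        raise ValueError("Residual is negative; check that inputs share a universe.")
    cats["All other races, Non-Hispanic"] = other
    return cats


if __name__ == "__main__":
    print(mostly_non_hispanic(B03002_COUNTS))  # ['Black alone', 'Asian alone']
    basic = residual_categories(10000, 7700, 900)
    print(basic["All other races, Non-Hispanic"])  # 1400
    detailed = residual_categories(10000, 7700, 900, {"Black alone": 900, "Asian alone": 300})
    print(detailed["All other races, Non-Hispanic"])  # 200
```

The small number of Hispanic people inside the groups treated as non-Hispanic are still counted in Hispanic as well. That slight double count is exactly the assumption the reply asks you to document.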

  • So the issue for you is how to match answers from the ACS style to another style that basically treats Hispanic as a race. You might have to make some guesstimate based on other readings for the county, like the race-by-Hispanic breakdowns, but it would only be a guess that your universe follows the same pattern as the whole county. It might be the only guess you have, though, and you might find the differences are trivial when you compute them. Then you could show that it's not so important in this case.

    Maybe you could look at microdata: it won't help you for one county, but you could try to reproduce your IPEDS data in ACS at a national level and see how close they are.
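One way to make that guesstimate concrete is to split a race-alone count for the 25+ universe into non-Hispanic and Hispanic parts using that race's all-ages split from B03002. The key assumption, flagged in the reply, is that the 25+ population follows the county-wide pattern. The function name and the counts are hypothetical.

```python
"""Apportion a 25+ race-alone count using an all-ages Hispanic split.

ASSUMES the 25+ universe has the same Hispanic share as the whole
county population for that race. All counts are hypothetical.
"""


def apportion_by_share(count_25plus, not_hisp_all_ages, hisp_all_ages):
    """Return (estimated non-Hispanic, estimated Hispanic) for the 25+ count."""
    base = not_hisp_all_ages + hisp_all_ages
    if base <= 0:
        raise ValueError("B03002 base must be positive.")
    est_not_hisp = round(count_25plus * not_hisp_all_ages / base)
    return est_not_hisp, count_25plus - est_not_hisp


if __name__ == "__main__":
    # Black alone, 25+ = 900; B03002 all ages: 1180 not Hispanic, 20 Hispanic.
    nh, h = apportion_by_share(900, 1180, 20)
    print(nh, h)  # 885 15
```

When the Hispanic part is a small fraction of the group, as in this example, the adjustment is small, which is the "differences are trivial" case mentioned above.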

  • As you know, most data are collected with the two-question format, but higher ed reporting is often standardized with a single set of values that includes a two or more races option. I saw this recently with a state system implementation of PeopleAdmin, where data from the two questions are provided in a report, but with no guidance on combining them into a single set of reporting labels. So it isn't about making ACS look like IPEDS; it's that most people don't want data reported out with the two questions.

    What you all suggest, though, since I do need this to be doable annually and by multiple institutional researchers at different colleges, is to present the data in the two-question format. Otherwise, I have to live with reweighting estimates, which I would much prefer not to do and which others may have difficulty doing.

    I thought I was somehow missing a table that would give me non-Hispanic versions of the other race categories, as is done for White.

    If I have to present data using the two-question format, which of the race values do I pick then to sum with Hispanic to a total that will hopefully match the total estimates?

    Again, thank you all for your kindness in taking time with my question.


  • I don't think it's an issue of arbitrarily picking a race group to sum -- rather, it's looking at the race/ethnicity distribution (Table B03002 as I mentioned above) and identifying whether you can assume that certain race groups are predominantly "non-Hispanic."

    Then, you have to take the groups that you have identified: White, Non-Hispanic; Hispanic; any other specific group and subtract these from the total. This residual becomes your "all other, non-Hispanic." Whether you end up with White (NH), Hispanic, and All Other (NH), or some different combination depends upon your interpretation of how much you can assume the other race groups can be reasonably classified as non-Hispanic.
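The same subtraction has to be repeated for every cell you report (each attainment level, and each sex if you split by it), so a small helper that works cell by cell keeps the residual consistent with the totals. The attainment level labels, the function name, and the counts below are hypothetical.

```python
"""Cell-by-cell residual across attainment levels. Hypothetical counts."""

LEVELS = ["Less than HS", "HS graduate", "Some college or associate's", "Bachelor's or higher"]

TOTAL = dict(zip(LEVELS, [1000, 3000, 3500, 2500]))
IDENTIFIED = {  # groups you have decided to report separately
    "White, Non-Hispanic": dict(zip(LEVELS, [650, 2300, 2700, 2050])),
    "Hispanic": dict(zip(LEVELS, [250, 250, 250, 150])),
}


def residual_by_level(total, identified):
    """Return {level: {category: count}} with an 'All other, Non-Hispanic' residual."""
    table = {}
    for level, level_total in total.items():
        row = {name: counts[level] for name, counts in identified.items()}
        other = level_total - sum(row.values())
        if other < 0:
            raise ValueError(f"Negative residual at {level!r}.")
        row["All other, Non-Hispanic"] = other
        table[level] = row
    return table


if __name__ == "__main__":
    # Prints residuals of 100, 450, 550, and 300 for the four levels.
    for level, row in residual_by_level(TOTAL, IDENTIFIED).items():
        print(level, row["All other, Non-Hispanic"])
```

Any further group judged to be predominantly non-Hispanic is just another entry in `IDENTIFIED`, and each row still sums to its level total.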

  • As I work on this, I've encountered the problem where the educational attainment data I use are age 25 plus, where the race/ethnicity data are for the entire population. 

    If I could get B03002 for age 25 plus, I would feel more confident.

    This is leading me to think that I need to report the data using the two-question format. For this, I have the column Total Hispanic or Latino Origin. I don't know, though, which of the other race categories to include in creating a new race total. Do I use White alone, Black alone, etc.? I see Some other race alone. But what about Two or more races? Do I exclude that because it includes Hispanic/Latino?

    Any help on weighting with this age constraint and in picking the right race alone categories so I can sum them without double counting for a new total by county is much appreciated.


    • Try using white alone, non-Hispanic, then total Hispanic and all other races using "alone or in combination" (which should cover anyone who might consider herself/himself part of that race or as Hispanic) 
  • Thanks for taking time with this.  Using these educational attainment estimates by gender, race, and ethnicity at the county level for persons age 25+, I am given only certain race labels to choose from, as you know.  The "alone or in combination" text is not there.  This is what I am left with to replicate the two-question format. 

    White non-Hispanic
    Total American Indian or Alaska Native alone
    Total Asian alone
    Total Black alone
    Total Native Hawaiian and Other Pacific Islander alone
    Total Some other race alone
    Total Two or more races

    As expected, the sum of these is less than the total population estimate.  So I added the difference to the Total Some other race alone category.   This deflates other categories, but that is the problem of not having each category with a non-Hispanic label, as is done for white.  

    Does this make sense then?

    Also, while the beta site for microdata allows me to do a cross-tab of educational attainment and race, the geographic regions are limited, as you know, but they do include planning commission district sections covering all but one of the counties of interest. However, the Hispanic population there is almost half of what I get with other tables, so I am suspicious of it. I do not want to under-represent this group.

    Any ideas about whether the first solution is adequate, and why the PUMS results appear, on the surface, not to be sufficient?
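Here is a sketch of the adjustment described in this reply. It sums White non-Hispanic with the non-White race-alone groups and Two or more races, then adds the shortfall to Some other race alone. The race-alone groups, including White alone, cover the whole total, so the shortfall equals White alone minus White non-Hispanic, which is the White Hispanic population. In effect, this step reclassifies White Hispanics as Some other race. The function name and the counts are hypothetical.

```python
"""Fold the shortfall against the total into one category. Hypothetical counts."""

WHITE_ALONE = 8000  # not in the list below; used only to check the identity
CATEGORIES = {
    "White non-Hispanic": 7700,
    "American Indian or Alaska Native alone": 50,
    "Asian alone": 300,
    "Black alone": 900,
    "Native Hawaiian and Other Pacific Islander alone": 10,
    "Some other race alone": 400,
    "Two or more races": 340,
}
TOTAL = 10000


def fold_remainder(total, categories, into="Some other race alone"):
    """Return (adjusted categories, remainder) so that the categories sum to total."""
    remainder = total - sum(categories.values())
    adjusted = dict(categories)  # copy; leave the input unchanged
    adjusted[into] += remainder
    return adjusted, remainder


if __name__ == "__main__":
    adjusted, remainder = fold_remainder(TOTAL, CATEGORIES)
    print(remainder)  # 300, equal to WHITE_ALONE - 7700 (White Hispanic)
    print(adjusted["Some other race alone"])  # 700
```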


  • Using the PUMS data, which allow me to get data for planning districts on attainment by race, Spanish/Hispanic/Latino origin, and gender for all ages, I see what I was getting wrong. In the regular tables for educational attainment age 25+, I can now match the race totals by using White alone, Asian alone, etc. My mistake was in trying to keep using White non-Hispanic. When I exclude it and use the remaining race values, the totals now match.

    This means, though, that to have any Hispanic/Latino information I must report the data in the two-question format. My attempt to present a single set of reporting values that combines race and Hispanic/Latino is not possible with the regular census tables. PUMS lets me cut Hispanic/Latino by race or vice versa.

    Unfortunately, many of our human resource reporting systems in higher education have been combining the two variables since the early 2000s.  I was hoping to emulate the IPEDS Completions data, but there are more detailed rules on this coding which I will have to go back to and understand further.  

    Thanks JamiRae, Tim, and Rebecca for your patience and help.  Thanks, Tim, for the encouragement to try using PUMS.  The beta software is very intuitive and helpful in finding variables, creating groups, and creating the crosstabs of interest within available geographies.   I will compare the results of the two sources of data for percentages and see how similar they are.  Hopefully, close since both are ACS 5 year and mostly the same geography.
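Below is a sketch of the two-question presentation the thread settles on, together with the comparison planned above. Race percentages come from the race-alone tables (A through G). Ethnicity percentages come from the Hispanic table (I). Both use the same total. Differences from a second source, such as PUMS, are shown in percentage points. The category labels, the function names, and all counts are hypothetical. As noted earlier in the thread, the PUMS geography (planning districts) does not exactly match the counties.

```python
"""Two-question percentages and a percentage-point comparison. Hypothetical counts."""


def two_question_percentages(total, race_alone, hispanic):
    """Return (race %, ethnicity %), each distribution summing to 100."""
    if sum(race_alone.values()) != total:
        raise ValueError("Race-alone groups should sum to the total.")
    race_pct = {race: 100 * n / total for race, n in race_alone.items()}
    eth_pct = {
        "Hispanic or Latino": 100 * hispanic / total,
        "Not Hispanic or Latino": 100 * (total - hispanic) / total,
    }
    return race_pct, eth_pct


def point_differences(pct_a, pct_b):
    """Percentage-point differences (a minus b) for matching categories."""
    if pct_a.keys() != pct_b.keys():
        raise ValueError("Both sources need the same categories.")
    return {k: round(pct_a[k] - pct_b[k], 1) for k in pct_a}


if __name__ == "__main__":
    tables_race, tables_eth = two_question_percentages(
        10000, {"White alone": 8000, "Black alone": 900, "Asian alone": 300, "All other": 800}, 900)
    pums_race, pums_eth = two_question_percentages(
        2000, {"White alone": 1640, "Black alone": 200, "Asian alone": 60, "All other": 100}, 160)
    print(point_differences(tables_race, pums_race))
    print(point_differences(tables_eth, pums_eth))
```

Keeping race and ethnicity as two separate distributions means each one sums to 100% against the same total, which is the property the standard tables can support without microdata.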