Educational attainment by race/ethnicity and county - can't get totals and subtotals to match

Good afternoon,

I am trying to use the latest ACS 5-year data to document educational attainment by race and gender at the county level for the service region of our community college. I used to use FactFinder for this, but I have a question about the results I am getting from data.census.gov and would like some help. I've read through the related user guides for researchers, looked through the FAQs, and done a few searches of the forum. I've also read through the product list and have used certain tables a lot in the past. I'm sorry if this has been covered before; if so, please just point me to the threads.

I am able to select the counties of interest without a problem, and I have determined which tables contain educational attainment data. Since I need all of my counties of interest, I am using the 5-year data so that every county is represented in the sample. I know that using just the 2018 1-year data does not provide all of the counties I need.

When I pick table B15003, I am able to select data on attainment level and get totals for US and counties.

When I pick table B15002, I am able to select data on attainment level by gender and get totals for US and counties.

Fortunately, the US and county totals for these two tables for the ACS 5 Yr match.

The problem I have is that I want breakouts of attainment level by race, for which I am using the C15002 series. I know there are separate tables for each race/ethnicity. I am aware of the two-question format, with one question for Hispanic origin and one for race, and how these are combined. I am able to access county-level data for White (C15002A), Black (C15002B), American Indian (C15002C), Asian (C15002D), Native Hawaiian/Pacific Islander (C15002E), Other race (C15002F), Two or more races (C15002G), White non-Hispanic (C15002H), and Hispanic/Latino (C15002I).

My assumption is that I can only use one of these white tables, probably white non-Hispanic, since I want to also use Hispanic.

The problem is that the sum of these tables does not equal what I get by attainment level and gender with B15003 and B15002. 

I thought I would be able to do it this way, since I am always using the ACS 2018 5-year data, the same counties, and the same (collapsed in some cases) attainment levels. But even without looking at attainment, the grand totals are different. The totals for B15002 and B15003 are the same. But when I use Excel to total all of the race/ethnicity breakouts (excluding White alone but including White non-Hispanic), I get a higher number than in B15003 and B15002.

I am used to using US Dept of Ed NCES IPEDS data on attainment by race/ethnicity, which incorporates a single set of labels for race/ethnicity that includes non-Hispanic versions of each race code.  I do not see those provided for these census tables, except for white non-Hispanic. 

So I am not sure how to proceed. Any and all help with this is much appreciated. I thought I had it worked out in FactFinder, but I realize it used the same table structures, so I am not sure where the problem lies. I could have posted this to the user community, but thought my question must be something everyone else knows already.

Thank you very much for your assistance,

John

John Milam, Ph.D.
Professor and Director of Institutional Effectiveness
Coordinator, LFCC IRB
Lord Fairfax Community College
173 Skirmisher Lane
Middletown, VA 22645-1745
(540) 868-7249

jmilam@lfcc.edu
http://knowledgetowork.com

  • You might have to go to microdata (ipums.org) for that, meaning attainment for the non-Hispanic components of races other than white. But unless you need to match NCES, I think it makes more sense to include the Hispanic portions of all races (other than white, since non-Hispanic white is the definition of the non-minority population). Some people really are both Black and Hispanic, Asian and Hispanic, etc., and that doesn't make them less Black or Asian. Your numbers won't add up to 100%, but that's not a reason to take out part of a race of people. IMHO.

  • Sorry, the new features on this forum don't let me edit a post without a whole new notice being sent out by email.

    What I had meant to say is basically what Tim said, only he responded much better than I did:

    Could it be that you are not taking into account that Hispanic can be of ANY RACE and not just White?  You have white, non Hispanic, but you do not have values for the non-Hispanic version of the other races. 
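
    To make the double counting concrete, here is a small arithmetic sketch. All of the counts are made up for illustration (they are not ACS figures), and the race groups are collapsed to four for brevity. It shows that adding White non-Hispanic, every other race table, and the Hispanic table overshoots the true total by exactly the number of Hispanic people who are not White.

```python
# Hypothetical counts for one county, population 25 and over.
# Keys are (race, is_hispanic); none of these figures come from ACS tables.
people = {
    ("White", False): 700,
    ("White", True): 120,
    ("Black", False): 100,
    ("Black", True): 10,
    ("Asian", False): 40,
    ("Asian", True): 2,
    ("Other", False): 8,
    ("Other", True): 20,
}

# The overall total, as a table like B15002 or B15003 would report it.
true_total = sum(people.values())

# Each non-White race table counts everyone of that race, Hispanic or not.
race_alone = {
    race: people[(race, False)] + people[(race, True)]
    for race in ("Black", "Asian", "Other")
}
white_non_hispanic = people[("White", False)]
hispanic_any_race = sum(n for (_, hisp), n in people.items() if hisp)

# Summing the tables the way the question describes.
summed = white_non_hispanic + sum(race_alone.values()) + hispanic_any_race

# The overshoot equals the Hispanic members of the non-White race groups,
# because they appear once in their race table and again in the Hispanic table.
overcount = summed - true_total
hispanic_non_white = sum(
    n for (race, hisp), n in people.items() if hisp and race != "White"
)

print(true_total, summed, overcount, hispanic_non_white)
```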

  • Thank you both for taking time to help me with this.  As you note, there is a variable for Total White alone, not Hispanic or Latino (S1501_C01_031E), but this does not exist for the others (such as Total American Indian or Alaska Native alone).  

    I am using these data for hiring with percentages compared between the pool (IPEDS Completions for faculty or ACS5Yr for staff), so I need a total.  If I include Hispanic as a category, as done with IPEDS, then the total of the race/ethnicity categories is higher than the total estimate.  

    Since IPEDS does not present the data in the two-question format, I was hoping to use the same values in ACS.  While I can get White NH, I can't get it for the others without microdata.   I am reluctant to do that because I want community colleges across the system to be able to use this approach to help ensure better equity in hiring.  I also want to update the data annually to account for county-level changes, given shifting demographics.

    S1501 gives me totals with high school or higher and with bachelor's or higher with both genders and nine race/ethnicity categories (both white alone and white NH).  However, the totals across counties for race are higher than by gender.  

    Since I need race/ethnicity, should I exclude white only, using the other 8 values, and the total?   Unfortunately, the total for gender won't match that for race/ethnicity.   

    Also, S1501 only gives me attainment with high school and above or bachelor's and above.  Some of our searches require the associate's, for which I have to use C15002.  

    What do you recommend I use to get a single total that can serve as the base for percentages by gender and race/ethnicity, and that includes multiple levels of attainment, Hispanic/Latinx, White non-Hispanic, and other clean categories? Is there any way to use standard table structures and avoid microdata for this, given the need for annual updates and for use as an effective practice by other colleges in the system?

    Thanks very much,
    John 

  • One of the challenges with microdata will be that the PUMA boundaries may not align with your county boundaries -- and they will change following the 2020 Census (not clear on that timeline, exactly).

    One approach to ensure consistent totals would be to use the residual to identify an "all other races, non-Hispanic" category. Thus, for race/ethnicity you would have:

    * Total

    * White, Non-Hispanic

    * All other Races, Non-Hispanic (calculated by subtracting White, Non-Hispanic and Hispanic from your relevant totals)

    * Hispanic

    This may not give you the nuance that you want, particularly if you are serving both large Asian and Black populations that colleges may wish to identify in detail.
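
    The residual calculation above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the inputs are made-up counts rather than ACS estimates, and you would apply it once per county and per attainment level. Because ACS figures are estimates with margins of error, a residual for a small county can be noisy or even negative, so the sketch flags that case instead of hiding it.

```python
def residual_categories(total, white_non_hispanic, hispanic):
    """Split a total into three categories that sum back to it exactly.

    total              -- overall count (for example from B15002)
    white_non_hispanic -- count from the White non-Hispanic table
    hispanic           -- count from the Hispanic/Latino table
    """
    other = total - white_non_hispanic - hispanic
    if other < 0:
        # Can happen with small estimates; better to surface it than clip it.
        raise ValueError("residual is negative; check inputs or sampling error")
    return {
        "White, Non-Hispanic": white_non_hispanic,
        "All other Races, Non-Hispanic": other,
        "Hispanic": hispanic,
    }


def shares(categories):
    """Convert counts to proportions of their own sum."""
    total = sum(categories.values())
    return {name: count / total for name, count in categories.items()}


# Hypothetical counts for one county and one attainment level.
cats = residual_categories(total=1000, white_non_hispanic=700, hispanic=152)
pct = shares(cats)
print(cats)
print(pct)
```

    By construction the three categories always add up to the total, so percentages computed on them share one base with the gender breakdown.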

    Alternatively, you could also look at the race by ethnicity table to see the distribution of other racial groups within Hispanic ethnicity (B03002: Hispanic or Latino Origin by Race). If, for example, you see that there are very few Asian Hispanics and Black Hispanics, you could treat the Asian and Black educational attainment as de facto non-Hispanic classifications. Then use the residual approach described above to identify "all other, non-Hispanic" (subtracting all non-Hispanic race groups and Hispanic from the total). It's not perfect, but it might get you closer to what you want. (With clear documentation of the assumption, of course!)
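
    One way to run that check is sketched below. It assumes you have already pulled the non-Hispanic and Hispanic counts for each race group out of B03002 into two dictionaries; the counts shown are invented, the function names are hypothetical, and the 5% cutoff is an arbitrary assumption that you would want to choose and document yourself.

```python
def hispanic_share_by_race(non_hispanic, hispanic):
    """Share of each race group that is Hispanic, from B03002-style counts."""
    result = {}
    for race, nh_count in non_hispanic.items():
        h_count = hispanic.get(race, 0)
        group_total = nh_count + h_count
        result[race] = h_count / group_total if group_total else 0.0
    return result


def de_facto_non_hispanic(share_by_race, threshold=0.05):
    """Race groups whose Hispanic share is at or below the chosen cutoff."""
    return sorted(
        race for race, share in share_by_race.items() if share <= threshold
    )


# Hypothetical counts for one county (not real ACS estimates).
non_hispanic = {"Black": 196, "Asian": 49, "Other": 8}
hispanic = {"Black": 4, "Asian": 1, "Other": 20}

share_by_race = hispanic_share_by_race(non_hispanic, hispanic)
usable = de_facto_non_hispanic(share_by_race)
print(share_by_race)
print(usable)
```

    In this made-up county, Black and Asian fall under the cutoff and could be reported from their race tables as is, while "Other" would be folded into the residual category.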
