ACS data 2019 and 2020

Hello,

I am using ACS PUMS data for 2019 and 2020 to develop some dashboards and am writing a manuscript about them. I found a few records for individuals aged 16-17 and individuals with below-high-school education who report very high incomes, and these records show up as the highest mean incomes (both wage income and total income). Their incomes are greater than $400,000 a year, which cannot be correct. Should I remove these records before doing my analysis?

Thanks,

Anasua 

  • I am working on a project that uses the 2019 and 2020 ACS one-year PUMS data to build dashboards classified by occupation code, with occupations grouped into healthcare workers and non-healthcare workers. Three dashboards are being developed: (i) one on selected demographic characteristics, (ii) one on mean wage income and mean total income, and (iii) one on income and demographic characteristics of healthcare workers. As part of this project I am also writing a manuscript that explains the dashboards and includes some supporting tables.

    While building the tables, I found that several of the highest mean incomes for 2019 and 2020 belong to people aged 16-17, or to older people with below-high-school education. If we keep these records, they show up as the top incomes, which cannot be correct. If we remove them, we have to choose a removal rule (for example, dropping anyone aged 16-17 or with below-high-school education who earns more than $400,000), and applying that rule means removing even more observations. How should I handle this? Some of these observations are shown in the table below; there are more.

    occupation code | year | health worker | wage income ($) | total income ($) | age group | education | race | Hispanic ethnicity | marital status | citizenship | nativity | language spoken | sex
    120 | 2020 | 0 | 408,497 | 408,497 | 16-17 | Below high school | White alone | Non Hispanic | Married | Born in the United States | Native | Only English | Female
    1310 | 2020 | 0 | 412,521 | 412,521 | 35-54 | Below high school | White alone | Non Hispanic | Married | Born in the United States | Native | Only English | Male
    8610 | 2019 | 0 | 458,606 | 458,606 | 16-17 | Below high school | White alone | Non Hispanic | Never married | Born in the United States | Native | Only English | Female
    3324 | 2020 | 1 | 501,062 | 501,062 | 55-64 | Below high school | Some Other Race alone | Non Hispanic | Married | Born in the United States | Native | Only English | Male
    3324 | 2020 | 1 | 540,302 | 540,302 | 55-64 | Below high school | American Indian alone | Hispanic | Married | U.S. citizen by naturalization | Foreign born | Language other than English | Male
    3324 | 2019 | 1 | 557,600 | 555,681 | 18-34 | Below high school | White alone | Hispanic | Never married | Born abroad of U.S. citizen parent or parents | Native | Language other than English | Male
    3324 | 2019 | 1 | 557,600 | 555,681 | 18-34 | Below high school | White alone | Hispanic | Never married | U.S. citizen by naturalization | Foreign born | Language other than English | Male
    40 | 2020 | 0 | 665,065 | 671,101 | 55-64 | Below high school | Two or More Races | Hispanic | Married | Not a U.S. citizen | Foreign born | Language other than English | Male
    3256 | 2020 | 1 | 665,065 | 673,516 | 35-54 | Below high school | Black or African American alone | Non Hispanic | Never married | Born in the United States | Native | Only English | Female
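One way to act on the rule described in the question without deleting anything is to flag the suspect records, so the tables can be produced both with and without them. The sketch below is a minimal Python illustration, assuming each record has already been read into a simple object whose fields follow the ACS PUMS person variables AGEP (age), SCHL (educational attainment), WAGP (wage income) and PINCP (total income). The `Person` class, its `pid` field and `flag_suspect` are hypothetical names, and the assumption that SCHL codes 1-15 mean "no regular high school diploma or GED" should be checked against the PUMS data dictionary for each year used.

```python
from dataclasses import dataclass

# Hypothetical record type; field names follow ACS PUMS person variables.
@dataclass
class Person:
    pid: str      # hypothetical record identifier
    agep: int     # AGEP: age in years
    schl: int     # SCHL: educational attainment code (assumed 1-24 coding)
    wagp: float   # WAGP: wage or salary income
    pincp: float  # PINCP: total person's income

BELOW_HS_MAX_SCHL = 15       # assumed: SCHL 1-15 = no regular diploma or GED
INCOME_THRESHOLD = 400_000   # cutoff taken from the question

def flag_suspect(p: Person, threshold: float = INCOME_THRESHOLD) -> bool:
    """Return True if a record matches the implausibility rule; nothing is dropped."""
    aged_16_17 = 16 <= p.agep <= 17
    below_high_school = p.schl <= BELOW_HS_MAX_SCHL
    high_income = max(p.wagp, p.pincp) > threshold
    return high_income and (aged_16_17 or below_high_school)

# Made-up records for illustration only.
people = [
    Person("A", 17, 14, 450_000, 450_000),  # aged 16-17, very high income
    Person("B", 40, 21, 450_000, 460_000),  # bachelor's degree, high income
    Person("C", 17, 14, 8_000, 8_000),      # aged 16-17, ordinary income
]
print([p.pid for p in people if flag_suspect(p)])  # ['A']
```

Keeping the flag as a column rather than deleting rows means the rule can be stated explicitly in the manuscript and every table can be produced both ways.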
  • May I ask the source of this PUMS file? These are row-level records that you've recoded, correct? You have not aggregated them, right?

  • I downloaded the psam_pusa and psam_pusb files from the website. And yes, these are recoded row-level records, not aggregated data.

  • I'm still not 100% sure what your goal is here, but if you are using the PUMS as input data, I would leave these records alone and not throw out the outliers, which is really what they are. I would not characterize them as "incorrect", however implausible they seem.

    On the other hand, if you feel you have to discard them, their weight would be so small in your aggregation that they probably wouldn't have a large effect, since you're using national files.
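Whether dropping a handful of records matters depends on how large the group being averaged is: in a national total the effect is small, but in a narrow occupation-by-demographic cell a single record can move the mean considerably. A simple check is to compute the person-weighted mean both ways. The sketch below assumes values and weights are already paired per record (in PUMS terms, an income variable such as WAGP and the person weight PWGTP); `weighted_mean` and `with_and_without` are hypothetical helper names, and the numbers are made up for illustration.

```python
from typing import Sequence, Tuple

def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Person-weighted mean: sum(value * weight) / sum(weight)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight

def with_and_without(
    values: Sequence[float], weights: Sequence[float], flags: Sequence[bool]
) -> Tuple[float, float]:
    """Return (mean over all records, mean over records that are not flagged)."""
    if not (len(values) == len(weights) == len(flags)):
        raise ValueError("values, weights and flags must have the same length")
    kept = [(v, w) for v, w, f in zip(values, weights, flags) if not f]
    kept_values = [v for v, _ in kept]
    kept_weights = [w for _, w in kept]
    return weighted_mean(values, weights), weighted_mean(kept_values, kept_weights)

# Made-up example: two ordinary records and one flagged high-income record.
incomes = [50_000, 60_000, 450_000]
weights = [100, 100, 10]
flags = [False, False, True]
all_records, unflagged = with_and_without(incomes, weights, flags)
print(f"{all_records:,.0f} vs {unflagged:,.0f}")  # 73,810 vs 55,000
```

Running the same comparison for each occupation and demographic cell shows where the flagged records actually change what a dashboard displays.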