ACS data 2019 and 2020

banasua 9 months ago

Hello,

I am using 2019 and 2020 ACS PUMS data to develop some dashboards and am writing a manuscript about them. I found a few records for individuals aged 16-17 and with below high school level education who have very high incomes, and they show up as the highest mean incomes (wage income and total income). These values are greater than $400,000 annually and cannot be correct. Should I remove these records from my analysis?

Thanks,

Anasua

Eric Grosso 9 months ago

I would need more information to answer you. What are the paper's topic and universe?

banasua 8 months ago

I am working on a project using 2019 and 2020 one-year data to develop dashboards classified by occupation code and split into two groups, healthcare workers and non-healthcare workers. Three dashboards are being developed: (i) the first uses data on selected demographic characteristics, (ii) the second uses data on mean wage income and mean total income, and (iii) the third uses data on income and demographic characteristics for health workers. As part of this project I am also writing a manuscript to explain these dashboards, which will include some supporting tables.

While working on the tables I came across a few of the highest mean incomes for 2019 and 2020 that are earned by people in the 16-17 age group, and a few by people who are older but have below high school level education. If we keep these records, they show up as the highest values, which cannot be correct. If we remove them, we also have to remove every other observation that meets whatever condition we use, for example, those with below high school education or aged 16-17 earning >$400,000. My question is how to address this issue. Some of these observations are in the table below, but there are more.

occupation code | year | health workers | wage income / total income | age groups | education | race | Hispanic ethnicity | marital status | citizenship | nativity | language spoken | sex
120 | 2020 | | 408,497 | 16-17 | Below high school | White alone | Non Hispanic | Married | Born in the United States | Native | Only English | Female
1310 | 2020 | | 412,521 | 35-54 | Below high school | White alone | Non Hispanic | Married | Born in the United States | Native | Only English | Male
8610 | 2019 | | 458,606 | 16-17 | Below high school | White alone | Non Hispanic | Never married | Born in the United States | Native | Only English | Female
3324 | 2020 | | 501,062 | 55-64 | Below high school | Some Other Race alone | Non Hispanic | Married | Born in the United States | Native | Only English | Male
3324 | 2020 | | 540,302 | 55-64 | Below high school | American Indian alone | Hispanic | Married | U.S. citizen by naturalization | Foreign born | Language other than English | Male
3324 | 2019 | | 557,600 / 555,681 | 18-34 | Below high school | White alone | Hispanic | Never married | Born abroad of U.S. citizen parent or parents | Native | Language other than English | Male
3324 | 2019 | | 557,600 / 555,681 | 18-34 | Below high school | White alone | Hispanic | Never married | U.S. citizen by naturalization | Foreign born | Language other than English | Male
 | 2020 | | 665,065 / 671,101 | 55-64 | Below high school | Two or More Races | Hispanic | Married | Not a U.S. citizen | Foreign born | Language other than English | Male
3256 | 2020 | | 665,065 / 673,516 | 35-54 | Below high school | Black or African American alone | Non Hispanic | Never married | Born in the United States | Native | Only English | Female

Eric Grosso 8 months ago in reply to banasua

May I ask the source of this PUMS data? These are row-level records that you've recoded, correct? You have not aggregated them, right?

banasua 8 months ago in reply to Eric Grosso

I downloaded the psam_pusa and psam_pusb files from the website. And yes, these are recoded records, not aggregate data.

Eric Grosso 8 months ago in reply to banasua

I'm still not 100% sure what your goal is here, but if you are using the PUMS as input data, I would leave it alone and not throw out these outliers, which is really what these are. I would not characterize them as "incorrect", as implausible as they seem.

On the other hand, if you feel you have to discard them, their weight would be so small in your aggregation that removing them probably wouldn't have a huge effect, since you're using national files.
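One way to check this before deciding is to compare the weighted mean with and without the flagged records. The following is only a rough sketch under stated assumptions: the three records are invented toy values (not real PUMS rows), the keys mirror the recoded columns in the table above, `weight` stands in for the person weight (PWGTP in the PUMS files), and `is_flagged`, `weighted_mean`, and `sensitivity` are hypothetical helper names.

```python
# Toy records, invented for illustration; these are NOT real PUMS rows.
# "weight" stands in for the person weight (PWGTP in the PUMS files).
TOY_RECORDS = [
    {"age_group": "16-17", "education": "Below high school",
     "wage_income": 500_000, "weight": 10},
    {"age_group": "35-54", "education": "Bachelor's degree",
     "wage_income": 50_000, "weight": 490},
    {"age_group": "18-34", "education": "High school",
     "wage_income": 40_000, "weight": 500},
]

THRESHOLD = 400_000  # cutoff mentioned in the question


def is_flagged(row, threshold=THRESHOLD):
    """True for high earners who are 16-17 or have below high school education."""
    unlikely_group = (
        row["age_group"] == "16-17" or row["education"] == "Below high school"
    )
    return unlikely_group and row["wage_income"] > threshold


def weighted_mean(rows, value_key="wage_income", weight_key="weight"):
    """Weighted mean of value_key, using weight_key as the weights."""
    total_weight = sum(r[weight_key] for r in rows)
    if total_weight == 0:
        raise ValueError("total weight is zero")
    return sum(r[value_key] * r[weight_key] for r in rows) / total_weight


def sensitivity(rows):
    """Compare the weighted mean with and without the flagged records."""
    kept = [r for r in rows if not is_flagged(r)]
    total_weight = sum(r["weight"] for r in rows)
    flagged_weight = total_weight - sum(r["weight"] for r in kept)
    return {
        "n_flagged": len(rows) - len(kept),
        "flagged_weight_share": flagged_weight / total_weight,
        "mean_all": weighted_mean(rows),
        "mean_without_flagged": weighted_mean(kept),
    }


if __name__ == "__main__":
    for name, value in sensitivity(TOY_RECORDS).items():
        print(f"{name}: {value:,.4f}")
```

If the flagged share of total weight is tiny, the two means will be close and the choice matters little. Running the same comparison within each dashboard cell (for example, one occupation code by age group) would show whether a few records dominate a small cell, which is where the effect is likely to be largest.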