Merging the Population and Household PUMS data

Akhil Joseph over 3 years ago

Hi Group,

I'm working with the 5 year ACS PUMS data for the state of Indiana to calculate different variables which requires merging the population and household data. I'm working on these using python and SQL and was trying to merge the datasets using a left join (population data, household data) thus keeps all the records from population data and its matching ones from the household data using the serial no. Later If I calculate the variable the number I'm getting for households for different conditions are not matching with the numbers that I obtain from MDAT. Have anyone done similar joins using SQL or Python?

Thanks!

Parents

Ani over 3 years ago

Can you post the code that precedes the joins and then the code for the joins itself? It would also help if the PUMS variables are explicitly listed. Thank you
Cancel
Up 0 Down

Reply

Cancel

Reply

Ani over 3 years ago

Can you post the code that precedes the joins and then the code for the joins itself? It would also help if the PUMS variables are explicitly listed. Thank you
Cancel
Up 0 Down

Reply

Cancel

Children

Akhil Joseph over 3 years ago in reply to Ani

Reading the household and population data in python

pdata = pd.read_csv('psam_p18.csv')
hdata = pd.read_csv('psam_h18.csv')

Changing the serial number to string to avoid datatype mismatch when merging using serial no.

pdata = pdata.astype({'SERIALNO':'string'})
hdata = hdata.astype({'SERIALNO':'string'})

Merging the datasets using left join to keep all the records from population data and matching records from housing data

in_data = pd.merge(pdata, hdata, how = 'left', on = 'SERIALNO')

Now, calculating the number Total Households Age 55+ Living Alone

a = in_data[(in_data.AGEP >= 55) & (in_data.HHT.isin([4, 6]))]

a.WGTP.sum()

The number I get is "184320" where as the number given by MDAT is "443428"
Cancel
Up 0 Down

Reply

Cancel