Hello, I am using PUMS data to calculate electricity cost as a percentage of annual household income in an area. I noticed that many households report negative household income. After aggregating the data in R, I found that 6.9% of households across the whole area report negative income, but one specific region has 32% of households reporting negative annual income. What is the reason for these negative values, and how should I preprocess them? Should I remove all of them?
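A minimal sketch for reproducing the per-region share of negative incomes. It assumes each household record is a dict with `HINCP` and a hypothetical `region` field (for example, a PUMA code). It counts records unweighted. PUMS estimates normally apply the household weight `WGTP`, so a weighted version would add `WGTP` instead of 1 to each count.

```python
from collections import defaultdict


def negative_income_share(households):
    """Return {region: fraction of households with HINCP < 0}, unweighted.

    Records without HINCP (e.g. vacant units or group quarters) are skipped.
    The "region" key is a hypothetical stand-in for your geography column.
    """
    totals = defaultdict(int)
    negatives = defaultdict(int)
    for hh in households:
        income = hh.get("HINCP")
        if income is None:
            continue  # no household income reported for this record
        totals[hh["region"]] += 1
        if income < 0:
            negatives[hh["region"]] += 1
    return {r: negatives[r] / totals[r] for r in totals}


sample = [
    {"region": "A", "HINCP": -5000},
    {"region": "A", "HINCP": 40000},
    {"region": "B", "HINCP": 60000},
    {"region": "B", "HINCP": None},
]
print(negative_income_share(sample))  # {'A': 0.5, 'B': 0.0}
```

Computing the share per region this way makes it easy to check whether the 32% region is driven by a few records or by a broad pattern before deciding whether to drop them.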
In addition, I am calculating the electricity burden for the area as annual electricity cost divided by annual household income. In the raw data I downloaded from PUMS using R, a large number of households have monthly electricity costs (ELEP) and household incomes (HINCP) like this:
These values don't make sense to me, and including them in the electricity burden calculation heavily skews my final results. Does anyone know what causes these values and how I should handle them?
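A minimal sketch of the burden calculation with basic screening, assuming ELEP is a monthly dollar amount (so the annual cost is 12 × ELEP). It also assumes that, as in the ACS PUMS data dictionary for many survey years, ELEP values 1 and 2 are codes ("included in rent or condo fee" and "no charge or electricity not used") rather than bills. Verify both assumptions against the data dictionary for your survey year. Records with zero or negative income are excluded here. Whether to drop them, cap extreme ratios, or report a median instead of a mean is an analysis choice, not something this sketch settles.

```python
# Assumption: ELEP values 1 and 2 are special codes, not dollar amounts.
# Check the PUMS data dictionary for the survey year you downloaded.
ELEP_SPECIAL_CODES = {1, 2}


def electricity_burden(elep, hincp):
    """Annual electricity cost as a fraction of annual household income.

    elep:  monthly electricity cost (PUMS ELEP)
    hincp: annual household income (PUMS HINCP)
    Returns None when the record cannot give a meaningful burden.
    """
    if elep is None or hincp is None:
        return None  # missing cost or income
    if elep in ELEP_SPECIAL_CODES:
        return None  # a code, not an actual monthly bill
    if hincp <= 0:
        return None  # zero or negative income makes the ratio meaningless
    return 12 * elep / hincp


print(electricity_burden(100, 60000))  # 0.02
print(electricity_burden(1, 500))      # None (ELEP is a code)
print(electricity_burden(150, -2000))  # None (negative income)
```

Returning `None` instead of silently dropping rows keeps the excluded records visible, so you can count how many households each rule removes in every region.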
I really appreciate your help!
Thanks,
Siwei