How to deal with negative household income in PUMS data

Hello, I am using PUMS data to calculate electricity cost as a percentage of annual household income in an area. I noticed that many households have negative household income. After aggregating my data in R, I found that on average 6.9% of households across the whole area report negative household income, but in one specific region 32% of households report negative annual income. What is the reason for these negative numbers, and how should I preprocess this data? Should I remove all of them?

In addition, I am calculating the electricity burden for the area as annual electricity cost divided by annual household income. In the raw data that I downloaded from PUMS using R, a great number of households have monthly electricity costs (ELEP) and household incomes (HINCP) like this:

These values don't make any sense to me, and including them in the electricity burden calculation strongly skews my final results. Does anyone know the reason for these values and how I should handle them?
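A minimal sketch of the burden formula described above, assuming ELEP is a monthly dollar amount (so it is multiplied by 12 to annualize) and HINCP is annual household income. The helper name and the sample values are hypothetical, and the sketch ignores the ADJINC inflation adjustment. It shows why a single record with a $100+ monthly bill and single-digit income can dominate an average.

```python
def electricity_burden_pct(elep_monthly: float, hincp_annual: float) -> float:
    """Annual electricity cost as a percentage of annual household income.

    Hypothetical helper: assumes ELEP is monthly and HINCP is annual.
    """
    annual_cost = 12 * elep_monthly
    return 100 * annual_cost / hincp_annual


# Hypothetical records: a typical household and one with single-digit income.
typical = electricity_burden_pct(120, 60_000)  # 2.4 (percent)
extreme = electricity_burden_pct(120, 5)       # 28800.0 (percent)

# A simple mean of the two is driven almost entirely by the extreme record.
print(typical, extreme, (typical + extreme) / 2)
```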

I really appreciate your help!

Thanks, 

Siwei 

  • Check the PUMS data dictionary

    www2.census.gov/.../PUMS_Data_Dictionary_2021.pdf

    Negative numbers indicate that the household had a loss. There are a lot of ways this can happen: for example, if you sold stock at a loss and had no other income, your income would be negative. Another example would be selling property at a loss.

    You could also own a business that took a loss for the year (the business expenses exceeded business income). The income reported on your 1040 would then be negative.

    PINCP    Numeric    7
    Total person's income (signed, use ADJINC to adjust to constant dollars)
        bbbbbbb       .N/A (less than 15 years old)
        0             .None
        -19998        .Loss of $19998 or more (Rounded and bottom-coded components)
        -19997..-1    .Loss $1 to $19997 (Rounded components)
        1..4209995    .$1 to $4209995 (Rounded and top-coded components)
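A minimal sketch that maps raw PINCP values onto the categories in the dictionary excerpt above, so negative values are labeled as documented losses rather than treated as errors. The function and constant names are hypothetical, blank values are assumed to have been read in as None, and HINCP (the household variable in the question) would need its own code ranges from the same dictionary.

```python
from typing import Optional

# Code boundaries taken from the PINCP entry in the 2021 PUMS data dictionary.
BOTTOM_CODE = -19998
TOP_OF_RANGE = 4209995


def classify_pincp(value: Optional[int]) -> str:
    """Return the dictionary category for a raw PINCP value (hypothetical helper)."""
    if value is None:  # blank in the FTP files
        return "N/A (less than 15 years old)"
    if value == 0:
        return "none"
    if value == BOTTOM_CODE:
        return "loss of $19998 or more (bottom-coded)"
    if BOTTOM_CODE < value <= -1:
        return "loss of $1 to $19997"
    if 1 <= value <= TOP_OF_RANGE:
        return "positive income"
    # Anything else is not a documented value; it may be a missing-value code.
    return "outside documented range (check for a missing-value code)"


for v in (None, 0, -19998, -250, 48000):
    print(v, "->", classify_pincp(v))
```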

    PS

    Make sure that you handle missing values correctly. If you use data from the FTP website, missing values will be blank. If you use the API, missing values will be a number instead, because the API cannot return a blank; the missing-value code will be a number outside the range given in the PUMS data dictionary. I'm not sure what happens if you download via data.census.gov.
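Following that advice, a minimal sketch that turns any value outside a variable's documented range into None, assuming the API values have already been parsed to integers. The helper name is hypothetical, the range used is PINCP's from the dictionary excerpt, and -999999999 is an illustrative out-of-range value, not a documented API code.

```python
from typing import Iterable, List, Optional


def recode_out_of_range(values: Iterable[int], low: int, high: int) -> List[Optional[int]]:
    """Replace values outside the documented [low, high] range with None.

    Assumption: an out-of-range value is a missing-value code, not real data.
    """
    return [v if low <= v <= high else None for v in values]


# PINCP's documented range from the dictionary excerpt: -19998 to 4209995.
raw = [-999999999, 0, -250, 48000]  # first value is a hypothetical missing code
print(recode_out_of_range(raw, -19998, 4209995))  # [None, 0, -250, 48000]
```

Note that 0 and negative values survive the recode: they are valid dictionary values and need their own treatment in the burden calculation.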

    Dave

  • Hi Dave, 

    This is super helpful! Thank you so much for helping me understand the data set better!

    As for my second question: there are a lot of records with monthly electricity costs over $100 but single-digit household income. Do you have any ideas on how I should handle them?

    I appreciate your time and assistance. 

    Sincerely, 

    Siwei 

  • After I wrote my earlier reply, I started getting ready for your follow-up question. I would start with the case where the household has zero income: they are living off savings only. How do you want to report electricity costs for them? You can't divide by zero. Also, if income is very low, electricity costs as a percentage of total (gross) income will be very high, often many times the income itself.
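A minimal sketch of guarding the burden ratio against the zero-income case, assuming ELEP is monthly and HINCP annual. The function name is hypothetical; it returns None (to be shown as N/A) when income is zero, and treating negative income the same way is an added assumption, not something stated in the reply.

```python
from typing import Optional


def burden_or_none(elep_monthly: float, hincp_annual: float) -> Optional[float]:
    """Electricity burden in percent, or None when income is zero or negative.

    None marks an undefined ratio (division by zero) or a loss year; by
    assumption, these should be reported as N/A rather than averaged in.
    """
    if hincp_annual <= 0:
        return None
    return 100 * 12 * elep_monthly / hincp_annual


print(burden_or_none(150, 0))  # None: zero income, ratio undefined
print(burden_or_none(150, 9))  # 20000.0: annual cost is 200 times income
```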

    Since this is essentially an accounting question, you might look at the conventions used in financial reports. I think you will usually find the convention of placing an * or N/A in the output, with a footnote. For example, you might keep only records where income is positive and greater than electricity costs, and report both the number and the percentage of households dropped from your calculation. You might also consider a graph with income on one axis and electricity costs on the other. This will all depend on your audience.
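A minimal sketch of the filtering rule suggested above, assuming "electricity costs" means annual cost (12 times the monthly ELEP value) and that records are simple (ELEP, HINCP) pairs; the record layout, function name, and sample values are hypothetical. It returns the kept records together with the two numbers to report: how many households were dropped and what percentage that is.

```python
from typing import List, Tuple

# Hypothetical record layout: (ELEP monthly cost, HINCP annual income).
Record = Tuple[float, float]


def filter_for_burden(records: List[Record]) -> Tuple[List[Record], int, float]:
    """Keep records with positive income that exceeds annual electricity cost.

    Returns (kept records, number dropped, percent dropped).
    """
    kept = [(elep, hincp) for elep, hincp in records
            if hincp > 0 and hincp > 12 * elep]
    dropped = len(records) - len(kept)
    pct_dropped = 100 * dropped / len(records) if records else 0.0
    return kept, dropped, pct_dropped


sample = [(120, 60000), (150, 0), (110, 7), (90, -2500)]
kept, n_dropped, pct = filter_for_burden(sample)
print(len(kept), n_dropped, pct)  # 1 3 75.0
```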

    Dave