Estimating Rental Burden (GRAPI) by Race of Householder

Hi all, 

I'd like to estimate rent burden by race of householder by state in Stata using IPUMS data. To verify my methodology, I've first tried to recreate ACS estimates for rent burden from tables S2503 and CP04. I'm able to accurately estimate the number of rentals (renter-occupied housing units, row 2) in S2503. However, when I try to estimate the burden (CP04 GRAPI at the very bottom), I am consistently off by several percentage points. 

One guess I have about why I am unable to correctly estimate GRAPI is that it appears ACS drops ~3m households from the analysis when calculating GRAPI (at the US level; numbers vary by state), and I cannot recreate this step. It's also possible there's something else entirely going wrong. 

I've put the details of how I've set up my estimations below, grateful for any advice you might have!

 

I've svyset my data like this:

svyset cluster [pweight = hhwt], strata(strata) vce(linear) singleunit(center)

In addition to the weights and id variables, I'm using: ownershpd rentgrs hhincome gq statefip raced hispand

gen monthincome=hhincome/12
gen grapi=rentgrs/monthincome

gen burden=0
replace burden=1 if grapi>.349

gen renters=0
replace renters=1 if ownershpd==22 | ownershpd==21

gen black=0
replace black=1 if race==2

svy, subpop(renters): mean burden, over (statefip black)

  • I know this is not a direct reply to your question but have you considered using the CHAS data set available from HUD:

    www.huduser.gov/.../cp.html

    The CHAS data provides information on housing cost burden, including by race and ethnicity.
  • While I haven't tried to retrace your steps, I think this might be a universe issue. The Census Bureau doesn't calculate cost burden for households with no income or negative income. Because Stata treats missing values as larger than any number, your "if grapi>.349" condition is true whenever grapi is missing (for example, when household income is zero), so those records are counted as burdened and every renter record stays in the denominator. To mimic the published ACS summary tables, you would need to ensure that only households with positive income have non-missing values for GRAPI, and that burden is left missing wherever GRAPI is missing.

    Also, the PUMS analyses won't match the ACS summary tables exactly. The weights in the PUMS are designed to match the *total* housing units (almost) exactly at the state level, but not *renter-occupied* housing units. To make sure you've got the right denominator, I would recommend consulting the Census Bureau's PUMS Estimates for User Verification, available at www.census.gov/.../documentation.html.

    Finally, my Stata is too rusty to weigh in on any effects of how you've svyset the data, but I'd recommend using the ACS replicate weights instead; see usa.ipums.org/.../repwt.shtml for more information.

    I agree with Cliff that the CHAS data are preferable, particularly if you're looking for data on relatively small geographic areas, but I also can't always wait a couple additional years for that data to be released. (The most current CHAS data right now are for the 2012-2016 period.)

    Good luck!
    --Matt
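
A minimal sketch of the universe restriction described above, reusing the variable names from the original post. Several things here are assumptions rather than confirmed details: that 9999999 is the N/A code for hhincome in the extract, that "no cash rent" units (ownershpd==21) belong in the "not computed" group along with zero- and negative-income households, that the black indicator and the svyset from the original post are already in place, and that repwt1-repwt80 are in the extract if the replicate-weight line is used. The new variable name computed is made up for illustration.

```stata
* Sketch only: assumes hhincome, rentgrs, ownershpd, statefip, black and hhwt
* exist as in the original post, and that the data were already svyset.

* Compute GRAPI only for households with positive income.
* Assumption: 9999999 is the N/A code for hhincome in this extract.
gen double grapi = rentgrs / (hhincome / 12) if hhincome > 0 & hhincome < 9999999

* Leave burden missing where GRAPI is missing. Without the "if" qualifier,
* missing GRAPI would satisfy grapi > .349 and be counted as burdened.
gen byte burden = (grapi > .349) if !missing(grapi)

* Subpopulation: cash renters with a computed GRAPI.
* Assumption: no-cash-rent units (ownershpd == 21) are "not computed".
gen byte computed = (ownershpd == 22) & !missing(grapi)

* Optional alternative to the cluster/strata setup, using replicate weights.
* Assumption: repwt1-repwt80 were added to the extract.
* svyset [pweight = hhwt], vce(brr) brrweight(repwt1-repwt80) fay(.5) mse

svy, subpop(computed): mean burden, over(statefip black)
```

Comparing the weighted count of records with computed==1 against the "not computed" row in the published table is one way to check whether the assumed universe is close to the one the Census Bureau uses.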