I'm wondering if anyone is familiar with the CHAS data set (a special tabulation of ACS data for HUD).
I'd prefer to use ACS data directly to calculate the CHAS estimate of housing problems (defined as a housing unit experiencing at least 1 of 4 housing problems: overcrowding, cost burden, lack of plumbing facilities, or lack of kitchen facilities). However, it seems like CHAS is the only way to do so because it offers an unduplicated count of the units experiencing at least 1 of the 4 housing problems, whereas with ACS data there's potential overlap in the count of households (since they're separate questions). For example, tabulating from ACS, if a unit lacked plumbing, was overcrowded, and was cost burdened, it would appear that 3 units had problems, whereas it should be counted as a single unit having at least 1 of the 4 problems.
My conclusion was that CHAS avoids this and therefore I shouldn't/cannot use ACS, but I'm wondering if there's a way around it? Anyone else have experience with CHAS and know if I'm on the right track?
If the housing problem tabulation is your variable of interest, then yes, CHAS makes compiling that data very elegant because it filters out the overlapping responses. It can be done manually using a microdata file, but then you're unable to assess any areas smaller than a PUMA. It all really depends on your exact research question. Happy to discuss further.
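To make the overlap issue concrete, here is a minimal sketch of the difference between summing the four problem counts separately and counting units with at least one problem. The records, weights, and field names are all made up for illustration; they are not real PUMS variable names, and a real microdata tabulation would first have to derive each flag from the Census variable definitions.

```python
# Hypothetical household records: one dict per housing unit, with a survey
# weight and four yes/no problem flags. Field names are illustrative only.
households = [
    {"weight": 100, "no_plumbing": True, "no_kitchen": False,
     "overcrowded": True, "cost_burdened": True},
    {"weight": 150, "no_plumbing": False, "no_kitchen": False,
     "overcrowded": False, "cost_burdened": True},
    {"weight": 200, "no_plumbing": False, "no_kitchen": False,
     "overcrowded": False, "cost_burdened": False},
]

PROBLEMS = ("no_plumbing", "no_kitchen", "overcrowded", "cost_burdened")


def sum_of_separate_counts(records):
    """Add each problem's weighted count separately (double-counts overlaps)."""
    return sum(r["weight"] for r in records for p in PROBLEMS if r[p])


def units_with_any_problem(records):
    """Weighted count of units with at least one of the four problems."""
    return sum(r["weight"] for r in records if any(r[p] for p in PROBLEMS))


naive = sum_of_separate_counts(households)         # first unit counted 3 times
unduplicated = units_with_any_problem(households)  # each unit counted once
print(naive, unduplicated)
```

The first record has three problems, so the separate-table approach counts its weight three times, while the unit-level flag counts it once. That unit-level flag is only possible when all four items are available on the same record, which is why it takes microdata or a special tabulation like CHAS.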
Thanks Bryan! I'm only looking at state and US estimates for the general population as well as by race and ethnicity, so perhaps I could use a microdata file.
I'm also curious, do you know if it would be appropriate to calculate the MOEs for US estimates of housing problems by race/ethnicity in the CHAS data? Following the ACS handbook I know we can aggregate MOEs by geography or by subgroup but this would be across both (income, renter vs owner, and then by state) which I've never done before...
I work for HUD and managed the development of the ACS-based CHAS data. As Bryan said, the CHAS data are perfect for the scenario you described. With standard ACS tables there's no way to eliminate duplicative housing problems. (thanks, Bryan!). Since you're interested in state and US level estimates, you could also use PUMS data to come up with your own estimates, but that might be harder depending on how familiar you are with PUMS data and Census variable definitions.
Regarding your second question: yes, you can calculate MOEs in the CHAS data. The raw data files HUD posts include MOEs. If you use the CHAS data to create derived estimates, you would need to calculate a derived MOE, the same way you would with regular ACS data (Chapter 8 of the handbook you referenced: https://www.census.gov/content/dam/Census/library/publications/2018/acs/acs_general_handbook_2018_ch08.pdf).
Feel free to reach out if you have questions or want to discuss your results.
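For reference, the derived-MOE approximations from Chapter 8 of the handbook can be sketched as below. This is an illustration, not HUD's or the Census Bureau's code: the function names are made up here, and the formulas are the handbook's approximations, which ignore covariance between the estimates being combined.

```python
import math


def moe_of_sum(moes):
    """Approximate MOE of a sum (or difference) of estimates:
    the square root of the sum of the squared component MOEs."""
    return math.sqrt(sum(m ** 2 for m in moes))


def moe_of_proportion(numerator, denominator, moe_num, moe_den):
    """Approximate MOE of numerator/denominator, where the numerator
    is a subset of the denominator."""
    p = numerator / denominator
    radicand = moe_num ** 2 - (p ** 2) * moe_den ** 2
    if radicand < 0:
        # Handbook fallback: switch to the ratio formula (plus sign)
        # when the value under the square root is negative.
        radicand = moe_num ** 2 + (p ** 2) * moe_den ** 2
    return math.sqrt(radicand) / denominator


# Made-up MOEs for two cells being added together.
combined = moe_of_sum([30, 40])
share = moe_of_proportion(50, 200, 10, 20)
print(combined, round(share, 4))
```

The sum formula treats every cell the same way, whether the cells differ by state, income level, or tenure. The handbook does caution that the approximation drifts further from the true MOE as more estimates are summed, so a result built from a large number of cells should be read with that in mind.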
Thanks for your reply Paul! Glad to have you as a resource.
I'm wondering if aggregating MOEs across both geography and groups is a valid way to calculate the CIs for US estimates by race? So combining all states, all income levels, and owner+renter for each race/ethnicity? I haven't seen an example of aggregating across both (for ACS in general) for the derived calculations.
MOE and CI have limited usefulness in something like the ACS, where you never really know how things are weighted, a problem that sampling error does not capture. Just as an example, I did an exercise for a conference showing the foreign-born population by tract for a city, and one tract had exactly one foreign-born person, born in Colombia. Obviously, it is impossible under any sampling scenario to find the characteristics of exactly one person in a tract. I get more confident when I see lots of examples of a pattern in different areas and in different years. Then it seems real to me even if the exact numbers are obviously in doubt, especially in small areas.
Thanks very much for all your work on the CHAS data, Paul. Do you know when the 2013-2017 CHAS release will be coming out? That information would help me plan a couple projects that could benefit from it.
Glad to see you are available to answer questions about CHAS. It's a data set that we use quite often but for which we have had a hard time finding documentation about how the data is constructed.
My questions regard how HUD determines the income intervals applied when querying ACS data to build the CHAS table.
For those who are unsure why this might be problematic: the 80%-of-median income level is legislatively capped at a national level. In higher-income regions, such as Boston, where I work, the "80%" income level used by HUD to determine program eligibility is actually closer to 72% or 77% of "true" HUD Area Median Family Income (HAMFI, as it is called). The size of the discrepancy varies from year to year and region to region based on the calculation HUD performs to derive HAMFI. There is a separate cap for HAMFI's 50% level.
Thanks for your help.
Matt: The 2013-2017 data are out now: https://www.huduser.gov/portal/datasets/cp.html.
Elise: Interesting question. I don't know the answer, so I'll have to punt and say that if it's appropriate to do that using standard ACS data then it's fine to do it using CHAS data.
Cliff: I understand your concerns. The CHAS income limits are slightly different from the standard HUD income limits. They're produced by the same team (not by me) and, off the top of my head, I don't recall the specific ways in which they differ. We probably should be more transparent about this. My colleagues and I will look into what we can do. We might start releasing the CHAS income limits, along with a write-up of how they are calculated.
Adding documentation about calculation of the CHAS income limits would be a great help to data users.