I've been having some trouble finding a direct answer to this question, so I thought I'd toss it in here! I have used PUMS data to calculate rates of poverty for children by race and ethnicity in Illinois. The results I got are here (margins of error with a 90% confidence level are in parentheses):
White: 13.6% (0.4%)
Black: 38.9% (1.2%)
Asian or Pacific Islander: 10.8% (1.2%)
My question is: To test if these differences are statistically significant, can I still use the Census Bureau's methods that they recommend for the ACS data on data.census.gov? Typically, I use the Bureau's statistical testing tool Excel spreadsheet and/or this formula:
If |(Est1 - Est2) / sqrt(SE1^2 + SE2^2)| > 1.645, then the difference is statistically significant at the 90% confidence level
If that is not the correct way to do it, do you know of any available resources for how to calculate statistical significance for weighted PUMS estimates? Thank you!
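For concreteness, here is a minimal sketch of that test applied to the three estimates above. It assumes the test divides the difference by the square root of the sum of the squared standard errors, and that each standard error is recovered from the 90% margin of error as MOE / 1.645. The function names are made up for illustration and are not part of any Census Bureau tool.

```python
import math
from itertools import combinations

Z_90 = 1.645  # critical value for a 90% confidence level


def se_from_moe(moe, z=Z_90):
    """Convert a margin of error back to a standard error."""
    return moe / z


def z_score(est1, moe1, est2, moe2, z=Z_90):
    """Difference divided by the square root of the summed squared SEs."""
    se1 = se_from_moe(moe1, z)
    se2 = se_from_moe(moe2, z)
    return (est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2)


# Child poverty rates and 90% MOEs from the post, in percentage points.
estimates = {
    "White": (13.6, 0.4),
    "Black": (38.9, 1.2),
    "Asian or Pacific Islander": (10.8, 1.2),
}

for a, b in combinations(estimates, 2):
    z = z_score(*estimates[a], *estimates[b])
    verdict = "significant" if abs(z) > Z_90 else "not significant"
    print(f"{a} vs {b}: z = {z:.2f} ({verdict})")
```

This treats the two estimates as independent, which is the same simplification the formula itself makes.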
Bill Byrnes, I would run svyby(~poverty, by = ~race, design = des, FUN = svymean), where des is your survey design object,
followed by an appropriate svycontrast (see https://rdrr.io/rforge/survey/man/svycontrast.html).
Matt Herman may have a tidycensus analogue…
In my original post, perhaps I didn't clarify enough what I am looking to do. I used replicate weights to calculate the margins of error above. What I'd like to know is this: to test for statistically significant differences among groups, is it still alright to use the equation in the original post, or is there another recommended method? If the equation above will not work, I'm thinking a z-test of two proportions would be appropriate. However, I've never run those kinds of tests with a weighted sample before.
Hi Bill -- just to make sure everyone is on the same page, I think it's important to distinguish between two things:
Calculating #1 using replicate weights, as you did, is the most important thing; everything below assumes that's the case. From there, there are two basic paths to calculating #2:
It sounds like you're asking whether #2A will be good enough, and I think it is. I took a brief look at child poverty rates by race in the Twin Cities metro, and #2B yields MOEs for the *difference in proportions* that are within 2-3% of the MOEs you would derive using #2A. This might matter for marginally significant differences between groups, but I'd guess that in most cases your conclusions about statistically significant differences wouldn't depend on how you calculate the MOEs for the *difference in proportions*.
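One way to compute the MOE for a difference directly from the PUMS replicate weights is sketched below. This is only a rough illustration under stated assumptions: the records are synthetic and made up for the example, the record layout and helper names are hypothetical, and the variance uses the successive difference replication formula for the 80 ACS PUMS replicate weights (PWGTP1 to PWGTP80), which is 4/80 times the sum of squared deviations of the replicate estimates from the full-sample estimate.

```python
import math
import random

Z_90 = 1.645  # critical value for a 90% confidence level


def weighted_rate(flags, weights):
    """Weighted share of records whose flag is true."""
    return sum(w for f, w in zip(flags, weights) if f) / sum(weights)


def rate_difference(records, weight_index, group_a, group_b):
    """Poverty rate of group_a minus group_b under one set of weights."""
    def rate(group):
        rows = [r for r in records if r["group"] == group]
        return weighted_rate(
            [r["poor"] for r in rows],
            [r["weights"][weight_index] for r in rows],
        )
    return rate(group_a) - rate(group_b)


def sdr_standard_error(full_estimate, replicate_estimates):
    """Successive difference replication: sqrt(4/R * sum((rep - full)^2))."""
    r = len(replicate_estimates)
    squared = sum((rep - full_estimate) ** 2 for rep in replicate_estimates)
    return math.sqrt(4.0 / r * squared)


# Synthetic records, made up for illustration only (not real PUMS data).
# weights[0] stands in for PWGTP; weights[1..80] for PWGTP1..PWGTP80.
random.seed(0)
records = []
for group, poverty_share in (("A", 0.15), ("B", 0.35)):
    for _ in range(500):
        base = random.uniform(10, 50)
        weights = [base] + [base * random.uniform(0.5, 1.5) for _ in range(80)]
        records.append({
            "group": group,
            "poor": random.random() < poverty_share,
            "weights": weights,
        })

# Recompute the difference once per weight set, then apply the SDR formula.
full_diff = rate_difference(records, 0, "A", "B")
rep_diffs = [rate_difference(records, i, "A", "B") for i in range(1, 81)]

se_diff = sdr_standard_error(full_diff, rep_diffs)
moe_diff = Z_90 * se_diff
significant = abs(full_diff) > moe_diff
print(f"difference = {full_diff:.4f}, MOE (90%) = {moe_diff:.4f}, "
      f"significant = {significant}")
```

The point of recomputing the difference under each replicate weight is that any covariance between the two group estimates is carried through automatically, whereas the formula based on the two separate SEs assumes it is zero.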
Again, though, I'd defer to those with more statistical expertise than I have.
I hope this helps, but please respond if anything is unclear.
Hi Matt, thanks so much for this clarification! This has been really helpful to me.