Statistical significance using PUMS data

Bill Byrnes over 4 years ago

I've been having some trouble finding a direct answer to this question, so I thought I'd toss it in here! I have used PUMS data to calculate rates of poverty for children by race and ethnicity in Illinois. The results I got are here (margins of error with a 90% confidence level are in parentheses):

White: 13.6% (0.4%)

Black: 38.9% (1.2%)

Asian or Pacific Islander: 10.8% (1.2%)

My question is: To test if these differences are statistically significant, can I still use the Census Bureau's methods that they recommend for the ACS data on data.census.gov? Typically, I use the Bureau's statistical testing tool Excel spreadsheet and/or this formula:

If |(Est1 - Est2) / (SE1^2 + SE2^2) ^ 0.5| > 1.645, then the difference is statistically significant (at the 90% confidence level)
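
A minimal R sketch of that test, assuming the published figures are 90% MOEs (so SE = MOE / 1.645) and that, as in the Census Bureau's summary-file guidance, the covariance between the two estimates is ignored. The helper name `census_sig_test` is hypothetical.

```r
# Hypothetical helper: Census-style significance test for two estimates,
# given their 90% margins of error. Ignores covariance between the estimates.
census_sig_test <- function(est1, moe1, est2, moe2, z_crit = 1.645) {
  se1 <- moe1 / 1.645   # convert 90% MOE to a standard error
  se2 <- moe2 / 1.645
  z <- (est1 - est2) / sqrt(se1^2 + se2^2)
  list(z = z, significant = abs(z) > z_crit)
}

# Child poverty rates (percent) and 90% MOEs from the post above
census_sig_test(13.6, 0.4, 38.9, 1.2)   # White vs. Black
census_sig_test(13.6, 0.4, 10.8, 1.2)   # White vs. Asian or Pacific Islander
```

With these inputs both comparisons come out significant (|z| of roughly 32.9 and 3.6, respectively).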

If that is not the correct way to do it, do you know of any available resources for how to calculate statistical significance for weighted PUMS estimates? Thank you!

Matt Schroeder over 4 years ago

Hi, Bill -- what I'd recommend paying more attention to is the method by which the margins of error were estimated. Bottom line, I'd recommend using the replicate weights in stats packages to do your significance tests. It'll be more accurate, and it saves you some manual work comparing standard errors yourself. :)

Longer version:

[Disclaimer: I'm not a statistician, so I'd appreciate whatever clarifications/corrections others can offer.]

If the MOEs are the ones that come out of the usual stats programs (assuming a simple random sample), then the MOEs above are probably too small, because the ACS uses a complex sample design. With PUMS data, it's best to use the replicate weights, which account for that complex sample design (you recalculate the statistic once with each of the 80 replicate weights and look at how much it varies across those 80 "sample replicates"). For more info, see the IPUMS-USA page on replicate weights or the "Approximating Standard Errors with Replicate Weights" section in the Census Bureau's PUMS accuracy document.
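
As a sketch of that replicate-weight calculation, assuming the successive-difference formula used for ACS PUMS, SE = (4/80 * sum over r of (X_r - X)^2) ^ 0.5, where X is the estimate with the full-sample weight PWGTP and X_r the estimate with replicate weight PWGTPr. The helper names and the toy data are hypothetical; the random replicate weights below are NOT genuine ACS replicates and only exercise the mechanics.

```r
# Weighted share of records with poverty == 1 (poverty coded 0/1)
weighted_rate <- function(poverty, w) sum(w * poverty) / sum(w)

# SDR standard error used for ACS PUMS:
# SE = sqrt(4/80 * sum over the 80 replicates of (X_r - X_full)^2)
sdr_se <- function(full_est, rep_ests) {
  stopifnot(length(rep_ests) == 80)
  sqrt(4 / 80 * sum((rep_ests - full_est)^2))
}

# Toy stand-in for PUMS person records (illustration only)
set.seed(1)
n <- 500
kids <- data.frame(poverty = rbinom(n, 1, 0.2), PWGTP = runif(n, 50, 150))
for (r in 1:80) {
  # random perturbations of the full weight, NOT real ACS replicate weights
  kids[[paste0("PWGTP", r)]] <- kids$PWGTP * sample(c(0.29, 1, 1.71), n, replace = TRUE)
}

full  <- weighted_rate(kids$poverty, kids$PWGTP)
reps  <- sapply(1:80, function(r) weighted_rate(kids$poverty, kids[[paste0("PWGTP", r)]]))
se    <- sdr_se(full, reps)
moe90 <- 1.645 * se   # 90% margin of error
```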

If the MOEs are replicate-based, then the formula above will probably work pretty well, with a caveat.

The actual formula for the standard error of the difference in means is the square root of the sum of the variances minus twice the covariance: (SE1^2 + SE2^2 - 2 * Cov(Est1, Est2)) ^ 0.5

The formula we use with the summary files to approximate the standard error of the difference in means -- (SE1^2 + SE2^2) ^ 0.5 -- ignores the covariance between the poverty rates in the different groups.* Since the covariance can be positive or negative, the formula may either overestimate or underestimate the actual standard error (as the Census Bureau's instructions for statistical testing with the summary files point out).

* - What does this covariance term mean in practice? I don't know -- perhaps how the poverty rates in each of the 80 "sample replicates" covary? Does anyone else have an explanation?
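
One way to make the covariance term concrete: with replicate weights it is estimated the same way as the variances, from how the two groups' replicate estimates move together, Cov(Est1, Est2) = 4/80 * sum over r of (X1_r - X1) * (X2_r - X2). The sketch below uses made-up replicate estimates (not real PUMS output) and hypothetical helper names to check that plugging this covariance into (SE1^2 + SE2^2 - 2 * Cov) ^ 0.5 gives exactly the SE obtained by taking the difference within each replicate, and compares both with the covariance-free approximation.

```r
# SDR covariance and SE helpers (4/80 multiplier used for ACS PUMS)
sdr_cov <- function(full1, reps1, full2, reps2) {
  4 / 80 * sum((reps1 - full1) * (reps2 - full2))
}
sdr_se <- function(full, reps) sqrt(sdr_cov(full, reps, full, reps))

# Toy replicate estimates for two groups (illustration only)
set.seed(42)
shared <- rnorm(80, sd = 0.004)   # variation common to both groups
full1 <- 0.136
reps1 <- full1 + shared + rnorm(80, sd = 0.001)
full2 <- 0.108
reps2 <- full2 + shared + rnorm(80, sd = 0.004)

se1   <- sdr_se(full1, reps1)
se2   <- sdr_se(full2, reps2)
cov12 <- sdr_cov(full1, reps1, full2, reps2)

# SE of the difference including the covariance term
se_diff_cov <- sqrt(se1^2 + se2^2 - 2 * cov12)
# Same quantity, computed directly from the per-replicate differences
se_diff_direct <- sdr_se(full1 - full2, reps1 - reps2)
# Summary-file style approximation that ignores the covariance
se_diff_approx <- sqrt(se1^2 + se2^2)

c(with_cov = se_diff_cov, direct = se_diff_direct, no_cov = se_diff_approx)
```

The first two values agree exactly (it is an algebraic identity); the third is larger when the covariance is positive and smaller when it is negative.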

Bill Byrnes over 4 years ago in reply to Matt Schroeder

Hi Matt, thanks for the response! Just to clarify, I calculated the MOEs in my original post using the 80 replicate weights included with the PUMS data. From your response, am I correct in saying that the formula from the Census Bureau I typically use to calculate statistical significance is the correct one? I've gone through the resources you included and I just cannot seem to find anything that indicates we should use a different method for calculating statistically significant differences among PUMS estimates. So far the only method I've encountered for calculating significance with Census estimates is the one I noted above!

Stas Kolenikov over 4 years ago in reply to Bill Byrnes

Bill Byrnes what package do you use? If you use a proper statistical package (R, Stata, SAS with caveats, SPSS with even bigger caveats), you should be able to properly test for the differences within those packages.

Bill Byrnes over 4 years ago in reply to Stas Kolenikov

Hi Stas, I use a combination of R and Excel. I'm still sort of a novice at R, so I clean and recode data in R and then export it to Excel. In Excel I use pivot tables to calculate margins of error.

Matt Schroeder over 4 years ago in reply to Stas Kolenikov

Stas Kolenikov, what are the caveats with SAS? I'm gradually converting to R from SAS, so I'd love a reason to speed up that transition.

Stas Kolenikov over 4 years ago in reply to Matt Schroeder

Bill Byrnes I would run

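# pums_design: hypothetical svrepdesign() object; covmat = TRUE keeps the covariances svycontrast() needs
svyby(~poverty, by = ~race, design = pums_design, FUN = svymean, covmat = TRUE)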

followed by an appropriate svycontrast (see https://rdrr.io/rforge/survey/man/svycontrast.html)

Matt Herman may have a tidycensus analogue for that.
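
For anyone doing this by hand, here is a base-R sketch of roughly what svyby() with covariances followed by svycontrast() computes for one pairwise difference on a replicate-weight design: estimate each group's rate with the full-sample weight and with each replicate weight, take the difference within each replicate, and apply the SDR formula to those differences, which accounts for the covariance automatically. The survey package's svrepdesign(), svyby(), and svycontrast() do this for you. The column names `poverty` and `race`, the helper names, and the toy data are hypothetical; PWGTP and PWGTP1 to PWGTP80 are the PUMS person weight and replicate weight columns.

```r
# Weighted poverty rate for each group, using the weight column named `wcol`.
# Assumes `poverty` is coded 0/1 and `race` holds the group labels.
rate_by_group <- function(df, wcol) {
  w <- df[[wcol]]
  sapply(split(seq_len(nrow(df)), df$race),
         function(i) sum(w[i] * df$poverty[i]) / sum(w[i]))
}

# Difference g1 - g2 with an SDR standard error computed from the
# per-replicate differences (so the covariance is included).
compare_groups <- function(df, g1, g2) {
  full <- rate_by_group(df, "PWGTP")
  # groups x 80 matrix of replicate estimates
  reps <- sapply(paste0("PWGTP", 1:80), function(wc) rate_by_group(df, wc))
  d_full <- full[[g1]] - full[[g2]]
  d_reps <- reps[g1, ] - reps[g2, ]
  se <- sqrt(4 / 80 * sum((d_reps - d_full)^2))
  c(diff = d_full, se = se, z = d_full / se)
}

# Toy data (illustration only; random replicate weights, not real ACS ones)
set.seed(7)
n <- 3000
toy <- data.frame(race = sample(c("White", "Black", "Asian"), n, replace = TRUE),
                  PWGTP = round(runif(n, 20, 200)))
toy$poverty <- rbinom(n, 1, ifelse(toy$race == "Black", 0.39,
                                   ifelse(toy$race == "White", 0.14, 0.11)))
for (r in 1:80) {
  toy[[paste0("PWGTP", r)]] <- toy$PWGTP * sample(c(0.29, 1, 1.71), n, replace = TRUE)
}
compare_groups(toy, "Black", "White")
```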

Matt Schroeder, I just don't like SAS, that's all; any kind of coding in SAS takes three times longer than in R or Stata, and it always runs slower. For ACS and CPS specifically, SAS does not have formal support for SDR (successive difference replication), although you can fake it as BRR (balanced repeated replication).
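
A worked check of why the BRR workaround can work, assuming the standard Fay-adjusted BRR variance formula: with R replicates and Fay coefficient rho, the multiplier 1/(R(1 - rho)^2) equals the ACS SDR multiplier 4/80 when R = 80 and rho = 0.5, so a Fay BRR routine fed the 80 ACS replicate weights as-is should reproduce the SDR variance.

```latex
\begin{aligned}
\hat V_{\mathrm{Fay}}(\hat\theta) &= \frac{1}{R\,(1-\rho)^2}\sum_{r=1}^{R}\bigl(\hat\theta_r - \hat\theta\bigr)^2 \\
R = 80,\ \rho = 0.5:\quad \frac{1}{80\,(1-0.5)^2} &= \frac{1}{20} = \frac{4}{80}
\end{aligned}
```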

Bill Byrnes over 4 years ago

In my original post, perhaps I wasn't clear enough about what I'm looking to do. I used replicate weights to calculate the margins of error above. What I'd like to know is this: to test for statistically significant differences among groups, is it still alright to use the equation in the original post, or is there another recommended method? If that equation won't work, I'm thinking a z-test of two proportions would be appropriate. However, I've never run that kind of test with a weighted sample before.

Matt Schroeder over 4 years ago in reply to Bill Byrnes
Hi Bill -- just to make sure everyone is on the same page, I think it's important to distinguish between two things:

1. The margins of error (MOEs) for the *proportions* of kids in poverty

2. The MOEs for the *difference in proportions* between two groups, which depend on #1. As you described above, you compare the difference in proportions to this MOE to test for a statistically significant difference.

Calculating #1 using replicate weights, as you did, is the most important thing; everything below assumes that's the case. From there, there are two basic paths to calculating #2:

2A: You can use the formula you described above -- (MOE1^2 + MOE2^2) ^ 0.5. That is, square the two margins of error, add those squares together, and take the square root of that sum.
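
Because each 90% MOE is 1.645 times its standard error, comparing the difference with this combined MOE is the same test as the formula in the original post. Using the Black and Asian or Pacific Islander figures from the original post (percentage points):

```latex
\begin{aligned}
\mathrm{MOE}_{\mathrm{diff}} &= \sqrt{\mathrm{MOE}_1^2 + \mathrm{MOE}_2^2} = 1.645\sqrt{SE_1^2 + SE_2^2} \\
|Est_1 - Est_2| > \mathrm{MOE}_{\mathrm{diff}} &\iff \frac{|Est_1 - Est_2|}{\sqrt{SE_1^2 + SE_2^2}} > 1.645 \\
\text{Black vs. API: } \sqrt{1.2^2 + 1.2^2} &= \sqrt{2.88} \approx 1.70 < |38.9 - 10.8| = 28.1
\end{aligned}
```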

You mentioned that you've never run that test with a weighted sample, but as long as you've used the weights to calculate the proportions and the MOEs for the *proportions* (#1), you've got everything you need. You can treat it like any estimate contained in the summary files, since they both come from the ACS microdata. (But this is only a conceptual similarity; the PUMS contains only a portion of the records in the confidential ACS microdata and undergoes additional disclosure avoidance procedures.)

2B: You can calculate it directly in stats programs at the same time that you calculate #1 using the replicate weights. That will account for the covariance of the two estimates (as I understand it; I could be wrong).

It sounds like you're asking whether #2A will be good enough, and I think it is. I did a brief look at child poverty rates by race in the Twin Cities metro, and #2B yields MOEs for the *difference in proportions* that are within 2-3% of the MOEs you would derive using #2A. This might matter for marginally significant differences between groups, but I'd guess that in most cases your conclusions about statistically significant differences wouldn't depend on how you calculate the MOEs for the *difference in proportions*.

Again, though, I'd defer to those with more statistical expertise than I have.

I hope this helps, but please respond if anything is unclear.

Bill Byrnes over 4 years ago in reply to Matt Schroeder

Hi Matt, thanks so much for this clarification! This has been really helpful to me.