Proc surveymeans - is domain statement needed for subpop analysis?

MyDzung Chu over 10 years ago

I read that when conducting subpopulation analysis for survey data using SAS proc surveymeans, one should use the DOMAIN statement, instead of BY or WHERE, to obtain correct SEs. I tried each of these approaches with the ACS PUMS data, using varmethod=jackknife and the 80 replicate weights, and got the same SEs for all three. Can someone explain why this is the case?

Stas Kolenikov over 10 years ago

You got lucky this time, but this may not happen every time.

One immediate danger is that by restricting the sample with WHERE you may end up with singleton strata, that is, strata with only one PSU remaining. Depending on how the jackknife procedure is implemented, SAS may or may not raise an eyebrow when the jackknifed result does not differ from the original (Stata does mark the replicate as questionable in this situation, I think).
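As a rough illustration of the singleton-strata problem, here is a minimal Python sketch that counts how many PSUs each stratum keeps after a WHERE-style subset. The field names (stratum, psu, in_domain) and the four toy records are made up for the example; they are not ACS PUMS variables, and this is not how SAS or Stata performs the check internally.

```python
from collections import defaultdict

def singleton_strata(rows, keep):
    """Return the strata that retain exactly one PSU once rows failing `keep` are dropped."""
    psus = defaultdict(set)
    for row in rows:
        if keep(row):
            psus[row["stratum"]].add(row["psu"])
    return sorted(s for s, p in psus.items() if len(p) == 1)

# Hypothetical toy sample: two strata with two PSUs each.
rows = [
    {"stratum": 1, "psu": "a", "in_domain": True},
    {"stratum": 1, "psu": "b", "in_domain": False},
    {"stratum": 2, "psu": "c", "in_domain": True},
    {"stratum": 2, "psu": "d", "in_domain": True},
]

full = singleton_strata(rows, lambda r: True)               # full sample: no singletons
lonely = singleton_strata(rows, lambda r: r["in_domain"])   # after subsetting
print(full, lonely)
```

In the full sample every stratum has two PSUs, but after subsetting stratum 1 is left with one, so a design-based variance for that stratum can no longer be computed from the subset alone.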

Another effect of using WHERE is that it changes the sample size the software believes was specified in your design and treats that size as fixed. In reality the domain sizes are random, which adds to the uncertainty and hence increases the standard errors. The DOMAIN statement correctly accounts for that randomness.
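To make the fixed-versus-random domain size point concrete, here is a small Python sketch under stated assumptions. It hand-rolls a Taylor-linearized variance for a weighted domain mean in a stratified cluster design, treating PSUs as sampled with replacement. This is not what PROC SURVEYMEANS computes with varmethod=jackknife and replicate weights. The data, the field names (stratum, psu, w, y, d) and the function name are all hypothetical.

```python
from collections import defaultdict
from math import sqrt

def domain_mean_se(rows, subset_first):
    """Linearized SE of a weighted domain mean (with-replacement PSU estimator).

    subset_first=True mimics WHERE-style subsetting: out-of-domain rows are dropped
    before PSUs are counted. subset_first=False mimics a domain analysis: every
    sampled PSU is kept and out-of-domain rows contribute zero.
    """
    if subset_first:
        rows = [r for r in rows if r["d"]]
    n_hat = sum(r["w"] for r in rows if r["d"])            # estimated domain size
    mean = sum(r["w"] * r["y"] for r in rows if r["d"]) / n_hat
    # PSU totals of the linearized variable z = w * d * (y - mean) / n_hat
    totals = defaultdict(lambda: defaultdict(float))
    for r in rows:
        z = r["w"] * (r["y"] - mean) / n_hat if r["d"] else 0.0
        totals[r["stratum"]][r["psu"]] += z
    var = 0.0
    for psu_totals in totals.values():
        n_h = len(psu_totals)                               # PSUs seen in this stratum
        zbar = sum(psu_totals.values()) / n_h
        var += n_h / (n_h - 1) * sum((t - zbar) ** 2 for t in psu_totals.values())
    return mean, sqrt(var)

# Hypothetical sample: 2 strata x 3 PSUs; one PSU per stratum has no domain members.
rows = [
    {"stratum": 1, "psu": "a", "w": 1.0, "y": 5.0, "d": True},
    {"stratum": 1, "psu": "b", "w": 1.0, "y": 7.0, "d": True},
    {"stratum": 1, "psu": "c", "w": 1.0, "y": 0.0, "d": False},
    {"stratum": 2, "psu": "e", "w": 1.0, "y": 1.0, "d": True},
    {"stratum": 2, "psu": "f", "w": 1.0, "y": 3.0, "d": True},
    {"stratum": 2, "psu": "g", "w": 1.0, "y": 0.0, "d": False},
]

mean_domain, se_domain = domain_mean_se(rows, subset_first=False)
mean_subset, se_subset = domain_mean_se(rows, subset_first=True)
print(mean_domain, se_domain, mean_subset, se_subset)
```

Both calls return the same point estimate, but in this toy data the domain-style SE is larger, because the PSUs with no domain members still count as sampled PSUs and contribute zeros. With other data the gap can be smaller, vanish, or go the other way, so treat this only as an illustration of why the two approaches are not equivalent in general.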

For a Stata user, I would recommend www.stata-journal.com/article.html and onlinelibrary.wiley.com/.../summary (where the latter is a much broader treatment of many topics in the analysis of survey data... and I can't deny I wrote it :) ). The conceptual frameworks are of course the same regardless of whether you work in SAS, Stata or R.