Hi. I'm preparing to run a probit regression in Stata using ACS PUMS data.
Does anyone have experience using the Stata svyset command (or, more generally, specifying relevant survey design factors in a statistical analysis program) with PUMS files?
The following specification is suggested for Current Population Survey analyses (see https://www.stata.com/statalist/archive/2008-04/msg00444.html); it uses the state (gestcen) and consolidated statistical area (gtcsa) variables.
egen psu=group(gestcen gtcsa)
svyset [pw=mars], strat(gestcen) psu(psu)
Does anyone have an analogous specification for PUMS?
Hi Michele,
I've used the Stata svy commands to analyze survey data (CPS, SIPP, NHIS). The first step is to svyset the data so Stata knows the sample design. For PUMS, that means supplying the full-sample weight and the 80 replicate weights:
svyset [pw=wgtp], sdr(wgtp1 - wgtp80) vce(sdr) mse
(This example uses the single-year 2010 PUMS dataset, ss10hak. The weights used are household-level weights.)
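Since a probit on individuals would normally run on the person records rather than the household records, a person-level version of the same declaration might look like the sketch below. It assumes the PUMS person file was imported with lowercase variable names, so the person weight is pwgtp and its 80 replicates are pwgtp1 through pwgtp80; adjust the names if your extract differs.

```stata
* Sketch: declare the design for a PUMS person file.
* Assumes lowercase variable names pwgtp and pwgtp1-pwgtp80.
svyset [pw=pwgtp], sdrweight(pwgtp1-pwgtp80) vce(sdr) mse

* Typing svyset with no arguments displays the current settings.
svyset
```

Which weight to use depends on the unit of analysis: household weights for household-level outcomes, person weights for person-level outcomes.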
After svysetting the data, you run the command using the svy: prefix, which passes along the options you defined above. Stata will execute this command using the full-sample weights and again for each set of replicate weights. There are two important things to note:
(1) Not all Stata commands can be run with the svy: prefix.
(2) If you want to limit your replicate analyses to a subset of the sample (for example, all persons aged 25-64 or all African Americans), you should not use if or in. Instead, use the subpop() option before the colon, as in
. gen byte age25_64 = age>=25 & age<=64
. svy, subpop(age25_64): command
Note that you must first define the subpopulation with a dichotomous variable coded 1 for cases to include and 0 for all cases that should be excluded from the analysis.
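To tie this back to the original question, here is a minimal sketch of a probit restricted to ages 25-64. The outcome and covariates (employed, female, educ) are hypothetical placeholders, not PUMS variable names, and age follows the naming used in this thread (the PUMS person file calls it agep). The data are assumed to have been svyset with replicate weights already.

```stata
* Hypothetical variables: employed (0/1 outcome), female, educ.
* Assumes the data have already been svyset with the replicate weights.

* Flag the subpopulation; cases with missing age are coded 0.
gen byte age25_64 = age>=25 & age<=64

* svy: refits the model with the full-sample weight and each replicate weight.
svy, subpop(age25_64): probit employed i.female i.educ
```

With vce(sdr), the reported standard errors come from the spread of the replicate estimates, so the model is fit once per replicate weight and can take a while on a large extract.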
Here are some additional resources that may be helpful:
https://www.stata.com/manuals13/svysvyestimation.pdf
https://www.stata.com/manuals13/svysvysdr.pdf
https://usa.ipums.org/usa/repwt.shtml
http://www.cpc.unc.edu/research/tools/data_analysis/statatutorial/sample_surveys/svy_commands/
https://stats.idre.ucla.edu/other/mult-pkg/faq/sample-setups-for-commonly-used-survey-data-sets/
Applying Occam's razor to the -subpop- option, you can just as well run
svy, subpop( if inrange(age,25,64) ): command