# PUMS: Causal Inference and Weight

Hi All,

I'm currently working on a project looking at the relationship between income and place of residence (by PUMA/County) using the 1-Yr ACS PUMS. I want to use covariate balance/matching to show that living in a certain PUMA/county affects one's income. I was wondering if I have to use the person weight (PWGT) before I do the matching/balancing process. Do I have to use the weight (PWGT) at all for causal inference analysis (propensity score matching, regression discontinuity)? I have not used PUMS for causal inference analysis before and I would like to hear your feedback. Thank you so much for your time!

-Tran

• Short answer: Yes, you should weight cases when conducting statistical tests.
• Longer answer: there is a lot going on in this question. I personally am not satisfied with the existing answers. The problem is that you have two VASTLY different inferences you need to make:

1. inference with respect to the pseudo-random assignment to treatment / control that you are trying to fake with matching, and

2. inference with respect to the sampling design: the sample is not i.i.d., and you absolutely have to account for that.

If your estimator is a standard matching estimator where you match 1:1 from CA vs. the rest of the country and compute each pair's contribution to the treatment effect estimate, you need to weight each pair by the inverse of its pairwise selection probability. For ACS, since it is such a huge survey, units in two different PUMAs are selected approximately independently, so the pairwise selection probability is the product of the two individual probabilities and you can simply multiply the two weights. The estimator then needs to be of the Hájek form,

$$
t = \frac{\sum_{(i,j)} \left(y^1_i - y^0_j\right) w_i w_j}{\sum_{(i,j)} w_i w_j}
$$

where both sums run over the matched pairs (i, j), y^1_i is the outcome of treated unit i, y^0_j is the outcome of its matched control unit j, and w_i, w_j are their weights.
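As a rough illustration of the Hájek-form estimator above, here is a minimal Python sketch. The function name and the argument layout (one array entry per matched pair) are my own assumptions for the example, not anything from a survey package.

```python
import numpy as np

def hajek_matched_effect(y_treated, y_control, w_treated, w_control):
    """Weighted matched-pair treatment effect (hypothetical helper).

    Entry k of each array describes matched pair k: the treated unit's
    outcome and weight, and the matched control unit's outcome and weight.
    """
    y1 = np.asarray(y_treated, dtype=float)
    y0 = np.asarray(y_control, dtype=float)
    # Pair weight = product of the two person weights, which relies on the
    # two units being selected independently of each other.
    pair_w = np.asarray(w_treated, dtype=float) * np.asarray(w_control, dtype=float)
    numerator = np.sum((y1 - y0) * pair_w)
    denominator = np.sum(pair_w)
    return numerator / denominator

# Two matched pairs with outcome differences 5 and 10 and pair weights 2 and 3.
print(hajek_matched_effect([10, 20], [5, 10], [1, 3], [2, 1]))  # 8.0
```

With equal weights this reduces to the plain mean of the pairwise differences, which is a quick sanity check on any implementation.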

Standard errors for that can be obtained with the successive difference replication (SDR) replicate weights; see https://usa.ipums.org/usa/repwt.shtml
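To make the replicate-weight step concrete, here is a hedged Python sketch. It assumes the usual ACS replicate-weight variance formula, 4/R times the sum of squared deviations of the replicate estimates from the full-sample estimate (R = 80 for ACS); please check that factor against the page linked above. The function name and the callable-estimator interface are hypothetical.

```python
import numpy as np

def sdr_standard_error(estimator, full_weights, replicate_weights):
    """Replicate-weight standard error (hypothetical helper).

    estimator         : callable mapping a weight vector (n,) to a scalar
                        estimate, e.g. a function that reruns the matched
                        estimator with a given set of person weights
    full_weights      : array (n,), the full-sample person weights
    replicate_weights : array (n, R), one column per replicate weight
    """
    full_weights = np.asarray(full_weights, dtype=float)
    replicate_weights = np.asarray(replicate_weights, dtype=float)
    theta = estimator(full_weights)
    # Recompute the estimate once per replicate weight column.
    reps = np.array([estimator(replicate_weights[:, r])
                     for r in range(replicate_weights.shape[1])])
    # Assumed SDR factor: 4 / R (equals 4/80 with the 80 ACS replicates).
    variance = 4.0 / reps.size * np.sum((reps - theta) ** 2)
    return theta, np.sqrt(variance)
```

Passing the estimator as a function means the whole estimate is recomputed under each replicate weight, and the same helper works for any estimator that takes a weight vector. Whether to also redo the matching within each replicate is a separate design decision that this sketch does not settle.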

If your estimator is the regression discontinuity estimator, then it is a version of weighted regression where you have a kernel near the discontinuity. You can fake this in Stata svy / the R survey package by pre-multiplying the survey weights by your kernel weights (although if you use replicate weights, you'd have to do that for every replicate, so you had better write a function that takes the pweights and multiplies them by the discontinuity kernel weights).
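A function of that kind might look like the Python sketch below. The triangular kernel, the function names, and the argument names are assumptions made for illustration; substitute whatever kernel and bandwidth your regression discontinuity design actually uses.

```python
import numpy as np

def triangular_kernel(running, cutoff, bandwidth):
    """Triangular kernel weights: 1 at the cutoff, falling linearly to 0
    at a distance of one bandwidth, and 0 beyond that."""
    u = (np.asarray(running, dtype=float) - cutoff) / bandwidth
    return np.clip(1.0 - np.abs(u), 0.0, None)

def kernel_adjusted_weights(weights, running, cutoff, bandwidth):
    """Multiply survey weights by kernel weights (hypothetical helper).

    weights may be a vector (n,) of full-sample weights or a matrix (n, R)
    holding one column per replicate weight; every column is scaled by the
    same per-person kernel weight.
    """
    w = np.asarray(weights, dtype=float)
    k = triangular_kernel(running, cutoff, bandwidth)
    if w.ndim == 1:
        return w * k
    return w * k[:, None]
```

The adjusted full-sample weights and adjusted replicate weights can then be supplied to the survey-weighted regression routine in place of the originals.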

So each causal inference estimator would have to have its own fix for the complex survey setting.
