Quasi-experimental design using PUMS data

I want to do a quasi-experimental design (QED) analysis on ACS data for causal inference -- specifically on the relationship between broadband and income. Before proceeding, I want to understand the nuances of using such a method on the ACS data. I could not find any existing work that has done QED matching with the PUMS data. Does anyone know of any work using it or criticizing such methods?

Thanks,

Tarun

Parents
  • Hi Tarun,

    I am not an expert -- but I have looked into this before. It is not going to be an easy task to draw causal inference using cross-sectional data like the ACS PUMS. However, it is not impossible -- I think you just have to come up with a really good identification strategy. I think David Dorer posted some really great questions. What is your treatment? What is your outcome? I would also like to add: what is your unit of analysis? Is it county level, household, or individual? It would be harder to use propensity score matching at the household/individual level because you don't know the outcome for the next period -- the ACS draws a new cross-sectional sample each year, so the same people are not followed across years. You also have to think about what to do with the survey weights -- are you going to match/analyze them with or without the weights? If your unit of analysis is at the county level or state level, then you have to control for migration and other policies at the state and county level. You have to control for all of these factors to draw causal inference. I hope this helps! Sorry for any typos.

    - T
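
    For readers who have not seen propensity score matching written out, here is a minimal sketch in base R. Everything in it is an assumption made for illustration: the data are simulated rather than read from PUMS, the column names (`high_income`, `has_broadband`, `age`, `college`) are hypothetical, income is reduced to a binary treatment, and the survey weights are ignored entirely. It only shows the mechanics (estimate a propensity score, match each treated case to its nearest control, compare outcomes); it does not address the identification concerns raised above.

    ```r
    # Simulated stand-in for person-level records (NOT real PUMS data).
    set.seed(3)
    n <- 600
    age <- sample(18:80, n, replace = TRUE)
    college <- rbinom(n, 1, 0.35)

    # Hypothetical binary treatment: income above some chosen threshold.
    high_income <- rbinom(n, 1, plogis(-2 + 0.02 * age + 1.2 * college))

    # Hypothetical outcome: whether the person has broadband at home.
    has_broadband <- rbinom(
      n, 1, plogis(-0.5 + 0.8 * high_income + 0.6 * college - 0.01 * age)
    )
    d <- data.frame(age, college, high_income, has_broadband)

    # Step 1: propensity score = estimated P(treated | covariates).
    ps_model <- glm(high_income ~ age + college, data = d, family = binomial())
    d$ps <- fitted(ps_model)

    # Step 2: 1:1 nearest-neighbor matching on the score, with replacement.
    treated <- which(d$high_income == 1)
    controls <- which(d$high_income == 0)
    match_idx <- sapply(treated, function(i) {
      controls[which.min(abs(d$ps[controls] - d$ps[i]))]
    })

    # Step 3: difference in mean outcome between treated cases and their matches.
    att <- mean(d$has_broadband[treated]) - mean(d$has_broadband[match_idx])
    print(att)
    ```

    With a single cross-section, a difference like `att` is still only an association among matched cases; whether it can be read causally depends on the identification strategy, not on the matching code.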

Children
  • Thanks T. I am planning to use the PUMS individual data. The treatment is income and the outcome is broadband. I did not understand your concern about cross-sectional samples. Can you please elaborate?

    Re survey weights: since I am using the individual data, do I still need to consider survey weights? I was thinking of not using them. Thanks again!

  • Hi Tarun,

    If you're just looking at a single ACS year (the 1-year ACS PUMS) to draw causal inferences, then it is not easy. It is hard to establish the direction of causality in a single year (as you mentioned, the direction of causality is not well established here) -- think of it as a snapshot in time. If you see people with high income who have broadband in a single ACS year (as an example), that is not a strong enough case to say that high income affects broadband. However, let's say you have person A and person B with the same level of income in period t1, both with no broadband. If in the next period (t2) you see an increase in income for person A, and person A now has broadband while person B does not, then that's a good case for causality. Otherwise, it is hard to establish the direction of causality if you only use a single year of data. That was my initial concern in the last post: cross-sectional data means you don't have the same people across sample years to do this type of analysis. That is my understanding -- and again I am no expert in this field -- I just came across the same problem. I think it is okay to not use the survey weights. However, make sure you have enough observations for the exact matching if you're doing it within a small geographic area. Hope this helps!

    -T
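
    One quick way to judge how much the weights matter for a given estimate is to compute it both ways and compare. The sketch below uses base R on simulated data, so the numbers mean nothing in themselves; the variable names (`income_k`, `has_broadband`, `w`) and the weighting scheme are assumptions chosen so that lower-income people each represent more of the population than higher-income people do.

    ```r
    # Simulated stand-in for person-level records (NOT real PUMS data).
    set.seed(42)
    n <- 1000
    income_k <- rlnorm(n, meanlog = log(50), sdlog = 0.6)  # income in $1,000s
    has_broadband <- rbinom(n, 1, plogis(-1 + 0.03 * income_k))

    # Hypothetical person weights: each lower-income respondent stands in for
    # more people than each higher-income respondent.
    w <- ifelse(income_k > 60, 40, 120)

    # Share with broadband, ignoring and then using the weights.
    unweighted <- mean(has_broadband)
    weighted <- weighted.mean(has_broadband, w)
    print(c(unweighted = unweighted, weighted = weighted))
    ```

    If the two figures are close for your variables, dropping the weights costs little for that estimate; if they differ, the unweighted result describes the sample rather than the population.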

  • Using multiple time points to track the "over time" relationship between broadband access and income is a much harder problem. I would set that aside for now. Do a "cross-sectional" analysis for a single time point, for example the 2016-2020 ACS vintage. I would start with logistic regression. A multivariable regression (one with several predictors) takes into account any correlation among the "input", "predictor", or "x" variables. If you want to do your PUMS analysis correctly, you need a logistic regression package that handles weights, including replicate weights. You will then be able to compute the errors in your estimates correctly. You will get "error bars."
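
    To make the "error bars" concrete, here is a rough base R sketch of how the 80 PUMS replicate weights produce a standard error for a simple estimate: compute the estimate once with the full weight and once with each replicate weight, then take the square root of 4/80 times the sum of squared differences. The data and the replicate weights here are simulated, and `has_broadband` is a hypothetical variable name; `PWGTP` follows the PUMS naming for the person weight.

    ```r
    # Simulated stand-in for person-level records (NOT real PUMS data).
    set.seed(7)
    n <- 400
    has_broadband <- rbinom(n, 1, 0.7)

    # Full-sample person weight, plus 80 made-up replicate weights stored as an
    # n x 80 matrix (in a real file these are columns PWGTP1 ... PWGTP80).
    PWGTP <- sample(5:200, n, replace = TRUE)
    rep_w <- sapply(1:80, function(r) PWGTP * runif(n, 0.5, 1.5))

    # Estimate with the full weight, then once per replicate weight.
    theta_hat <- weighted.mean(has_broadband, PWGTP)
    theta_rep <- apply(rep_w, 2, function(w) weighted.mean(has_broadband, w))

    # Replicate-weight standard error: sqrt(4/80 * sum of squared deviations).
    se <- sqrt(4 / 80 * sum((theta_rep - theta_hat)^2))

    # 90% interval, the confidence level the Census Bureau uses for ACS margins.
    ci <- theta_hat + c(-1, 1) * 1.645 * se
    print(c(estimate = theta_hat, se = se, lower = ci[1], upper = ci[2]))
    ```

    A package that understands replicate weights does this same bookkeeping for you, including for regression coefficients.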

    R is a free, open-source statistical analysis system. There is a graphical front end (an IDE) for it called RStudio. The free version should have everything that you need. The add-on package that you need is the "survey" package. With enough "googling" you should be able to find code (including replicate weights) to solve your problem. You can probably find an ACS example. When using R it is helpful to have some programming experience. Any computer language will do. RStudio has point-and-click menus for tasks such as installing packages and importing data, but you will still write most of the R code yourself. There are other packages that are able to do weights and replicate weights, such as SAS, Stata, and possibly SPSS. If you have access to them, then great! If you don't, you need to get out your wallet.

    https://www.rstudio.com/products/rstudio/download/

    The regular Windows GUI version of R has some pull-down menus, but I've heard good things about RStudio.

    Regular R for Windows: https://cran.r-project.org/bin/windows/base/   I don't think that the survey package is part of the base installation. Use install.packages("survey") to install the package.
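
    Once the package is installed, a weighted logistic regression with replicate weights might look roughly like the sketch below. Treat it as a starting point under stated assumptions: the data frame is simulated so the code runs on its own, `has_broadband` and `income_k` are hypothetical column names, and the replicate-weight settings (`type = "JK1"`, `scale = 4/80`, `rscales = rep(1, 80)`, `mse = TRUE`) are a commonly used specification for ACS PUMS that you should check against the PUMS documentation for your vintage. `PWGTP` and `PWGTP1` to `PWGTP80` follow the PUMS naming for person weights.

    ```r
    library(survey)

    # Simulated stand-in for person-level records (NOT real PUMS data).
    set.seed(1)
    n <- 500
    pums <- data.frame(
      income_k = round(rlnorm(n, meanlog = log(50), sdlog = 0.6), 1),
      PWGTP = sample(5:200, n, replace = TRUE)
    )
    pums$has_broadband <- rbinom(n, 1, plogis(-1 + 0.03 * pums$income_k))

    # Made-up replicate weights, named like the PUMS columns PWGTP1 ... PWGTP80.
    rep_w <- sapply(1:80, function(r) pums$PWGTP * runif(n, 0.5, 1.5))
    colnames(rep_w) <- paste0("PWGTP", 1:80)
    pums <- cbind(pums, rep_w)

    # Replicate-weight design; the regular expression picks up PWGTP1 ... PWGTP80
    # but not PWGTP itself, which is passed separately as the main weight.
    design <- svrepdesign(
      data = pums,
      weights = ~PWGTP,
      repweights = "PWGTP[0-9]+",
      type = "JK1",
      scale = 4 / 80,
      rscales = rep(1, 80),
      mse = TRUE,
      combined.weights = TRUE
    )

    # Weighted logistic regression; quasibinomial avoids warnings that
    # binomial() raises when the weights are not integer counts.
    fit <- svyglm(has_broadband ~ income_k, design = design,
                  family = quasibinomial())
    print(summary(fit))
    print(confint(fit))
    ```

    With a real extract you would drop the simulation lines, read the PUMS file into `pums`, and substitute the actual outcome and predictor columns.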

    Best of luck!

  • As an FYI -- if you read my profile -- I use regular R on my Ubuntu Linux machine -- it's the best!