Hello, I am a first-year demography graduate student. I do not have a strong background in statistics, but I am doing my best to learn.
I am looking for a way to test whether estimates of a statistic derived from a PUMS variable differ significantly across years. I am using PUMS data to estimate the percent of cost-burdened renters in a small set of PUMAs. "Cost burdened" is defined as paying 30% or more of income on rent.
I estimated percent cost burdened by downloading and cleaning the data in R.
- I downloaded the applicable variables for my PUMAs for the years 2014 and 2019.
- I filtered the data so SPORDER == 1 and TYPE == 1 to have one observation per household.
- I filtered the data so TEN == 3 to keep only renters (eliminating 1: owned with a mortgage; 2: owned free and clear; and 4: occupied without payment of rent).
- Using GRPIP, I created a dummy variable for each observation coded as "Cost Burdened" and "Not Cost Burdened" (I'm just now realizing I should code this as 1 and 0 rather than character strings).
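For reference, here is a minimal base R sketch of the steps above. It uses a small made-up data frame in place of my actual PUMS extract, so the object names (pums, renters, cost_burdened, pct) and all of the values are placeholders. It also ignores the PUMS household weights, so the shares it prints are unweighted and only illustrate the cleaning logic.

```r
# Toy stand-in for a PUMS extract; every value below is made up.
pums <- data.frame(
  YEAR    = c(2014, 2014, 2014, 2014, 2019, 2019, 2019, 2019),
  PUMA    = rep("00101", 8),
  SPORDER = c(1, 2, 1, 1, 1, 1, 2, 1),
  TYPE    = c(1, 1, 1, 1, 1, 1, 1, 1),
  TEN     = c(3, 3, 3, 1, 3, 3, 3, 3),
  GRPIP   = c(45, 45, 20, NA, 35, 50, 50, 10)
)

# Keep one row per household (SPORDER == 1, TYPE == 1) and renters only (TEN == 3).
renters <- subset(pums, SPORDER == 1 & TYPE == 1 & TEN == 3)

# Dummy variable: 1 = cost burdened (GRPIP of 30 or more), 0 = not cost burdened.
renters$cost_burdened <- as.integer(renters$GRPIP >= 30)

# Unweighted percent cost burdened by PUMA and year.
# NOTE: a real estimate would need to apply the PUMS household weights.
pct <- aggregate(cost_burdened ~ PUMA + YEAR, data = renters, FUN = mean)
pct$cost_burdened <- 100 * pct$cost_burdened
print(pct)
```

Coding the dummy as 1/0 means the group mean is the share cost burdened directly, which is why I want to switch away from character strings.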
I now have a data frame with the variables PUMA, Cost Burdened, and YEAR. I want to compare the percent cost burdened in each PUMA in 2014 and 2019 and determine 1) the magnitude of the change and 2) whether the change is statistically significant. Does anyone have advice or an example of how to estimate this change across years?
Thank you!
Brian S