I have a question about comparing two estimates from the same period. The comparison involves a large area (a city) and a census tract within that city. Given that they are not independent samples, how can I compare the city's mean of a variable to the mean of the same variable for one of the city's census tracts? Please advise.
I assume that you might want a ratio of the "sub" area to the larger area. For example, a county might be the "large" area and a single tract the "subset". You might want the ratio of the two populations (B01003).
In general, the variance of a ratio is approximately Var(R/S) ~= (mean(R)/mean(S))^2 * [ var(R)/mean(R)^2 - 2*cov(R,S)/(mean(R)*mean(S)) + var(S)/mean(S)^2 ].
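As a rough illustration of the approximation above, here is a minimal Python sketch. The function name and the input numbers are made up for the example; the covariance term is left as an argument because it is not available from the published tables.

```python
def ratio_variance(mean_r, mean_s, var_r, var_s, cov_rs=0.0):
    """First-order (delta-method) approximation of Var(R/S)."""
    ratio = mean_r / mean_s
    # Sum of relative variances minus twice the relative covariance.
    relative_variance = (
        var_r / mean_r**2
        - 2.0 * cov_rs / (mean_r * mean_s)
        + var_s / mean_s**2
    )
    return ratio**2 * relative_variance


# Hypothetical inputs: a tract population R and a county population S,
# with illustrative standard errors of 150 and 900 and an unknown
# covariance set to zero.
approx_var = ratio_variance(
    mean_r=4000.0, mean_s=100000.0, var_r=150.0**2, var_s=900.0**2
)
print(approx_var ** 0.5)  # approximate standard error of the ratio
```

Setting cov_rs to zero is exactly the independence assumption that does not hold for a tract nested in a county, which is why the replicate approach is preferable.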
You can use the variance replicate tables (https://www.census.gov/programs-surveys/acs/data/variance-tables.html) to compute the ratio and its MOE. The ACS 2021 5-year tables are here: https://www2.census.gov/programs-surveys/acs/replicate_estimates/2021/data/5-year/ You need the geography (summary) level code. For example, a tract is level 140 and a county is 050; see https://www.census.gov/programs-surveys/geography/technical-documentation/naming-convention/cartographic-boundary-file/carto-boundary-summary-level.html
There are 80 replicates. [This is not the exact method, but it is roughly what you do: compute the ratio R/S 80 times, once per replicate, then take the mean and the variance of those 80 values. This would be a bootstrap-type calculation.]
The actual calculation is: Var(R/S) = 4/80 * sum[ (Ri/Si - R/S)^2 ]. The sum over i runs over the 80 replicates. Ri is the numerator and Si the denominator for the i-th replicate; R/S is the ratio of the full-sample estimates.
Margin of Error for R/S = 1.645 x square_root (Var)
1a. Calculate 80 differences
1b. Square each difference
1c. Sum all of the squared differences
1d. Multiply the sum by 4/80
2. Take the square root to find the standard error (SE)
3. Multiply the SE by 1.645 to obtain the margin of error (MOE)
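The steps above can be sketched in a few lines of Python. This is only an illustration: the function names are my own, and the replicate values in the example are made up rather than taken from a real table.

```python
import math


def replicate_moe(estimate, replicates, z=1.645):
    """Steps 1a-3: SE and MOE from 80 replicate estimates."""
    if len(replicates) != 80:
        raise ValueError("expected 80 replicate estimates")
    # 1a-1c: differences from the full-sample estimate, squared and summed.
    sum_sq = sum((rep - estimate) ** 2 for rep in replicates)
    # 1d: multiply by 4/80.
    variance = 4.0 / 80.0 * sum_sq
    # 2-3: standard error, then the 90 percent margin of error.
    se = math.sqrt(variance)
    return se, z * se


def ratio_moe(r, s, r_reps, s_reps):
    """Apply the same steps to the derived ratio R/S, replicate by replicate."""
    ratio = r / s
    ratio_reps = [ri / si for ri, si in zip(r_reps, s_reps)]
    se, moe = replicate_moe(ratio, ratio_reps)
    return ratio, se, moe


# Made-up example: numerator replicates wobble around 50, denominator fixed.
example_r_reps = [51.0] * 40 + [49.0] * 40
example_s_reps = [100.0] * 80
print(ratio_moe(50.0, 100.0, example_r_reps, example_s_reps))
```

The key point is that the ratio is formed inside each replicate before differencing, so any dependence between numerator and denominator carries through to the variance.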
(See the slide deck online: https://www.census.gov/content/dam/Census/programs-surveys/acs/news/Events/Calculating%20MOEs%20the%20ACS%20Way.pdf)
Hope this helps.
Thanks for sharing these resources.
My question is probably "simpler". I would like to test whether the city mean and the census tract mean are statistically different. For instance, median household income: B19013
CS = census tract; SE= standard error
|(Mean of CS - Mean of the City)/sqrt[(SE_cs)^2 + (SE_city)^2]| , and then compare the result to a critical z-score.
The "samples" are not independent, since the census tract is part of the city. You need to take into account the covariance between the city and tract medians.
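To make the role of the covariance concrete: Var(A - B) = Var(A) + Var(B) - 2*Cov(A,B), so the usual z formula simply drops the last term. A small Python sketch follows, with hypothetical function names and made-up numbers; the covariance itself is not published, so here it is just an input.

```python
import math


def z_for_difference(est_a, se_a, est_b, se_b, cov_ab=0.0):
    """z statistic for A - B; cov_ab = 0 reproduces the independent case."""
    variance = se_a**2 + se_b**2 - 2.0 * cov_ab
    if variance <= 0:
        raise ValueError("variance of the difference must be positive")
    return (est_a - est_b) / math.sqrt(variance)


def significant_at_90(z):
    """Compare |z| with 1.645, the 90 percent critical value the ACS uses."""
    return abs(z) > 1.645


# Made-up medians and standard errors, first ignoring the covariance,
# then with an illustrative positive covariance.
z_independent = z_for_difference(52000.0, 3000.0, 46000.0, 4000.0)
z_with_cov = z_for_difference(52000.0, 3000.0, 46000.0, 4000.0, cov_ab=4.5e6)
print(z_independent, significant_at_90(z_independent))
print(z_with_cov, significant_at_90(z_with_cov))
```

A positive covariance shrinks the variance of the difference, so ignoring it tends to make the test too conservative.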
The formulas using the variance replicate tables work with medians. How are you comparing median incomes? Are you taking the difference or the ratio of the medians? Look at the slide deck link for details.
The difference of medians. Is there an R script to pull the variance replicate tables? I need to follow this process for many variables and was wondering if there is a way to automate this estimation. I see that the variance replicate tables cannot be retrieved using the Census API. Any suggestions are welcome. Thanks.
The variance replicate tables are in separate files, one for each ACS table. A single file covers the entire US.
There is a directory for each geography. Here is an example
Since you will only need a couple of files (one for cities and one for tracts), you can do the download by hand. You can unzip the files (each one has a single csv file inside) or read directly from the zip file using the unz connection and read.table.
Geo level codes:
160 place
060 county subdivision
140 tract
050 county
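Here is the same idea sketched in Python rather than R, in case it helps with automating many variables. It reads one row straight from a downloaded zip and applies the 4/80 formula to the difference of two estimates. The column names GEOID, ORDER, ESTIMATE and Var_Rep1 to Var_Rep80 are my assumption about the file header, so check them against the first line of your csv; the function names are made up.

```python
import csv
import io
import math
import zipfile

# Assumed header names for the 80 replicate columns; verify in your file.
REP_COLUMNS = [f"Var_Rep{i}" for i in range(1, 81)]


def read_replicate_row(zip_path, geoid, order):
    """Return (estimate, 80 replicates) for one GEOID and table line."""
    with zipfile.ZipFile(zip_path) as archive:
        csv_name = archive.namelist()[0]  # each zip holds a single csv
        with archive.open(csv_name) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", errors="replace")
            for row in csv.DictReader(text):
                if row["GEOID"] == geoid and row["ORDER"] == order:
                    estimate = float(row["ESTIMATE"])
                    return estimate, [float(row[c]) for c in REP_COLUMNS]
    raise KeyError(f"no row for GEOID {geoid!r}, ORDER {order!r}")


def difference_moe(est_a, reps_a, est_b, reps_b):
    """Difference A - B with SE and 90 percent MOE from paired replicates."""
    diff = est_a - est_b
    # Difference within each replicate, then deviation from the full estimate.
    sum_sq = sum(((a - b) - diff) ** 2 for a, b in zip(reps_a, reps_b))
    se = math.sqrt(4.0 / 80.0 * sum_sq)
    return diff, se, 1.645 * se


# Usage, with placeholder file names and GEOIDs that you would fill in:
# tract_est, tract_reps = read_replicate_row("<tract file>.zip", "<tract GEOID>", "1")
# city_est, city_reps = read_replicate_row("<place file>.zip", "<place GEOID>", "1")
# diff, se, moe = difference_moe(tract_est, tract_reps, city_est, city_reps)
# The difference is significant at the 90 percent level if abs(diff) > moe.
```

Because the tract and city differences are taken replicate by replicate, the resulting SE should reflect the dependence between the two areas without needing a separate covariance estimate.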