Comparisons Within the Same Time Period


I have a question about comparing two estimates from the same period. The comparison involves a large area (a city) and a tract within that city. Given that they are not independent samples, how can I compare the mean of a variable for the city to the mean of the same variable for one of the city's census tracts? Please advise.

  • I assume that you might want a ratio of the "sub" area to the larger area. For example, a county might be the "large" area and the "subset" a single tract. You might want the ratio of the two populations (B01003).

    In general, the variance of a ratio is approximately Var(R/S) ~= (mean(R)/mean(S))^2 * [ var(R)/(mean(R)^2) - 2*cov(R,S)/(mean(R)*mean(S)) + var(S)/(mean(S)^2) ].
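    As a rough illustration of the approximation above, here is a minimal Python sketch. The function name `ratio_variance` and its argument names are made up for this example. It assumes you already have the two means, the two variances, and the covariance.

```python
def ratio_variance(mean_r, mean_s, var_r, var_s, cov_rs):
    """First-order approximation to the variance of the ratio R/S.

    Hypothetical helper: all five inputs are assumed to be known already.
    """
    ratio = mean_r / mean_s
    # Relative variance of R, minus twice the relative covariance,
    # plus the relative variance of S.
    relative_terms = (
        var_r / mean_r**2
        - 2.0 * cov_rs / (mean_r * mean_s)
        + var_s / mean_s**2
    )
    return ratio**2 * relative_terms
```

    Note that a positive covariance (which you would expect when one area is part of the other) reduces the variance of the ratio relative to treating the two estimates as independent.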

    You can use the variance replicate tables to compute the ratio and its MOE. Look here for the ACS 2021 5-year tables. You need the geography level. For example, a tract is level 140 and a county is level 050.

    There are 80 replicates. [This is not exactly correct, but it is roughly what you do: you can also compute the ratio R/S 80 times, then take the variance and the mean of the 80 values. This would be a bootstrap-type calculation.]

    The actual calculation is: Var(R/S) = 4/80 * sum_i [ (Ri/Si) - (R/S) ]^2. The sum over i runs over the 80 replicates. Ri is the numerator for the i-th replicate and Si is the denominator.

    Margin of Error for R/S = 1.645 x square_root (Var)

    In words:

    1a. Calculate 80 Differences
    1b. Square each difference
    1c. Sum all of the squared differences
    1d. Multiply the sum by 4/80
    2. Take the square root to find the standard error (SE)
    3. Multiply the SE by 1.645 to obtain the margin of error (MOE)
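    The steps above can be sketched in Python as follows. This is only an illustration: the function names are invented here, and the inputs are assumed to be the published estimates plus the 80 replicate estimates for the numerator and the denominator, taken from the variance replicate tables.

```python
import math

N_REPLICATES = 80  # number of replicates stated in the thread


def replicate_variance(estimate, replicates):
    """Steps 1a-1d: 4/80 times the sum of squared (replicate - estimate)."""
    if len(replicates) != N_REPLICATES:
        raise ValueError("expected 80 replicate estimates")
    squared_diffs = [(rep - estimate) ** 2 for rep in replicates]
    return 4.0 / N_REPLICATES * sum(squared_diffs)


def ratio_moe(r, s, r_reps, s_reps, z=1.645):
    """Return (ratio, standard error, margin of error) for R/S."""
    ratio = r / s
    # Recompute the ratio once per replicate, pairing replicate i of R
    # with replicate i of S.
    rep_ratios = [ri / si for ri, si in zip(r_reps, s_reps)]
    se = math.sqrt(replicate_variance(ratio, rep_ratios))  # step 2
    return ratio, se, z * se                               # step 3
```

    Calling `ratio_moe(tract_pop, county_pop, tract_reps, county_reps)` with your own values would give the ratio and its 90 percent margin of error.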

    (Look online for details.)

    Hope this helps.


  • Thanks for sharing these resources.

    My question is probably "simpler". I would like to test whether the city mean and the census tract mean are statistically different. For instance, median household income: B19013.

    CS = census tract; SE= standard error

    |(Mean of CS - Mean of the City)/sqrt[(SE_cs)^2 + (SE_city)^2]|, then compare the result to a z-score scale.

  • The "samples" are not independent, since the census tract is part of the city. You need to take into account the covariance between the city and tract medians.

    The formulas using the variance replicate tables work with medians. How are you comparing median incomes? Are you taking the difference or the ratio of the medians? Look at the slide deck link for details.
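    For the difference case, a minimal Python sketch might look like the one below. It reuses the same 4/80 replicate formula from earlier in the thread, applied to the difference instead of the ratio. The function name and arguments are hypothetical. It assumes that replicate i for the tract and replicate i for the city can be paired, so that the covariance is picked up by differencing replicate by replicate.

```python
import math


def difference_test(tract, city, tract_reps, city_reps, z_crit=1.645):
    """Return (difference, SE, z, significant) for tract minus city.

    Assumes 80 paired replicate estimates for each geography.
    """
    if len(tract_reps) != 80 or len(city_reps) != 80:
        raise ValueError("expected 80 replicate estimates for each area")
    diff = tract - city
    # Difference recomputed for each replicate pair.
    rep_diffs = [t - c for t, c in zip(tract_reps, city_reps)]
    variance = 4.0 / 80 * sum((d - diff) ** 2 for d in rep_diffs)
    se = math.sqrt(variance)
    if se == 0:
        raise ValueError("standard error is zero; z is undefined")
    z = diff / se
    return diff, se, z, abs(z) > z_crit
```

    The default `z_crit=1.645` corresponds to the 90 percent level used for the margin of error above; pass a different value if you want another confidence level.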

  • The difference of medians. Is there an R script to pull the variance replicate tables? I need to follow this process for many variables and was wondering if there is a way to automate this estimation. I see that variance replicate tables cannot be retrieved using the Census API. Any suggestions are welcome. Thanks.

  • The variance replicate tables are in separate files, one for each ACS table. A single file covers the entire US.

    There is a directory for each geography. Here is an example

    Since you will only need a couple of files (one for cities and one for tracts), you can do the download by hand. You can unzip the files (each one has a single csv file inside) or read directly from the zip file using the unz connection and read.table.

    Geography level codes:
    160 place
    060 county subdivision
    140 tract
    050 county
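    If you would rather not use R, the same idea (reading the single csv directly from the zip) can be sketched in Python with the standard library. Everything specific here is an assumption to check against the file you actually download: the column names `GEOID` and `Var_Rep1` to `Var_Rep80`, the text encoding, and the helper names themselves.

```python
import csv
import io
import zipfile


def read_replicate_rows(zip_source, geoid, encoding="utf-8"):
    """Return all rows for one GEOID from the single csv inside a zip.

    zip_source may be a path or a file-like object. The "GEOID" column
    name and the encoding are assumptions about the file layout.
    """
    with zipfile.ZipFile(zip_source) as archive:
        csv_name = archive.namelist()[0]  # each zip holds one csv file
        with archive.open(csv_name) as raw:
            text = io.TextIOWrapper(raw, encoding=encoding)
            reader = csv.DictReader(text)
            return [row for row in reader if row["GEOID"] == geoid]


def replicate_values(row, n_reps=80):
    """Pull the replicate estimates out of one row as floats.

    Assumes columns named Var_Rep1 ... Var_Rep80.
    """
    return [float(row[f"Var_Rep{i}"]) for i in range(1, n_reps + 1)]
```

    You would call `read_replicate_rows` once on the tract file and once on the place file, pick the table line you need from the returned rows, and pass the replicate values to whatever variance calculation you are using.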


  • Comparing the mean of a variable between a larger entity (city) and a smaller subset within it (census tract) can be tricky due to potential dependencies and differences in scale. Here are a few steps you can consider to approach this comparison:

    1. Understanding Dependencies: Before making any comparison, it's crucial to understand the nature of the relationship between the city and its census tract. Are there specific factors that might make the tract's data dependent on the city's data? Understanding these dependencies will guide your approach.

    2. Scale Adjustment: Since you're comparing data from a larger area (city) to a smaller subset (census tract), you need to account for the differences in scale. A direct comparison of means might not be meaningful due to the inherent variations in larger populations. You might consider per capita or per unit area calculations to make the comparison more reasonable.

    3. Statistical Testing: If you have reasons to believe that the tract's data is dependent on the city's data (for example, because the tract is nested within the city), you could use statistical tests that account for dependencies, such as hierarchical linear models (HLM) or mixed-effects models. These models can help you understand if there's a statistically significant difference between the city's mean and the tract's mean, while accounting for the hierarchical structure.

    4. Sampling Strategy: If the tract is a subset of the city, you need to ensure that the sampling strategy for both the city and the tract is appropriate and representative. Biases in sampling could significantly affect the validity of your comparison.

    5. Contextual Factors: Consider other relevant contextual factors that might influence the variable you're comparing. These factors could include demographic differences, economic conditions, urban planning, and so on. Adjusting for these factors can provide a more accurate comparison.

    6. Visualization: Visualizations can help you understand the data better. Box plots, histograms, or density plots can show the distribution of the variable in both the city and the tract, helping you identify any major differences or patterns.

    7. Effect Size: Along with statistical significance, it's important to assess the practical significance or effect size of the difference. Even if you find a statistically significant difference, it might not be practically meaningful.

    8. Sensitivity Analysis: Perform sensitivity analyses to understand how changes in assumptions, models, or parameters could affect your results. This can give you a better sense of the robustness of your findings.

    9. Expert Consultation: Depending on the complexity of your data and analysis, consulting with domain experts or statisticians might be valuable to ensure that your approach is sound and valid.

    In summary, comparing the means of a variable between a city and one of its census tracts requires careful consideration of dependencies, sampling, scale, and statistical methods. It's important to approach this comparison with a nuanced understanding of the data and its context.