I'm trying to merge zip codes so that each combined area has a population of 20,000 or more. The goal is to de-identify a data set that has zip codes by merging adjacent ZCTAs.
If you have any ideas, let me know. Perhaps someone has already done this and has a list.
There are some existing tools that combine adjacent geographic areas to meet various objective functions (minimum population, minimum margin of error, etc.). Some try to combine areas that are homogeneous by some attribute (demographics, income, etc.). One such Python-based tool is described in a PLOS ONE paper from a few years ago: "Reducing Uncertainty in the American Community Survey through Data-Driven Regionalization".
Thank you for the article link, Dave S. It is quite interesting, not only for succinctly describing fairly well-known aspects of improving data quality, but also for presenting considerations that are less intuitive.
I noticed the R package but I haven't had a chance to look into it yet. Thanks for letting me know about it. I just got my R program working and it seems to be serviceable. I ran it on all the Massachusetts zip codes. There are 540 of them. I aggregated the ZCTAs so that the groups have a population of 10,000 or more. It took maybe 3 or 4 minutes. I merged the polygons that touch a selected polygon. If there are multiple touching polygons I take the one with the smallest distance between centroids. I haven't tried a selection criterion based on the populations of the touching polygons. Merging on distance gives a range for the populations of 10,000 to about 219,000 for Massachusetts. Merging using populations might lower the upper limit on the resulting populations. I'll have to try it.
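For anyone who wants to experiment with the same idea, here is a rough Python sketch of the greedy merge described above (not the R program itself, which isn't shown here). It assumes you have already extracted, for each ZCTA, a population, a centroid, and the set of ZCTAs it touches. The function name `merge_small_areas`, the toy data, the choice to process the smallest group first, and the population-weighted centroid for merged groups are all assumptions made for illustration; a real version would dissolve the polygons and recompute centroids with a GIS library.

```python
import math


def merge_small_areas(areas, adjacency, min_pop):
    """Greedily merge touching areas until every group reaches min_pop.

    areas:     {id: (population, (x, y) centroid)}
    adjacency: {id: set of ids that touch it}, assumed symmetric
    Returns    {group_id: {"members": [...], "pop": int, "centroid": (x, y)}}
    """
    groups = {k: {"members": [k], "pop": p, "centroid": c}
              for k, (p, c) in areas.items()}
    neighbors = {k: set(adjacency.get(k, ())) for k in areas}

    while True:
        # Groups still under the threshold that have something to merge with.
        small = [k for k, g in groups.items()
                 if g["pop"] < min_pop and neighbors[k]]
        if not small:
            break

        # Assumption: handle the smallest group first (ties broken by id).
        a = min(small, key=lambda k: (groups[k]["pop"], k))

        # Among touching groups, take the one with the nearest centroid.
        b = min(neighbors[a],
                key=lambda n: (math.dist(groups[a]["centroid"],
                                         groups[n]["centroid"]), n))

        ga, gb = groups[a], groups.pop(b)
        total = ga["pop"] + gb["pop"]
        if total > 0:
            # Assumption: population-weighted centroid stands in for the
            # centroid of the dissolved polygon.
            cx = (ga["pop"] * ga["centroid"][0] + gb["pop"] * gb["centroid"][0]) / total
            cy = (ga["pop"] * ga["centroid"][1] + gb["pop"] * gb["centroid"][1]) / total
        else:
            cx = (ga["centroid"][0] + gb["centroid"][0]) / 2
            cy = (ga["centroid"][1] + gb["centroid"][1]) / 2
        ga["members"] += gb["members"]
        ga["pop"] = total
        ga["centroid"] = (cx, cy)

        # Everything that touched b now touches the merged group a.
        for n in neighbors.pop(b):
            neighbors[n].discard(b)
            if n != a:
                neighbors[n].add(a)
                neighbors[a].add(n)
        neighbors[a].discard(b)

    return groups


# Toy data (made up): four areas in a row, A-B-C-D.
toy_areas = {
    "A": (4000, (0.0, 0.0)),
    "B": (7000, (1.0, 0.0)),
    "C": (12000, (2.0, 0.0)),
    "D": (3000, (5.0, 0.0)),
}
toy_adjacency = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}

result = merge_small_areas(toy_areas, toy_adjacency, min_pop=10000)
for gid, g in sorted(result.items()):
    print(gid, sorted(g["members"]), g["pop"])
```

To try the population-based selection mentioned above, the key used to pick `b` could be changed from centroid distance to the neighbor's population (for example, preferring the smallest touching group). Whether that narrows the range of resulting populations is something to test on real data. Note that an isolated area with no touching neighbors is left as-is even if it is under the threshold.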