I'm trying to merge zip codes so that each combined area has a population of 20,000 or more. The goal is to de-identify a data set that contains zip codes by merging adjacent ZCTAs.
If you have any ideas let me know. Perhaps someone has already done this and they have a list.
Interesting, David.
There are some existing tools that combine adjacent geographic areas to meet various objective functions (minimum population, minimum margin of error, etc.). Some try to combine areas…
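To make the "minimum population" objective concrete, here is a minimal Python sketch of one simple greedy approach. It is not any particular tool's algorithm and not the R code mentioned in this thread. It assumes you already have a population per area and a symmetric adjacency list; the names `merge_to_minimum`, `population`, and `neighbors` are made up for illustration.

```python
def merge_to_minimum(population, neighbors, minimum=20_000):
    """Greedily merge adjacent areas until each group reaches `minimum`.

    population: dict mapping area id -> population
    neighbors:  dict mapping area id -> iterable of adjacent area ids (symmetric)
    Returns (group_of, total): area id -> group id, and group id -> population.
    Areas with no neighbors are left as they are, even if below the minimum.
    """
    group_of = {a: a for a in population}    # each area starts as its own group
    members = {a: {a} for a in population}   # group id -> set of member areas
    total = dict(population)                 # group id -> combined population

    def adjacent_groups(g):
        # Groups that touch any member area of group g, excluding g itself.
        return {group_of[n] for a in members[g] for n in neighbors.get(a, ())} - {g}

    while True:
        # Undersized groups that still have somewhere to merge.
        candidates = [(total[g], g) for g in members
                      if total[g] < minimum and adjacent_groups(g)]
        if not candidates:
            break
        _, g = min(candidates)  # handle the smallest group first
        # Merge into the least-populated adjacent group to keep groups small.
        target = min(adjacent_groups(g), key=lambda h: (total[h], h))
        for a in members[g]:
            group_of[a] = target
        members[target] |= members.pop(g)
        total[target] += total.pop(g)
    return group_of, total


# Toy data (hypothetical areas, not real ZCTAs). "E" has no neighbors.
population = {"A": 5000, "B": 8000, "C": 9000, "D": 25000, "E": 3000}
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"], "E": []}
group_of, total = merge_to_minimum(population, neighbors)
```

In this toy case A, B, and C end up in one group of 22,000, D stays alone, and the isolated area E stays below the minimum and would need separate handling. A real run would build `neighbors` from the shapefile geometry, and a greedy pass like this does not guarantee the smallest or most compact groups.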
I've worked on this de-identification project a little more. I've combined the US zip codes (ZCTAs) into 6184 groups. There are approximately 32923 individual ZCTAs. Each group has a population of 20,000 or more. I have R code to do this using the 2020 (2022) TIGER/Line ZCTA shapefile. https://www2.census.gov/geo/tiger/TIGER2022/ZCTA520/tl_2022_us_zcta520.zip This allows you to take a file that has "protected/individually identifiable" information (a numeric count vector) and zip codes (ZCTAs) and sum the count vector over each ZCTA group. The resulting 6184 values are de-identified after you report any combined value of 1-5 as 5.
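For anyone who wants to see the sum-then-mask step spelled out, here is a minimal Python sketch (not the R program itself). It assumes you already have a lookup from ZCTA to group; the function name, the group labels, and the ZCTA codes below are made up for illustration.

```python
from collections import defaultdict


def aggregate_and_mask(counts_by_zcta, group_of, floor=5):
    """Sum counts over each ZCTA group, then report sums of 1..floor as floor.

    counts_by_zcta: dict mapping ZCTA -> count of records
    group_of:       dict mapping ZCTA -> group id (hypothetical lookup table)
    A sum of zero is left as zero.
    """
    sums = defaultdict(int)
    for zcta, count in counts_by_zcta.items():
        sums[group_of[zcta]] += count
    return {g: (floor if 1 <= n <= floor else n) for g, n in sums.items()}


# Made-up ZCTA codes and group labels, for illustration only.
counts = {"00001": 2, "00002": 1, "00003": 40, "00004": 0}
groups = {"00001": "G1", "00002": "G1", "00003": "G2", "00004": "G3"}
masked = aggregate_and_mask(counts, groups)
```

Here group G1 has a true total of 3 and is reported as 5, G2 is reported as 40, and G3 stays at 0.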
If you read the HIPAA guidance for de-identifying Protected Health Information (PHI) https://www.hhs.gov/sites/default/files/ocr/privacy/hipaa/understanding/coveredentities/De-identification/hhs_deid_guidance.pdf you can see the details. When de-identifying a data set with zip codes, the guidance allows keeping the first 3 digits of the zip code as a geocode, provided the area formed by all zip codes sharing those 3 digits contains more than 20,000 people. By using the grouped/combined zip code areas above you get areas with populations of 20,000+, matching the population threshold in the HIPAA guidance. This gives substantially smaller geographic areas than the "first 3 digits of the zip code" areas. I'll clean up my R program and upload it.
If your organization runs a service program and has an address list, you can use this method to take the zip codes from the address file and make a (heat) map with binned counts of participants in each of the ZCTA group geographies. The resulting map will be de-identified and can be distributed to the public. You can also use the exact address and the geocoder https://geocoding.geo.census.gov/geocoder/geographies/addressbatch?form to get the census tract. The census tracts can also be grouped in a similar way. A different minimum size, such as 10,000, could also be chosen.
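As a rough illustration of the binning step for the map legend, here is a small Python sketch. The bin edges and labels are placeholders I picked for the example, not values from the HIPAA guidance, and `bin_label` is a made-up name.

```python
from bisect import bisect_right

# Hypothetical legend bins: lower edges of every bin after the "0" bin.
BIN_EDGES = [1, 6, 21, 101]
BIN_LABELS = ["0", "1-5", "6-20", "21-100", "101+"]


def bin_label(count):
    """Return the legend label for a participant count in one ZCTA group."""
    return BIN_LABELS[bisect_right(BIN_EDGES, count)]


# Made-up group totals, for illustration only.
group_counts = {"G1": 3, "G2": 40, "G3": 0, "G4": 250}
legend = {g: bin_label(n) for g, n in group_counts.items()}
```

Since the lowest non-zero bin is 1-5, mapping the bins instead of the raw counts has the same effect as reporting counts of 1-5 as 5.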
Dave