De-identify ZCTA-based data

I'm trying to de-identify a data set that has ZIP codes by merging adjacent ZCTAs (ZIP Code Tabulation Areas) so that each combined area has a population of 20,000 or more.
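One way to approach the merge is a greedy pass: repeatedly take the smallest group that is still under the threshold and fold it into its least-populated neighbor. The sketch below assumes you already have each ZCTA's population and a list of adjacent ZCTAs (for example, derived from shared borders in the TIGER/Line ZCTA polygons). The function `merge_zctas`, its inputs, and the "merge into the smallest neighbor" rule are all hypothetical choices for illustration, not necessarily the method used in the reply below.

```python
import heapq


def merge_zctas(population, neighbors, min_pop=20_000):
    """Greedily merge adjacent ZCTAs until each group reaches min_pop.

    population: dict ZCTA -> population
    neighbors:  dict ZCTA -> iterable of adjacent ZCTAs (hypothetical input,
                e.g. derived from shared borders in the ZCTA polygons)
    Returns a dict ZCTA -> group id, where the group id is one member ZCTA.
    Groups with no neighbors (e.g. islands) may stay below min_pop.
    """
    members = {z: {z} for z in population}   # group id -> member ZCTAs
    group_pop = dict(population)             # group id -> total population
    adj = {z: set() for z in population}     # group id -> adjacent group ids
    for z, ns in neighbors.items():          # build a symmetric adjacency
        for n in ns:
            if z in adj and n in adj and n != z:
                adj[z].add(n)
                adj[n].add(z)

    # Min-heap of under-threshold groups; stale entries are skipped when popped.
    heap = [(p, z) for z, p in group_pop.items() if p < min_pop]
    heapq.heapify(heap)
    while heap:
        pop, g = heapq.heappop(heap)
        if g not in members or group_pop[g] != pop:
            continue                         # merged away or changed since push
        if not adj[g]:
            continue                         # isolated: nothing to merge with
        # Heuristic (an assumption): fold g into its least-populated neighbor.
        h = min(adj[g], key=lambda n: (group_pop[n], n))
        members[h] |= members.pop(g)
        group_pop[h] += group_pop.pop(g)
        for n in adj.pop(g):                 # rewire g's neighbors to h
            adj[n].discard(g)
            if n != h:
                adj[n].add(h)
                adj[h].add(n)
        if group_pop[h] < min_pop:
            heapq.heappush(heap, (group_pop[h], h))

    return {z: g for g, zs in members.items() for z in zs}


# Made-up chain A-B-C-D; only D reaches 20,000 on its own.
population = {"A": 5_000, "B": 8_000, "C": 12_000, "D": 25_000}
neighbors = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
groups = merge_zctas(population, neighbors)
print(groups)
```

Greedy merging is fast but not optimal; a different rule (for example, merging with the neighbor that shares the longest border) would give more compact groups at the cost of needing geometry in the loop.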

If you have any ideas, let me know. Perhaps someone has already done this and has a list.

I've worked on this de-identification project a little more. I've condensed (combined) US ZIP codes (ZCTAs) into 6,184 groups, out of approximately 32,923 individual ZCTAs. Each group has a population of 20,000 or more. I have R code that does this using the 2020 ZCTAs in the 2022 TIGER/Line ZCTA shapefile (https://www2.census.gov/geo/tiger/TIGER2022/ZCTA520/tl_2022_us_zcta520.zip). This lets you take a file that has protected (individually identifiable) information, in the form of a numeric count vector plus ZIP codes (ZCTAs), and sum the count vector over each ZCTA group. The resulting 6,184 values are de-identified once you report any group total from 1 to 5 as 5.
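The aggregation step (sum the counts within each ZCTA group, then report any total from 1 to 5 as 5) might look like this in Python. The ZCTA-to-group lookup table and the name `aggregate_by_group` are hypothetical stand-ins for the output of the R program, which is not shown here, and the example data are made up.

```python
from collections import defaultdict


def aggregate_by_group(records, zcta_to_group):
    """Sum counts within each ZCTA group, then recode totals of 1-5 as 5.

    records:       iterable of (zcta, count) pairs from the identifiable file
    zcta_to_group: dict ZCTA -> group id (hypothetical lookup table)
    Returns (group totals, total count whose ZCTA is not in the lookup).
    """
    totals = defaultdict(int)
    unmatched = 0
    for zcta, count in records:
        group = zcta_to_group.get(zcta)
        if group is None:
            unmatched += count          # no group for this code; review separately
            continue
        totals[group] += count
    # Small-cell rule from the post: any total from 1 to 5 is reported as 5.
    recoded = {g: (5 if 1 <= t <= 5 else t) for g, t in totals.items()}
    return recoded, unmatched


# Made-up lookup and records.
lookup = {"10001": "G1", "10002": "G1", "10003": "G2", "10004": "G3"}
records = [("10001", 2), ("10002", 1), ("10003", 40), ("10004", 0), ("99999", 3)]
deid, unmatched = aggregate_by_group(records, lookup)
print(deid, unmatched)
```

Returning the unmatched total separately makes it visible when input ZIP codes have no ZCTA (some ZIP codes, such as PO Box-only ones, are not tabulated as ZCTAs).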

If you read the HIPAA guidance for de-identifying Protected Health Information (PHI) (https://www.hhs.gov/sites/default/files/ocr/privacy/hipaa/understanding/coveredentities/De-identification/hhs_deid_guidance.pdf), you can see the details. Under the Safe Harbor method, the first three digits of a ZIP code may be kept as a geocode only if the area formed by all ZIP codes sharing those three digits contains more than 20,000 people. The grouped ZCTA areas above each have a population of at least 20,000, roughly the population size the HIPAA guidance uses, and they are substantially smaller geographic areas than the three-digit ZIP code areas. I'll clean up my R program and upload it.
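For comparison, the Safe Harbor three-digit ZIP rule fits in a few lines: keep a prefix only if the combined population of all codes sharing it exceeds 20,000, and otherwise replace it with "000". This sketch assumes a dict of five-digit ZCTA populations as a stand-in for the Census data the guidance refers to; the function name `safe_harbor_zip3` and the populations are made up.

```python
from collections import defaultdict


def safe_harbor_zip3(zip_population, threshold=20_000):
    """Return the Safe Harbor three-digit geocode for each five-digit code.

    zip_population: dict five-digit ZIP/ZCTA -> population (assumed input).
    A prefix is kept only if all codes sharing it have more than `threshold`
    people combined; otherwise it is replaced with "000".
    """
    prefix_pop = defaultdict(int)
    for code, pop in zip_population.items():
        prefix_pop[code[:3]] += pop
    return {code: (code[:3] if prefix_pop[code[:3]] > threshold else "000")
            for code in zip_population}


# Made-up populations: prefix 111 totals 22,000, prefix 222 totals 7,000.
example = {"11101": 12_000, "11102": 10_000, "22201": 3_000, "22202": 4_000}
zip3 = safe_harbor_zip3(example)
print(zip3)
```

Every code sharing a prefix gets the same geocode, which is why three-digit areas tend to be much larger than the adjacency-based groups described above.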

If an organization runs a service program and has an address list, you can use this method to take the ZIP codes from the address file and make a (heat) map with binned counts of participants in each ZCTA group. The resulting map is de-identified and can be distributed to the public. You can also submit the exact addresses to the Census geocoder (https://geocoding.geo.census.gov/geocoder/geographies/addressbatch?form) to get each address's census tract. Census tracts can be grouped in a similar way, and a different minimum population, such as 10,000, could be chosen.
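Binning the de-identified group counts for a map legend can be sketched as follows. The bin edges and labels are arbitrary choices for illustration, not values from the HIPAA guidance, and the group ids are hypothetical. Because 1-5 is a single bin, the recoded value of 5 lands in the same class as any other small count.

```python
from bisect import bisect_left

# Illustrative bin edges (inclusive upper bounds) and legend labels;
# these are assumptions, not values from the HIPAA guidance.
BIN_UPPER = [0, 5, 20, 50]
BIN_LABELS = ["0", "1-5", "6-20", "21-50", "51+"]


def bin_label(count):
    """Return the legend bin for a de-identified group count."""
    return BIN_LABELS[bisect_left(BIN_UPPER, count)]


# Made-up de-identified counts per ZCTA group.
counts = {"G1": 5, "G2": 40, "G3": 0, "G4": 130}
legend = {group: bin_label(c) for group, c in counts.items()}
print(legend)
```

A mapping tool would then color each ZCTA group polygon by its label; that drawing step is not shown.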

Dave
