De-identify ZCTA based data

I'm trying to de-identify a data set that has zip codes by merging adjacent ZCTAs so that each combined area has a population of 20,000 or more.

If you have any ideas, let me know. Perhaps someone has already done this and has a list.
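One way to frame the merging step is as a graph problem: each ZCTA is a node, touching ZCTAs share an edge, and the smallest group still under 20,000 people is repeatedly absorbed into a neighbouring group. The Python below is a minimal sketch under stated assumptions, not an established method: `merge_zctas` is a hypothetical name, and the adjacency lists (`adj`) are assumed to be computed once from the ZCTA polygons beforehand (in R, for example, with `spdep::poly2nb`). It is a plain greedy heuristic, not max-p regionalization; it does not try to maximize the number of groups, and a ZCTA with no neighbours (an island) is left under the threshold for manual review.

```python
import heapq


def merge_zctas(pop, adj, threshold=20_000):
    """Greedy sketch: merge under-threshold ZCTA groups into a touching group.

    pop: dict ZCTA -> population.
    adj: dict ZCTA -> iterable of touching ZCTAs (assumed symmetric, precomputed,
         and referring only to keys of pop).
    Returns dict ZCTA -> group id, where the group id is one member ZCTA.
    """
    parent = {z: z for z in pop}  # union-find forest: each group is a tree

    def find(z):
        while parent[z] != z:
            parent[z] = parent[parent[z]]  # path halving keeps trees shallow
            z = parent[z]
        return z

    group_pop = dict(pop)  # population of each group, keyed by its root
    nbrs = {z: set(adj.get(z, ())) for z in pop}  # may hold stale non-root ids
    heap = [(p, z) for z, p in pop.items() if p < threshold]
    heapq.heapify(heap)  # always process the currently smallest group first

    while heap:
        p, z = heapq.heappop(heap)
        if find(z) != z or group_pop[z] != p:
            continue  # stale entry: z was absorbed, or has grown since it was pushed
        current = {find(n) for n in nbrs[z]} - {z}
        if not current:
            continue  # no neighbours (e.g. an island): stays under threshold
        # Rule used here (an assumption): absorb z into the least-populated touching group.
        target = min(current, key=lambda r: (group_pop[r], r))
        parent[z] = target
        group_pop[target] += group_pop.pop(z)
        nbrs[target] = ({find(n) for n in nbrs[target]} | current) - {target}
        del nbrs[z]
        if group_pop[target] < threshold:
            heapq.heappush(heap, (group_pop[target], target))

    return {z: find(z) for z in pop}
```

For example, on a toy chain A-B-C-D with populations 5,000, 8,000, 15,000 and 30,000, A is absorbed into B, then A+B into C, giving the groups {A, B, C} (28,000) and {D} (30,000). The heap and union-find structure avoid rescanning every polygon on each pass, and both translate directly to C if speed becomes an issue.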

  • Interesting, David.

    There are some existing tools that combine adjacent geographic areas to meet various objective functions (minimum population, minimum margin of error, etc.).  Some try to combine areas that are homogeneous by some attribute (demographics, income, etc.).  One such Python-based tool is described in a PLOS paper from a few years ago:
    Reducing Uncertainty in the American Community Survey through Data-Driven Regionalization (PLOS ONE)
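Whatever tool produces the grouping (the Python code from that paper, a home-grown R loop, or something else), it helps to check the output against the constraints that matter here. The sketch below is not taken from the PLOS paper; `check_partition` is a hypothetical name, and it assumes an `assignment` dict mapping each ZCTA to a group id plus the same `pop` and `adj` inputs used by any adjacency-based approach. It reports groups below the population floor and groups that are not contiguous, using a breadth-first search restricted to each group's own members.

```python
from collections import defaultdict, deque


def check_partition(assignment, pop, adj, threshold=20_000):
    """Return a list of (group_id, problem) for groups that break the constraints.

    assignment: dict ZCTA -> group id; pop: dict ZCTA -> population;
    adj: dict ZCTA -> iterable of touching ZCTAs (assumed symmetric).
    """
    members = defaultdict(set)
    for zcta, group in assignment.items():
        members[group].add(zcta)

    problems = []
    for group, zs in sorted(members.items()):
        total = sum(pop[z] for z in zs)
        if total < threshold:
            problems.append((group, f"population {total} < {threshold}"))

        # Contiguity: BFS from one member, only stepping onto members of the same group.
        start = next(iter(zs))
        seen = {start}
        queue = deque([start])
        while queue:
            z = queue.popleft()
            for n in adj.get(z, ()):
                if n in zs and n not in seen:
                    seen.add(n)
                    queue.append(n)
        if seen != zs:
            problems.append((group, "not contiguous"))

    return problems
```

An empty list means every group meets the population floor and is a single connected piece.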

  • Thanks, Dave, very helpful. I'm using R, and I can combine the polygons that touch a central/target polygon based either on the population of that polygon or on the smallest distance between the centroid of the central polygon and the centroids of the touching polygons. Since I'm working in R, it would be nice if there were an R program/package. Right now I'm using loops in R, which is slow; a C program would speed things up, and I'm familiar with calling C and Fortran from R. I think I called Python from R at some point, but it took a big effort to get things set up.
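The two neighbour-selection rules described above can each be written as a single `min` over the touching polygons. This Python sketch assumes the centroids are already in a projected coordinate system, so straight-line distance is meaningful (with longitude/latitude you would project first or use a great-circle distance), and that polygon areas are available; all function names are hypothetical. It also shows one way to avoid re-dissolving polygons inside the loop: for non-overlapping polygons, the centroid of their union equals the area-weighted mean of the member centroids.

```python
import math


def nearest_by_centroid(target, touching, centroid):
    """Distance rule: the touching polygon whose centroid is closest to the target's.

    centroid: dict id -> (x, y) in a projected CRS (assumed), so Euclidean
    distance is meaningful. Ties are broken by id for reproducibility.
    """
    tx, ty = centroid[target]
    return min(touching,
               key=lambda n: (math.hypot(centroid[n][0] - tx, centroid[n][1] - ty), n))


def smallest_by_population(touching, pop):
    """Population rule: the touching polygon with the fewest people."""
    return min(touching, key=lambda n: (pop[n], n))


def union_centroid(ids, centroid, area):
    """Centroid of a merged group, computed from its members without dissolving.

    Valid for non-overlapping polygons: the centroid of the union is the
    area-weighted mean of the member centroids.
    """
    total = sum(area[i] for i in ids)
    x = sum(centroid[i][0] * area[i] for i in ids) / total
    y = sum(centroid[i][1] * area[i] for i in ids) / total
    return (x, y)
```

With the group's centroid updated this way after each merge, the distance rule can be applied again to the merged group's touching polygons without recomputing any geometry.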

    Dave

    PS: When I get this working, I'll make a dataset whose rows give the ZCTAs in each "condensed/combined" ZCTA group.
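A long, one-row-per-ZCTA layout is one simple way to publish such a crosswalk, because it joins directly onto a record-level file. The sketch assumes a ZCTA-to-group mapping like the one a merge step would produce; `write_crosswalk` and the column names `group_id`, `zcta`, and `group_population` are hypothetical choices. ZCTAs are kept as strings so that codes beginning with 0 keep their leading zeros.

```python
import csv
import io
from collections import defaultdict


def write_crosswalk(assignment, pop, out):
    """Write one row per ZCTA: its group id, the ZCTA, and the group's total population.

    assignment: dict ZCTA -> group id; pop: dict ZCTA -> population.
    ZCTAs are kept as strings so leading zeros survive.
    """
    group_pop = defaultdict(int)
    for zcta, group in assignment.items():
        group_pop[group] += pop[zcta]

    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["group_id", "zcta", "group_population"])
    # Sort by group, then ZCTA, so the file is stable across runs.
    for zcta, group in sorted(assignment.items(), key=lambda kv: (kv[1], kv[0])):
        writer.writerow([group, zcta, group_pop[group]])


if __name__ == "__main__":
    # Toy IDs and made-up populations, for illustration only.
    buf = io.StringIO()
    write_crosswalk({"A": "C", "B": "C", "C": "C", "D": "D"},
                    {"A": 5000, "B": 8000, "C": 15000, "D": 30000}, buf)
    print(buf.getvalue(), end="")
```

Including the group population in every row makes it easy to confirm the 20,000 floor directly from the published file.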

    Maybe someone will be able to use it to de-identify a dataset with zip codes/ZCTAs.
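Applying such a crosswalk could look like the sketch below, which replaces each record's zip code with its group id. It assumes the zip codes have already been matched to ZCTAs: ZIP codes and ZCTAs are not identical (for example, PO-box-only ZIPs generally have no ZCTA of their own), so in practice a ZIP-to-ZCTA matching step comes first. Records that do not map get `None` so they can be suppressed or reviewed; the function and field names are hypothetical.

```python
import csv
import io


def load_crosswalk(f):
    """Read a group_id,zcta,... CSV into a dict ZCTA -> group id (values stay strings)."""
    return {row["zcta"]: row["group_id"] for row in csv.DictReader(f)}


def deidentify(records, crosswalk, zip_field="zip"):
    """Replace each record's zip/ZCTA field with its merged-group id.

    Assumes the field already holds a ZCTA (or a zip code equal to its ZCTA).
    Unmapped values get None so they can be suppressed or reviewed.
    """
    out = []
    for rec in records:
        rec = dict(rec)  # copy so the caller's data is not modified
        # zfill restores leading zeros lost if zips were stored as numbers (1234 -> "01234").
        code = str(rec.pop(zip_field)).zfill(5)
        rec["zcta_group"] = crosswalk.get(code)
        out.append(rec)
    return out


if __name__ == "__main__":
    # Toy crosswalk and records, for illustration only.
    cw = load_crosswalk(io.StringIO("group_id,zcta,group_population\nG1,01234,28000\n"))
    print(deidentify([{"id": 1, "zip": 1234}, {"id": 2, "zip": "99999"}], cw))
```

Dropping the original zip field, rather than keeping it alongside the group id, is what actually removes the finer-grained geography from the output.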
