Top-coded values

Has anyone ever attempted to "fill in" the top-coded values for the 5-year ACS PUMS data? For example, in 2009, all of the house values (variable: valp) greater than 4 million dollars in Hawaii get cut off, so it's impossible to tell whether the observation you're looking at is a 50 million dollar home or a 4.01 million dollar home, since both look the same. The top-code cutoff is different for every year-by-state combination, and I haven't had much luck trying to recover the values above it. I've mostly been running linear regressions on the data and am just trying to get a reasonable estimate for each top-coded observation. Any insight would be greatly appreciated.

[Updated on 2/24/2015 3:08 PM]
  • You need a secondary source of data to distribute the top-coded data. How good your results are will depend on how large (and representative) the secondary dataset is. I used to use proprietary rental listing databases to further break out rental units in the top category according to bedroom size and square footage, with some degree of success. You might see if you can get some historical MLS databases (or something similar) and try to create some distributions based on price per square foot or some other parameters.
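To make the suggestion above concrete, here is a rough sketch of one way it could be done: pool the values above the cutoff from a secondary dataset, grouped by some matching characteristic, and draw a replacement for each top-coded PUMS record from the matching pool. Everything here is an assumption for illustration: the function name impute_top_coded, the "bedrooms" grouping key, the dict-per-record layout, and the idea that a record is top-coded whenever its value is at or above the cutoff. It is not a tested method for ACS data, and a random draw like this only restores the shape of the upper tail, not the true value of any single home.

```python
import random
from collections import defaultdict


def impute_top_coded(records, secondary, cutoff, key="bedrooms", value="valp", seed=0):
    """Return copies of `records` with top-coded values replaced by random
    draws from `secondary` values above `cutoff` in the same `key` group.

    Hypothetical helper: field names and the grouping key are assumptions.
    """
    rng = random.Random(seed)

    # Build one pool of above-cutoff values per group in the secondary data.
    pools = defaultdict(list)
    for row in secondary:
        if row[value] > cutoff:
            pools[row[key]].append(row[value])

    # Fallback pool for groups the secondary data does not cover.
    all_above = [v for pool in pools.values() for v in pool]

    result = []
    for row in records:
        new_row = dict(row)  # leave the caller's data untouched
        if row[value] >= cutoff:  # assumed rule for spotting a top-coded record
            pool = pools.get(row[key]) or all_above
            if pool:  # if there is no secondary data at all, keep the top code
                new_row[value] = rng.choice(pool)
                new_row["imputed"] = True
        result.append(new_row)
    return result


# Made-up example data, not real ACS or MLS values.
pums = [
    {"valp": 650_000, "bedrooms": 3},
    {"valp": 4_000_000, "bedrooms": 5},
]
listings = [
    {"valp": 4_300_000, "bedrooms": 5},
    {"valp": 12_500_000, "bedrooms": 5},
    {"valp": 900_000, "bedrooms": 3},
]
imputed = impute_top_coded(pums, listings, cutoff=4_000_000)
```

Since the cutoff differs by year and state, you would run this once per year-by-state group with that group's cutoff. If you have square footage in the secondary data, grouping on price per square foot instead of bedrooms would follow the same pattern.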