Here is the original Twitter thread, which is public: twitter.com/.../1395774558096039938
I do not yet see a URL on the Data User Group website that provides access to the recordings. You could try this URL:
This is a Tweet from Steven Ruggles.
Steve and Andy, I am a member of the ACS Data Users Group, so I received this post. (I am famously not on Twitter.) I wanted to confirm that it is okay to share Steve's concerns more widely, especially among folks like myself working in the policy arena. Thanks!
Much appreciated - thanks, Bernie.
Where can I watch this presentation, and where are the conference videos posted? (I did attend the conference, but had a conflict during that hour.)
You will be asked to sign in and will be sent a verification code. Hopefully this will work for you.
And if the link does work, navigate to the Thursday 1PM talk. The first speaker, Rolando Rodriguez, covered this topic. The link to the session page itself is this:
Thank you, Cliff!
Thank you, Andy, for bringing it to our attention. It sounds like such a suicidal plan from the Census Bureau! It is absurd that the Census Bureau itself is trying to diminish the value of the real data that it uses taxpayers' dollars to collect. Real survey data cannot be replaced with synthetic data, just as a human being cannot be replaced with AI or a robot. There is a vitality in real survey data that makes new discoveries possible. There are relationships and mechanisms in a real human community that have not been fully understood or even noticed; therefore, not everything can be modeled and recreated with synthetic data.
It reminds me of an ancient story: a guy refused to eat and was going to starve himself to death because he feared the small chance of being choked.
This may be politically motivated. I discuss why I believe that at www.quora.com/.../John-Grumbine-1
I doubt this. The Census Bureau actively encourages third-party data distributors to use their data. (I should know; I worked at one for ten years. And some other federal agencies make their data much more difficult for third parties to use.) The more people using Census data, from whatever platform, the higher the importance of the Census Bureau (and, ideally, the higher their funding). As important as Census dissemination and analytical tools are, they don't compare to the great importance of Census data products, and they know this. There are a lot of reasons to be skeptical about this move to synthetic microdata (as demonstrated by Steven Ruggles's thread), and it's always important to consider the motivations of those involved, but I don't think this is it.
I was surprised to see this topic come up in this week's City Observatory newsletter (City Observatory is a newsletter that looks at urban planning issues through an economic lens), so I am copying their comment below:
Synthetic microdata: A threat to knowledge. Each week at City Observatory, we usually profile an interesting or provocative research study. This week, we're spending a minute to highlight a potential threat to a key source of data that helps us better understand our world, and especially the nation's cities: the public use microsample of the American Community Survey (ACS). The ACS is the nation's largest and most valuable source of data on population, housing, social and economic characteristics. While the Census Bureau produces many tabulations of these data, it's impossible to slice and dice data in a way that bears on every question. So the Census Bureau makes available what is called a "public use microsample" which allows researchers to craft their own customized tabulations of these data to answer specific questions. At City Observatory, for example, we've used these data to estimate the income, race and ethnicity of peak-hour, drive-alone suburban commuters traveling from suburban Washington State to jobs in Oregon--a question that would be essentially impossible to answer from either published Census tabulations or other publicly available data.
Microdata are valuable because they link answers to different ACS questions--linking a person's age, gender or race to their income, occupation or housing type, and so on. But because the microdata are individual survey responses, some are concerned that there's a potential violation of privacy: that someone could use answers to a series of questions to deduce the identity of an individual survey respondent. While that may technically be a possibility, there's no evidence it occurs in practice. Still, the Census Bureau is hypersensitive about privacy concerns, and has proposed replacing actual microdata with "synthetic" microdata, in order to make it even more difficult to identify an individual. Essentially, synthetic data would replace actual patterns of responses with statistically modeled responses. The trouble is, this modeled, synthetic data actually subtracts information, and makes it impossible for researchers to know whether the answers to any particular question are a product of actual variation, or just a quirk of the Census Bureau's model. As University of Minnesota data expert Steven Ruggles puts it, "synthetic data will be useless for research."
The privacy threat from ACS microdata is a phantom menace. Ruggles and a colleague at the University of Minnesota have just published a paper showing that attempting to use Census microdata to create individually identifiable records via database reconstruction would produce vastly more random (i.e. false) matches than real ones. This undercuts the idea that microdata is an actual threat to privacy.
But a proposal to replace PUMS data with synthetic data is a real threat to our ability to better understand our world. It is like requiring piano players to wear mittens when playing Beethoven sonatas: the piano will still produce sound, but the result will be noise, not music.
Mike Schneider, "Census Bureau's use of 'synthetic data' worries researchers," ABC News, May 27, 2021. Some researchers are up in arms about a U.S. Census Bureau proposal to add privacy protections by manipulating numbers in the data most widely used for economic and demographic research.
Steven Ruggles and David Van Riper, "The Role of Chance in the Census Bureau Database Reconstruction Experiment," University of Minnesota, Working Paper No. 2021-01, May 2021. DOI: https://doi.org/10.18128/MPC2021-01