The U.S. Census Bureau's chief is defending a new tool meant to protect the privacy of people participating in the statistical agency's questionnaires, rejecting calls from prominent researchers to abandon it. The researchers claim the tool jeopardizes the usefulness of numbers that are the foundation of the nation’s data infrastructure.
The tool known as differential privacy “was selected as the best solution available” against efforts by outside groups or individuals to piece together the identities of participants in the bureau's censuses and surveys by using third-party data and powerful computers, U.S. Census Bureau Director Robert Santos said in a letter last week. Concerns about privacy have grown in recent years as cyberattacks and threats of personal data being used for the wrong reasons have become more commonplace.
Several prominent state demographers and academic researchers had asked the statistical agency in August to abandon using differential privacy on future annual population estimates, which are used in the distribution of $1.5 trillion in federal funding each year, and future releases of American Community Survey data, which provide the most comprehensive information on how people live in the U.S.
The demographers and researchers said the application of the privacy method for the first time on 2020 census data had delayed their release and created inaccuracies in the numbers used to determine political power and distribute federal funds. The researchers said in their letter that there were thousands of small jurisdictions throughout the U.S. that won’t get usable data because of the algorithms applied to the numbers to protect confidentiality.
By continuing to use the differential privacy algorithms, “the Census Bureau risks failing its responsibilities as a federal statistical agency to provide relevant, accurate, timely, and credible information for the public good,” the researchers and demographers said. “In fact, the experience of the last few years has undermined user trust in the Census Bureau.”
Differential privacy algorithms add intentional errors to data to obscure the identity of any given participant, and those errors are most noticeable at the smallest geographies, such as census blocks. Data used for determining how many congressional seats each state gets and for redrawing political districts were released last year, but more detailed figures from the 2020 census won't be made public until next year, almost three years after they were collected.
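To see why intentional errors matter most for small places, consider a minimal sketch of the textbook Laplace mechanism, one common way to achieve differential privacy. This is an illustration only, not the bureau's production algorithm, which the researchers' and Santos' letters as described here do not detail. The `epsilon` value, the population figures and all function names below are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng):
    # Adding or removing one person changes a count by at most 1
    # (sensitivity 1), so the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


def mean_relative_error(true_count, epsilon, trials, rng):
    # Average absolute error across many noisy releases, as a share
    # of the true count.
    total = 0.0
    for _ in range(trials):
        total += abs(noisy_count(true_count, epsilon, rng) - true_count)
    return total / trials / true_count


rng = random.Random(0)  # fixed seed so the sketch is repeatable
epsilon = 0.5           # hypothetical privacy-loss setting

# Hypothetical populations: a small census block and a whole state.
block_error = mean_relative_error(20, epsilon, 5000, rng)
state_error = mean_relative_error(2_000_000, epsilon, 5000, rng)

print(f"block relative error: {block_error:.4f}")
print(f"state relative error: {state_error:.8f}")
```

Because the noise in this sketch has the same typical size regardless of the population being counted, it is a far larger share of a 20-person block than of a 2-million-person state.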
Some bias using the privacy tool “was inevitable from a purely mathematical perspective,” but bureau statisticians have worked to minimize it, and delays were caused by the pandemic, which pushed back a series of releases of the 2020 census data, Santos said.
Meanwhile, the bureau's watchdog agency said in a report last week that the statistical agency had failed to stop simulated cyberattacks it had conducted as part of a covert operation to test the bureau's cybersecurity vulnerabilities. The U.S. Department of Commerce's Office of Inspector General said that its team had obtained unauthorized access to a domain administrator account, gotten personally identifiable information about bureau employees and used insecure programs to send out fake emails.
The Census Bureau said in a response to the report that the exercise had allowed it to improve its cyber defenses.
Follow Mike Schneider on Twitter at https://twitter.com/MikeSchneiderAP.