Disclosure Avoidance Through Differential Privacy

The U.S. Census Bureau has a long history of protecting decennial census responses from being used to identify individual respondents. Before the 2020 Census, the Bureau employed a variety of ad hoc methods, such as data swapping, imputation, rounding, and top-coding. However, the Census Bureau determined that the data-protection methods used in prior censuses would no longer suffice to meet its statutory confidentiality requirements.

For the 2020 Census, the Bureau will implement differential privacy (DP). DP is a mathematical framework that formally quantifies the risk of data disclosure. It protects privacy by injecting “noise” drawn from specific statistical distributions into the data. DP involves a “privacy-accuracy” trade-off governed by the “privacy-loss budget” chosen for the mechanism: a smaller budget means more noise and stronger privacy protection, while a larger budget means less noise and more accurate statistics. Because the privacy-loss budget must be divided among all of the queries made against a data set, the share available to any individual query can be quite small, which reduces the accuracy of that statistic or table. Additionally, the Census Bureau held certain counts “invariant,” meaning they are published exactly as enumerated, with no noise added: total population at the nation and state levels, along with total housing units and the number of group quarters facilities by type at the census block level.
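The budget-splitting and invariant ideas above can be illustrated with a small Python sketch. It is not the Census Bureau's method: it uses the classic Laplace mechanism (the 2020 DAS uses a discrete Gaussian mechanism instead), it assumes the total epsilon is split evenly across queries (the real DAS tunes per-query allocations), and the `enforce_invariant_total` function is a hypothetical proportional-rescaling stand-in for the DAS's optimization-based post-processing. All function names and the example county counts are illustrative assumptions.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponentials with mean `scale`
    is Laplace-distributed; expovariate() takes a rate, i.e. 1 / scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_counts(true_counts, total_epsilon, rng, sensitivity=1.0):
    """Split total_epsilon evenly across all queries and noise each count.

    Each of the k queries gets epsilon / k, so the Laplace scale
    (sensitivity / per-query epsilon) grows linearly with k.
    Returns the noisy counts and the scale used.
    """
    per_query_epsilon = total_epsilon / len(true_counts)
    scale = sensitivity / per_query_epsilon
    return [c + laplace_noise(scale, rng) for c in true_counts], scale


def enforce_invariant_total(noisy, invariant_total):
    """Hypothetical post-processing: make noisy parts sum to an exact invariant.

    Negative values are clamped to zero and the parts are then rescaled
    proportionally. This is a simplified stand-in, not the DAS's algorithm.
    """
    clamped = [max(0.0, x) for x in noisy]
    s = sum(clamped)
    if s == 0:
        # Nothing left to scale: spread the invariant total evenly.
        return [invariant_total / len(noisy)] * len(noisy)
    return [x * invariant_total / s for x in clamped]


if __name__ == "__main__":
    rng = random.Random(2020)
    # Hypothetical county populations within one state; the state total
    # plays the role of an invariant that is published without noise.
    counties = [120, 45, 30, 5]
    noisy, scale = noisy_counts(counties, total_epsilon=1.0, rng=rng)
    print("noise scale per query:", scale)  # 4.0 for 4 queries at epsilon 1
    adjusted = enforce_invariant_total(noisy, invariant_total=sum(counties))
    print("adjusted counts:", [round(x, 1) for x in adjusted])
```

Doubling the number of queries while holding the total budget fixed doubles the noise scale on every individual count, which is the accuracy loss the paragraph describes.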

In 2019 and 2020, the Census Bureau released a series of tables and Privacy-Protected Microdata Files (PPMFs) based on the 2010 Census that were designed to demonstrate how DP would affect the release of statistics and to give users a way to compare DP-protected data with the actual data releases. Users and analysts noted significant distortions in counts and distributions, relative to the published 2010 data, for smaller geographic areas and for attributes such as race/ethnicity.

In April 2021, the Census Bureau released two sets of PPMFs (one with a global epsilon of 12.2 and one with a global epsilon of 4.5) that incorporated updates and revisions to the Disclosure Avoidance System (DAS). The Census Bureau established accuracy targets reflecting redistricting use cases: the largest racial or ethnic group in any geographic entity with a population of 500 or more persons is accurate to within 5 percentage points of its enumerated value at least 95% of the time. The privacy-loss budget was increased to be more in line with anticipated production levels and was tuned across all queries and geographies to meet these accuracy targets. The Census Bureau also revised the geographic spine by establishing hybrid “Optimized Block Groups” that bring areas traditionally off the spine, such as places and Minor Civil Divisions, closer to the geographic hierarchy. Finally, the DAS incorporates a revised noise-generation mechanism that samples from a discrete Gaussian distribution whose variance is determined by the privacy-loss budget allocated to each query and geographic level.
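The discrete Gaussian mechanism and the accuracy target can be explored with the Python sketch below. It is a floating-point sampler based on the rejection method of Canonne, Kamath and Steinke (2020), not the DAS production code. The `sigma_for_rho` helper assumes a query's budget share is expressed as a zero-concentrated DP parameter rho, using the standard result that discrete Gaussian noise with scale sigma on a sensitivity-1 query gives rho = 1 / (2 sigma^2); for sigma of 1 or more, sigma^2 is very close to the actual variance. The `share_within_target` check is a simplified, hypothetical test: it noises only the largest group's count and treats the entity's total population as exact, which the real DAS does not guarantee below the state level.

```python
import math
import random


def sigma_for_rho(rho: float, sensitivity: float = 1.0) -> float:
    """Noise scale giving rho-zCDP for a query with the given sensitivity.

    Discrete Gaussian noise with scale sigma on a sensitivity-D query
    satisfies (D^2 / (2 sigma^2))-zCDP, so sigma = D / sqrt(2 rho).
    """
    return sensitivity / math.sqrt(2.0 * rho)


def discrete_laplace(t: float, rng: random.Random) -> int:
    """Sample Y with P(Y = y) proportional to exp(-|y| / t).

    floor(Exp(rate 1/t)) is geometric with P(G >= k) = exp(-k / t);
    the difference of two independent copies is discrete Laplace.
    """
    g1 = math.floor(rng.expovariate(1.0 / t))
    g2 = math.floor(rng.expovariate(1.0 / t))
    return g1 - g2


def discrete_gaussian(sigma: float, rng: random.Random) -> int:
    """Sample Y with P(Y = y) proportional to exp(-y^2 / (2 sigma^2)).

    Rejection sampling from a discrete Laplace proposal with
    t = floor(sigma) + 1, done in floating point rather than exactly.
    """
    t = math.floor(sigma) + 1
    while True:
        y = discrete_laplace(t, rng)
        accept = math.exp(-((abs(y) - sigma**2 / t) ** 2) / (2 * sigma**2))
        if rng.random() < accept:
            return y


def share_within_target(pop, group_count, sigma, trials, rng, tolerance=0.05):
    """Fraction of trials in which the noisy group share stays within tolerance.

    Simplification: only the group count is noised; the entity's total
    population is treated as exact.
    """
    hits = 0
    for _ in range(trials):
        noisy = group_count + discrete_gaussian(sigma, rng)
        if abs(noisy - group_count) / pop <= tolerance:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    rng = random.Random(94171)
    # Hypothetical entity: 500 persons, largest group of 300.
    for sigma in (5.0, 10.0, 20.0):
        rate = share_within_target(500, 300, sigma, trials=5000, rng=rng)
        print(f"sigma={sigma:>4}: within 5 points in {rate:.1%} of trials")
```

For an entity of exactly 500 persons, 5 percentage points corresponds to 25 persons, so under these assumptions the 95% target is met only when the noise scale on the group count is comfortably below about 12.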

On June 8, 2021, the Census Bureau announced that the production privacy-loss budget for the 2020 Census redistricting data would be a global epsilon of 19.61, with 17.14 allocated to person tables and 2.47 to housing tables. The budget will be allocated primarily to the total population and race-by-ethnicity queries at the block group level and above.
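The arithmetic behind the announced budget can be checked with the short Python sketch below. The three epsilon values come from the text; the sketch takes the stated total at face value and treats the components as additive, as under basic sequential composition, without reproducing the Bureau's own privacy accounting. The per-level weights are purely hypothetical, chosen only to place most of the person budget at the block group level and above as described; they are not the production allocation.

```python
# Budget figures as stated in the text (epsilon units).
PERSON_EPSILON = 17.14
HOUSING_EPSILON = 2.47
TOTAL_EPSILON = 19.61


def percent_shares(parts):
    """Return each component's percentage of the summed budget."""
    total = sum(parts.values())
    return {name: 100.0 * value / total for name, value in parts.items()}


def split_budget(budget, weights):
    """Divide a budget across levels in proportion to the given weights."""
    total_weight = sum(weights.values())
    return {level: budget * w / total_weight for level, w in weights.items()}


# HYPOTHETICAL weights for illustration only; NOT the Census Bureau's
# production allocation. Most of the weight sits at block group and above.
HYPOTHETICAL_LEVEL_WEIGHTS = {
    "nation": 0.10,
    "state": 0.20,
    "county": 0.20,
    "tract": 0.20,
    "block group": 0.20,
    "block": 0.10,
}

if __name__ == "__main__":
    # The stated components add up to the stated total.
    assert abs(PERSON_EPSILON + HOUSING_EPSILON - TOTAL_EPSILON) < 1e-9
    shares = percent_shares({"person": PERSON_EPSILON, "housing": HOUSING_EPSILON})
    for name, pct in shares.items():
        print(f"{name}: {pct:.1f}% of the budget")  # person 87.4%, housing 12.6%
    levels = split_budget(PERSON_EPSILON, HYPOTHETICAL_LEVEL_WEIGHTS)
    for level, eps in levels.items():
        print(f"{level:>12}: epsilon {eps:.3f}")
```

Roughly seven-eighths of the announced budget goes to person tables, which is consistent with the emphasis on total population and race-by-ethnicity queries.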

The Census Bureau recently updated the dates for the redistricting (P.L. 94-171) differential privacy implementation:

  • April 28: New PPMFs and Detailed Summary Metrics released.
  • By May 28: Data users submit feedback to the Census Bureau.
  • June 8: The Census Bureau’s Data Stewardship Executive Policy (DSEP) Committee made the final determination of the privacy-loss budget (PLB) and system parameters for P.L. 94-171, based on data user feedback.
  • Late June: Final DAS production run and quality control analysis begins for P.L. 94-171 data.
  • August 12: Release 2020 Census P.L. 94-171 data as a Legacy Format Summary File via the Census Bureau FTP site.
  • August: Census Bureau releases PPMFs and Detailed Summary Metrics from applying the production version of the DAS to the 2010 Census data.
  • September: Census Bureau releases production code base for P.L. 94-171 redistricting summary data file and related technical papers.
  • By September 30: Release 2020 Census P.L. 94-171 data (released via data.census.gov) and Differential Privacy Handbook.

More Information

Census Bureau

Independent Analysis

Visualizations

April 2021 DAS Demonstration Products Comparison Maps (Epsilon 12.2):

April 2021 DAS Demonstration Products Comparison Maps (Epsilon 4.5):

November 2020 DAS Demonstration Products Comparison Maps:

Comments, suggestions, or questions? Please email CensusData4CA@dof.ca.gov.