Disclosure Avoidance Through Differential Privacy

The U.S. Census Bureau has a long history of protecting decennial census responses from being used to identify individual respondents. Prior to the 2020 Census, the Bureau employed a variety of ad hoc methods such as data swapping, imputation, rounding, and top-coding. However, the Census Bureau determined that the data-protection methods used in prior censuses would no longer suffice to meet statutory confidentiality requirements.

For the 2020 Census, the Bureau adopted differential privacy (DP). DP is a mathematical technique that provides for the formal quantification of the risk of data disclosure. It protects privacy by infusing “noise” derived from specific statistical distributions into the data. DP provides for a “privacy-accuracy” trade-off that depends on the “privacy-loss budget” chosen for the mechanism: a smaller budget means stronger privacy protection but noisier statistics. Because the privacy-loss budget must be distributed over the total number of queries made against a data set, the share available to an individual query can be quite small, leading to a loss in the accuracy of the statistic/table relative to the level of privacy protection afforded by the mechanism. Additionally, the Census Bureau held populations at the nation, state, and state-level areas, along with total housing units and number of group-quarters facilities by type at the Census block level, “invariant” – unchanged from the published counts.
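The effect of spreading a privacy-loss budget over many queries can be illustrated with a minimal sketch. It uses the textbook Laplace mechanism and an even split of the budget across queries; both are assumptions made for illustration and are not the Census Bureau's production mechanism or allocation. The function names and the sensitivity of 1 (one person changes a count by at most one) are likewise hypothetical.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_counts(true_counts, total_epsilon, rng, sensitivity=1.0):
    """Add Laplace noise to each count, splitting the budget evenly.

    Assumption: each count is a separate query and the total budget is
    divided equally among them (simple sequential composition).
    """
    per_query_epsilon = total_epsilon / len(true_counts)
    scale = sensitivity / per_query_epsilon  # smaller budget -> larger noise
    noisy = [count + laplace_noise(scale, rng) for count in true_counts]
    return noisy, scale


if __name__ == "__main__":
    rng = random.Random(2020)
    for n_queries in (1, 10, 100):
        _, scale = noisy_counts([250] * n_queries, total_epsilon=1.0, rng=rng)
        print(f"{n_queries:>3} queries -> noise scale per query: {scale:.1f}")
```

Under these assumptions the noise scale grows in proportion to the number of queries sharing the budget, which is why statistics for small areas or detailed attributes are the most affected.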

In 2019 and 2020, the Census Bureau released a series of tables and Privacy-Protected Microdata Files (PPMFs) based on the 2010 Census that were designed to demonstrate how DP would affect the release of statistics and to give users a way to compare DP-protected data with actual data releases. Users and analysts noted significant distortions in counts and distributions for smaller geographic areas and for attributes such as race/ethnicity relative to the original data.

In April 2021, the Census Bureau released two sets of PPMFs (one with a global epsilon of 12.2 and one with a global epsilon of 4.5) that incorporated updates and revisions to the Disclosure Avoidance System (DAS). The Census Bureau established accuracy targets designed to reflect redistricting use cases so that the largest racial or ethnic group in any geographic entity with a population of 500 or more persons is accurate within 5 percentage points of its enumerated value at least 95% of the time. The privacy-loss budget was increased to be more in line with anticipated production levels and specifically tuned across all queries and geographies to meet the accuracy targets. The Census Bureau revised the geographic spine by establishing hybrid “Optimized Block Groups” that bring traditionally off-spine areas such as places and Minor Civil Divisions closer to the geographic hierarchy. Finally, the DAS incorporates a revised noise generation mechanism that samples from a discrete Gaussian distribution whose variance is determined by the privacy-loss budget for each query and geographic level.
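The relationship between noise variance and the accuracy target can be explored with a small simulation. The sketch below is illustrative only: it samples from a discrete Gaussian truncated at ten standard deviations (an approximation, not the exact sampler used by the DAS), treats the standard deviation `sigma` as a free, hypothetical input rather than deriving it from the privacy-loss budget, holds the total population fixed, and ignores the DAS post-processing steps.

```python
import math
import random


def discrete_gaussian_sampler(sigma, rng, tail=10):
    """Return a function that draws integers with weight exp(-k^2 / (2 sigma^2)).

    Assumption: the support is truncated at +/- tail * sigma, which drops
    only a negligible amount of probability mass.
    """
    bound = math.ceil(tail * sigma)
    support = list(range(-bound, bound + 1))
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma)) for k in support]
    return lambda: rng.choices(support, weights=weights, k=1)[0]


def share_within_tolerance(population, group_count, sigma, trials, rng,
                           tolerance_pts=5.0):
    """Fraction of simulated draws in which the group's noisy percentage
    stays within `tolerance_pts` percentage points of its true percentage."""
    draw = discrete_gaussian_sampler(sigma, rng)
    true_pct = 100.0 * group_count / population
    hits = 0
    for _ in range(trials):
        noisy_pct = 100.0 * (group_count + draw()) / population
        if abs(noisy_pct - true_pct) <= tolerance_pts:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    rng = random.Random(2021)
    for sigma in (5.0, 15.0, 50.0):
        share = share_within_tolerance(500, 300, sigma, trials=2000, rng=rng)
        print(f"sigma={sigma:>5}: within 5 points in {share:.1%} of draws")
```

For a geography of 500 persons, 5 percentage points corresponds to 25 persons, so in this simplified setting the target is met comfortably only when the noise standard deviation is well below that figure.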

On June 8, 2021, the Census Bureau announced that the production privacy-loss budget for the 2020 Census redistricting data would be 19.61, with 17.14 allocated to person tables and 2.47 to housing tables. The budget will be primarily allocated to the total population and race by ethnicity queries at the block group level and above.

Timeline for 2020 Census product releases:

  • Released
    • Apportionment File – April 26, 2021
    • Redistricting File – August 12, 2021 (FTP)/September 16, 2021 (data.census.gov)
  • Planned Future Releases
    • Demographic Profile/DHC – May 2023
    • Detailed DHC-A (Total population and sex by age by detailed race/ethnicity) – August 2023
    • Detailed DHC-B (Household and tenure by detailed race/ethnicity) – TBD
    • Supplemental DHC (S-DHC – People in households) – TBD

Comments, suggestions, or questions? Please email CensusData4CA@dof.ca.gov.