The U.S. Census Bureau has a long history of protecting decennial census responses from being used to identify individual respondents. Prior to the 2020 Census, the Bureau employed a variety of ad hoc methods such as data swapping, imputation, rounding, and top-coding. However, the Census Bureau determined that the data-protection methods used in prior censuses would no longer suffice to meet its statutory confidentiality requirements.
For the 2020 Census, the Bureau has adopted differential privacy (DP). DP is a mathematical framework that formally quantifies the risk of data disclosure. It protects privacy by infusing “noise” drawn from specific statistical distributions into the data. DP involves a “privacy-accuracy” trade-off governed by the “privacy-loss budget” chosen for the mechanism: a smaller budget gives stronger privacy protection but noisier statistics. Because the privacy-loss budget must be divided among all of the queries made against a data set, the share available to any individual query can be quite small, which reduces the accuracy of the resulting statistic or table. Additionally, the Census Bureau held certain counts “invariant”, meaning they are published exactly as enumerated with no noise added: the total population of the nation, the states, and state-equivalent areas (the District of Columbia and Puerto Rico), and, at the census block level, the total number of housing units and the number of group quarters facilities by type.
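The budget-splitting effect can be illustrated with a minimal Python sketch. It uses the textbook Laplace mechanism for counting queries under basic sequential composition, not the discrete Gaussian mechanism the Census Bureau adopted, and the global budget, the count, and the function names are hypothetical. For a count, the Laplace noise scale is 1/ε, so the expected absolute error of each released count is also 1/ε: dividing a budget of 1.0 evenly among 100 queries raises the typical error from about 1 person to about 100.

```python
import random
import statistics

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two i.i.d. exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)

def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Laplace mechanism for a counting query (sensitivity 1): noise scale is 1/epsilon."""
    return true_count + laplace_noise(1.0 / epsilon, rng)

def per_query_epsilon(total_epsilon: float, n_queries: int) -> float:
    """Even split under basic sequential composition: per-query epsilons sum to the total."""
    return total_epsilon / n_queries

rng = random.Random(2020)
true_count = 1_000    # hypothetical population of one area
total_epsilon = 1.0   # hypothetical global privacy-loss budget

for n_queries in (1, 10, 100):
    eps = per_query_epsilon(total_epsilon, n_queries)
    errors = [abs(noisy_count(true_count, eps, rng) - true_count) for _ in range(10_000)]
    print(f"{n_queries:>3} queries: epsilon per query = {eps:.2f}, "
          f"mean absolute error ~ {statistics.mean(errors):.1f}")
```

An even split is the simplest allocation; the Census Bureau instead tuned how much budget each query and geographic level receives.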
In 2019 and 2020, the Census Bureau released a series of tables and Privacy-Protected Microdata Files (PPMFs) based on the 2010 Census. These demonstration products were designed to show how DP would affect published statistics and to give users a way to compare DP-protected data with the official 2010 Census releases. Users and analysts noted significant distortions, relative to the original data, in counts and distributions for smaller geographic areas and for attributes such as race/ethnicity.
In April 2021, the Census Bureau released two sets of PPMFs (one with a global epsilon of 12.2 and one with a global epsilon of 4.5) that incorporated updates and revisions to the Disclosure Avoidance System (DAS). The Census Bureau established accuracy targets reflecting redistricting use cases: in any geographic entity with a population of 500 or more, the largest racial or ethnic group should be accurate to within 5 percentage points of its enumerated value at least 95% of the time. The privacy-loss budget was increased to be more in line with anticipated production levels and was tuned across all queries and geographic levels to meet these accuracy targets. The Census Bureau also revised the geographic spine (the hierarchy of nested geographic levels used by the DAS) by establishing hybrid “Optimized Block Groups” that bring areas traditionally off the spine, such as places and Minor Civil Divisions, closer to the geographic hierarchy. Finally, the DAS incorporates a revised noise-generation mechanism that samples from a discrete Gaussian distribution whose variance is determined by the privacy-loss budget allocated to each query and geographic level.
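To make the discrete Gaussian mechanism and the accuracy target more concrete, here is a minimal Python sketch. It is not the Census Bureau's DAS: it omits the TopDown post-processing, the geographic hierarchy, and the mapping from privacy-loss budget to noise variance, so the noise scale `sigma`, the four-group example area, and all function names are hypothetical. The sampler approximates the discrete Gaussian by truncating its support at ±12 standard deviations, and the accuracy check reflects one reading of the target: the error in the largest group's percentage share of the area's population.

```python
import itertools
import math
import random

def discrete_gaussian_sampler(sigma: float, rng: random.Random, tail: float = 12.0):
    """Return a function drawing integer noise with P(x) proportional to exp(-x^2 / (2 sigma^2)).

    The support is truncated at +/- tail * sigma; the probability mass beyond that
    point is negligible, so this is a close (but not exact) discrete Gaussian sampler.
    """
    bound = math.ceil(tail * sigma)
    support = list(range(-bound, bound + 1))
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma)) for x in support]
    cum_weights = list(itertools.accumulate(weights))  # precomputed once per sigma

    def draw() -> int:
        return rng.choices(support, cum_weights=cum_weights, k=1)[0]

    return draw

def largest_group_share_error(true_counts: dict, noisy_counts: dict) -> float:
    """Percentage-point error in the share held by the largest enumerated group."""
    largest = max(true_counts, key=true_counts.get)
    true_share = 100.0 * true_counts[largest] / sum(true_counts.values())
    noisy_total = sum(noisy_counts.values())
    if noisy_total == 0:
        return math.inf  # degenerate case: every noisy count was clipped to zero
    noisy_share = 100.0 * noisy_counts[largest] / noisy_total
    return abs(noisy_share - true_share)

def meets_accuracy_target(errors, max_points: float = 5.0, coverage: float = 0.95) -> bool:
    """True if at least `coverage` of the errors are within `max_points` percentage points."""
    within = sum(1 for e in errors if e <= max_points)
    return within / len(errors) >= coverage

# Hypothetical area of 600 people in four groups, with a hypothetical noise scale.
rng = random.Random(2021)
true_counts = {"Group A": 330, "Group B": 150, "Group C": 90, "Group D": 30}
draw = discrete_gaussian_sampler(sigma=5.0, rng=rng)

errors = []
for _ in range(5_000):
    # Noise is added to each group independently; clipping at zero is a crude
    # stand-in for post-processing that keeps published counts nonnegative.
    noisy = {group: max(0, count + draw()) for group, count in true_counts.items()}
    errors.append(largest_group_share_error(true_counts, noisy))

print("share of trials within 5 points:", sum(e <= 5.0 for e in errors) / len(errors))
print("meets 95% target:", meets_accuracy_target(errors))
```

Because the noise takes integer values, noisy counts remain whole numbers of people; in the actual DAS the variance for each query comes from its share of the privacy-loss budget rather than from a fixed `sigma`.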
On June 8, 2021, the Census Bureau announced that the production privacy-loss budget for the 2020 Census redistricting data would be 19.61, with 17.14 allocated to the person tables and 2.47 to the housing tables. The budget will be allocated primarily to the total population and race-by-ethnicity queries at the block group level and above.
Timeline for 2020 Census product releases:
- Released
  - Apportionment File – April 26, 2021
  - Redistricting File – August 12, 2021 (FTP); September 16, 2021 (data.census.gov)
- Planned Future Releases
  - Demographic Profile/DHC – May 2023
  - Detailed DHC-A (Total population and sex by age by detailed race/ethnicity) – August 2023
  - Detailed DHC-B (Household and tenure by detailed race/ethnicity) – TBD
  - Supplemental DHC (S-DHC – People in households) – TBD
More Information
Census Bureau
- Disclosure Avoidance Webinar Series – U.S. Census Bureau. A webinar series designed to help data users better understand the Census Bureau’s plans to apply modernized noise infusion algorithms to protect 2020 Census statistics from disclosure.
- Developing the DAS – U.S. Census Bureau.
- Disclosure Avoidance and the 2020 Census – U.S. Census Bureau.
Independent Analysis
- Changes To Census Bureau Data Products – IPUMS. Includes a discussion of the potential application of DP to the American Community Survey.
- What is Differential Privacy and Could it Affect Redistricting? – Princeton Election Consortium – June 2, 2021. The post discusses the Consortium’s research on differential privacy and its potential impact on redistricting.
- Differential Privacy Heats Up Census Voting Debate – National Conference of State Legislatures – NCSL Blog – May 24, 2021. The blog post contains a list of research conducted on the April 28, 2021 PPMF release.
- Differential Privacy for Census Data Explained – National Conference of State Legislatures – updated May 21, 2021.
- National Congress of American Indians (NCAI) Policy Research Center – NCAI – May 17, 2021. Analysis of the DAS impact on American Indian/Alaska Native Tribal Nations.
- National Academies – 2020 Census Data Products Data Needs and Privacy Considerations Workshop – National Academies of Sciences, Engineering, and Medicine – Committee on National Statistics, December 2019.
- Differential Privacy and Census Data – an overview of DP as initially implemented by the Census Bureau.
- Discrete Additive Noise Mechanisms for Differential Privacy: Geometric and Gaussian Additive Noise – a more technical discussion of alternative additive noise mechanisms used in DP.
- Differential Privacy and Census Data: 2020 Census Demographics and Housing Characteristics File – an overview of the DP mechanism proposed to be applied to the 2020 Census Demographic and Housing Characteristics file.
- Demonstration Data for California – August 25, 2022 Release (all geographies)
Visualizations
April 2021 DAS Demonstration Products Comparison Maps (Epsilon 12.2):
- DP impact on California 113th Congressional districts by race/ethnicity for persons age 18 years and over.
- DP impact on California State Senate districts by race/ethnicity for persons age 18 years and over.
- DP impact on California Assembly districts by race/ethnicity for persons age 18 years and over.
April 2021 DAS Demonstration Products Comparison Maps (Epsilon 4.5):
- DP impact on California 113th Congressional districts by race/ethnicity for persons age 18 years and over.
- DP impact on California State Senate districts by race/ethnicity for persons age 18 years and over.
- DP impact on California Assembly districts by race/ethnicity for persons age 18 years and over.
November 2020 DAS Demonstration Products Comparison Maps:
- DP impact on California 113th Congressional districts by race/ethnicity for persons age 18 years and over.
- DP impact on California State Senate districts by race/ethnicity for persons age 18 years and over.
- DP impact on California Assembly districts by race/ethnicity for persons age 18 years and over.
Comments, suggestions, or questions? Please email CensusData4CA@dof.ca.gov.