CalHHS Data Knowledge Base

6. Justification of Thresholds Identified

Establishing Minimum Numerator and Denominator

The DDG workgroup reviewed the published literature including information from other states and from the federal government. There was a great deal of variation in the numerical values chosen for the Numerator Condition. While the Centers for Disease Control and Prevention (CDC) WONDER database suppresses cells with numerators less than 10, the National Environmental Public Health Tracking Network suppresses cells that are greater than 0 but less than 6. Examples range from 3 to 40 with many being 10 to 15. The Centers for Medicare and Medicaid Services (CMS) uses a small cell policy of suppressing values derived from fewer than 11 individuals. As stated in a 2014 publication associated with a data release of Medicare Provider Data, “.” Of note, CMS only uses a Numerator Condition.
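The three numerator rules quoted above can be compared side by side with a minimal sketch. The rule keys and the `is_suppressed` helper are hypothetical names chosen for illustration, and the inequalities are applied literally as worded in the text; how each program actually treats a count of zero is an assumption here, not something the text confirms.

```python
from typing import Callable, Dict

# Hypothetical rule names; each predicate returns True when a cell
# count would be suppressed under the rule as worded in the text.
NUMERATOR_RULES: Dict[str, Callable[[int], bool]] = {
    # CDC WONDER: numerators less than 10
    "cdc_wonder": lambda count: count < 10,
    # Tracking Network: greater than 0 but less than 6
    "tracking_network": lambda count: 0 < count < 6,
    # CMS: values derived from fewer than 11 individuals
    "cms": lambda count: count < 11,
}


def is_suppressed(count: int, rule: str) -> bool:
    """Return True if `count` would be suppressed under the named rule."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return NUMERATOR_RULES[rule](count)


if __name__ == "__main__":
    # Show how the same cell count fares under each rule.
    for count in (0, 5, 9, 10, 11):
        decisions = {name: is_suppressed(count, name) for name in NUMERATOR_RULES}
        print(count, decisions)
```

A count of 10, for example, is published under the CDC WONDER wording but suppressed under the CMS wording, which illustrates the variation the workgroup observed.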

Just as there is no consistent value for the Numerator Condition, neither is there a consistent value for the Denominator Condition. Some examples include:

  • National Center for Health Statistics (public micro-data) – 250,000

  • National Environmental Public Health Tracking Network – 100,000

  • Maine Integrated Youth Health Survey – 5,000

In establishing a minimum denominator to protect confidentiality, the DDG workgroup began by looking at the risk associated with providing geography in record-level data. As noted in the “Guidance Regarding Methods for De-identification of Protected Health Information in Accordance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule”, published in November 2012 by the U.S. Department of Health & Human Services, Office for Civil Rights, risk varies with the level of ZIP Code and with how the ZIP Code is combined with other variables. For example, the guidance cites an estimate that date of birth, sex, and 5-Digit ZIP Code together are unique for over 50% of U.S. residents, so over half of U.S. residents could be uniquely described with just these three data elements. For this reason, the HIPAA Safe Harbor rule specifies that the 3-Digit ZIP Code can be provided at the record level if the 3-Digit ZIP Code has a minimum of 20,000 people. By aggregating data for a given 3-Digit ZIP Code, the potential for identifying a unique individual is less than 0.04%.

When this minimum is combined with the Numerator Condition, the risk becomes less than 0.04% because there will be a minimum of 11 individuals with a particular age and sex for the 3-Digit ZIP Code. Most tables will also provide further levels of aggregation, which reduces risk again. This reduction of risk is discussed further with respect to the Publication Scoring Criteria.

A minimum denominator of 20,000 was chosen as part of the Numerator-Denominator Condition to leverage the risk assessment cited above.

The Numerator-Denominator Condition serves as an initial screening to assess potential risk for a data set. If this condition is met, additional analysis is not necessary. If the condition is not met, then the analysis proceeds to Step 3.
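The screening step can be sketched as a simple check. This is an illustration under stated assumptions, not the DDG's own implementation: the function names are hypothetical, the defaults use the minimum numerator of 11 and minimum denominator of 20,000 discussed in this section, the comparison at the exact boundary is an assumption exposed as a parameter, and applying the check to every cell of a table is also an assumption.

```python
from typing import Iterable, Tuple


def cell_meets_condition(
    numerator: int,
    denominator: int,
    min_numerator: int = 11,        # assumed from the numerator minimum in the text
    min_denominator: int = 20_000,  # assumed from the denominator minimum in the text
) -> bool:
    """Return True if one cell satisfies both minimums."""
    return numerator >= min_numerator and denominator >= min_denominator


def screen_table(cells: Iterable[Tuple[int, int]]) -> str:
    """Screen a table given as (numerator, denominator) pairs.

    Assumption: the condition is met only if every cell passes.
    """
    if all(cell_meets_condition(n, d) for n, d in cells):
        return "condition met: additional analysis not necessary"
    return "condition not met: proceed to Step 3"


if __name__ == "__main__":
    print(screen_table([(25, 150_000), (40, 64_000)]))
    print(screen_table([(25, 150_000), (7, 64_000)]))
```

In the second call, one cell has a numerator of 7, so the table as a whole is routed to Step 3 for further risk assessment.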

Assessing Potential Risk – Publication Scoring Criteria

The Publication Scoring Criteria is provided as an example of a method that meets the requirements of Step 3 in the Data Assessment for Public Release Procedure. It is a tool to assess and quantify the potential risk of re-identification of de-identified data based on two identification risks: size of potential population and variable specificity. The Publication Scoring Criteria is used to assess the need to suppress small cells as a result of a small numerator, a small denominator, or both, where a small numerator is less than 11 and a small denominator is less than 20,001. For this reason, the Publication Scoring Criteria takes into account both numerator (e.g., Events) and denominator (e.g., Geography) variables.

The Publication Scoring Criteria is based on a framework that has been in use by the Illinois Department of Public Health, Illinois Center for Health Statistics. Public health has a long history of providing data to the public, and various other methods have been used to assess risk and the presence of sensitive or small cells. Further discussion of these other methods is found in Section 6.3.

This section provides a more detailed review of the criteria that make up the Publication Scoring Criteria.

Section 6.3: Assessing Potential Risk