CalHHS Data Knowledge Base
CalHHS Open Data PortalCalHHS Geoportal
  • Data Knowledge Base
  • Data Sharing
    • Revision History
    • Data Sharing Guidebook
    • Lessons Learned
    • Data Sharing Plays
      • Play 1: Sharing Metrics
      • Play 2: Identify
      • Play 3: Business Case
      • Play 4: Prioritize
      • Play 5: Metadata
      • Play 6: Describe
      • Play 7: Promote
      • Play 8: Prepare
    • Data Element Definitions
    • Application Program Interfaces
    • Additional Training and Reference Materials
    • Business Case Creation
      • Determining Goals and Strategy
      • Implementation Details
      • Evaluating Outcomes & Impacts
      • Communicating Your Results
  • Data De-Identification
    • Revision History
    • 1. Purpose
    • 2. Background
    • 3. Scope
    • 4. Statistical De-Identification
      • 4.1 Personal Characteristics of Individuals
      • 4.2 Numerator - Denominator Condition
      • 4.3 Assess Potential Risk
      • 4.4 Statistical Masking
      • 4.5 Legal Review
      • 4.6 Departmental Release Procedure for De-Identified Data
    • 5. Types of Reporting
      • 5.1 Variables
      • 5.2 Survey Data
      • 5.3 Budgets and Fiscal Estimates
      • 5.4 Facilities, Service Locations and Providers
      • 5.5 Mandated Reporting
    • 6. Justification of Thresholds Identified
      • 6.2 Assessing Potential Risk – Publication Scoring Criteria
      • 6.3 Assessing Potential Risk – Alternate Methods
      • 6.4 Statistical Masking
    • 7. Approval Process
    • 8. DDG Governance
    • 9. Publicly Available Data
    • 10. Development Process
    • 11. Legal Framework
    • 12. Abbreviations and Acronyms
    • 13. Definitions
    • 14. References
    • Appendix A: Expert Determination Template
    • Appendix B: 2015 HIPAA Reassessment Results
    • Appendix C: State and County Population Projections
  • Open Data Handbook
    • Revision History
    • Open Data: Purpose
    • Disclosure
    • Governance
    • Guidelines
    • Use
  • Appendix
    • Glossary and Acronyms
    • Data Tools
    • Data Discovery Sessions
    • Data Sharing Benefits
Powered by GitBook
On this page

Was this helpful?

Export as PDF
  1. Data De-Identification
  2. 6. Justification of Thresholds Identified

6.4 Statistical Masking

Statistical masking provides an extensive set of tools that can be used to mitigate potential risk in a given data presentation. As discussed in Section 4.4, the data releaser will assess the need for statistical masking when the assessment in Step 3 identified potential risk. Each department will document statistical masking processes that are routinely used in data preparation for public release.

As discussed in section 4.4, initial methods to address sensitive or small cells, as well as complimentary cells include the following:

  • Reduce Table Dimensions

  • Reduce Granularity of Variable(s), aka Recoding or Aggregation

  • Cell Suppression and Complementary Cell Suppression

Small cell sizes are typically encountered when one of the following conditions is met.

  • Multiple variables. This most often occurs in a pivot table presentation or a query interface where a user may have occurrences of disease X, stratified by county, stratified by sex, stratified by race and ethnicity.

  • Granular variables. The more granular the variable the smaller the potential numerator and denominator. This most commonly occurs with shortening the time period of reporting (weekly) or making the geography more specific (zip code or census tract). However, it can also occur when there are many categories for a variable. An example of this is aid codes in Medi-Cal where there are almost 200 aid codes.

  • Rare events. Examples include diseases such as hemophilia. Examples of incidents may result from mass trauma events such as a plane crash or multi- car accident.

In each of these cases, statistical masking may be addressed in a number of ways. For this reason, it is important to keep in mind the purpose for the reporting so that the method chosen for masking can still maximize the usefulness of the data provided. Choices for each condition are highlighted below.

  • Multiple variables. Options include separating the table into multiple tables that limit the number of variables included in each table; decreasing the granularity of the variables included in the table; or suppressing the small cell with an indicator that it is less than 11.

  • Granular variables. A common approach to this situation would be to decrease the granularity of the variables although suppressing the small cell with an indicator that it is less than 11 is also an option.

  • Rare events. In these cases it becomes very challenging to suppress the value in a way that it will not be able to be used with other public information to identify individuals. Additionally, with rare events, there is more significance in the variance of small numbers.

In addition to small cells, complementary cells must also be suppressed. Complementary cells are those which must be suppressed to prevent someone from being able to calculate the suppressed cell based on row or column totals in combination with other data in that row or column.

Suppressing small cell values and complimentary cells can be done in two ways.

  1. Use a symbol to indicate the cell has been suppressed. Identify any other cells (complimentary cells) that can be used to calculate the small cell and use a symbol to indicate the cell has been suppressed.

  2. Use a symbol to indicate the cell has been suppressed or leave the cell blank and remove the value from all pertinent row and column totals so that the cell cannot be calculated. This negates the need for evaluation of complementary cells. This method must be used with great caution because the totals may actually be published in other non-related tables. For this reason the method is not recommended.

When suppressing values, the following footnote to indicate the suppression is recommended:

“Values are not shown to protect confidentiality of the individuals summarized in the data.”

In addition to the above, there are a number of other methods that may be used for Statistical Masking. Methods discussed in the “” include the following for tables of counts or frequencies and for magnitude data.

Tables of Counts or Frequencies

  • Sampling as a Statistical Disclosure Limitation Method

  • Defining Sensitive Cells

    • Special Rules

    • The Threshold Rule

  • Protecting Sensitive Cells After Tabulation

    • Suppression

    • Random Rounding

    • Controlled Rounding

    • Controlled Tabular Adjustment

  • Protecting Sensitive Cells Before Tabulation

Tables of Magnitude Data

  • Defining Sensitive Cells – Linear Sensitivity Rules

  • Protecting Sensitive Cells After Tabulation

  • Protecting Sensitive Cells Before Tabulation

Previous6.3 Assessing Potential Risk – Alternate MethodsNext7. Approval Process

Last updated 4 months ago

Was this helpful?