6.4 Statistical Masking

Statistical masking provides an extensive set of tools that can be used to mitigate potential risk in a given data presentation. As discussed in Section 4.4, the data releaser will assess the need for statistical masking when the assessment in Step 3 identified potential risk. Each department will document statistical masking processes that are routinely used in data preparation for public release.

As discussed in section 4.4, initial methods to address sensitive or small cells, as well as complimentary cells include the following:

Reduce Table Dimensions
Reduce Granularity of Variable(s), aka Recoding or Aggregation
Cell Suppression and Complementary Cell Suppression

Small cell sizes are typically encountered when one of the following conditions is met.

Multiple variables. This most often occurs in a pivot table presentation or a query interface where a user may have occurrences of disease X, stratified by county, stratified by sex, stratified by race and ethnicity.
Granular variables. The more granular the variable the smaller the potential numerator and denominator. This most commonly occurs with shortening the time period of reporting (weekly) or making the geography more specific (zip code or census tract). However, it can also occur when there are many categories for a variable. An example of this is aid codes in Medi-Cal where there are almost 200 aid codes.
Rare events. Examples include diseases such as hemophilia. Examples of incidents may result from mass trauma events such as a plane crash or multi- car accident.

In each of these cases, statistical masking may be addressed in a number of ways. For this reason, it is important to keep in mind the purpose for the reporting so that the method chosen for masking can still maximize the usefulness of the data provided. Choices for each condition are highlighted below.

Multiple variables. Options include separating the table into multiple tables that limit the number of variables included in each table; decreasing the granularity of the variables included in the table; or suppressing the small cell with an indicator that it is less than 11.
Granular variables. A common approach to this situation would be to decrease the granularity of the variables although suppressing the small cell with an indicator that it is less than 11 is also an option.
Rare events. In these cases it becomes very challenging to suppress the value in a way that it will not be able to be used with other public information to identify individuals. Additionally, with rare events, there is more significance in the variance of small numbers.

In addition to small cells, complementary cells must also be suppressed. Complementary cells are those which must be suppressed to prevent someone from being able to calculate the suppressed cell based on row or column totals in combination with other data in that row or column.

Suppressing small cell values and complimentary cells can be done in two ways.

Use a symbol to indicate the cell has been suppressed. Identify any other cells (complimentary cells) that can be used to calculate the small cell and use a symbol to indicate the cell has been suppressed.
Use a symbol to indicate the cell has been suppressed or leave the cell blank and remove the value from all pertinent row and column totals so that the cell cannot be calculated. This negates the need for evaluation of complementary cells. This method must be used with great caution because the totals may actually be published in other non-related tables. For this reason the method is not recommended.

When suppressing values, the following footnote to indicate the suppression is recommended:

“Values are not shown to protect confidentiality of the individuals summarized in the data.”

In addition to the above, there are a number of other methods that may be used for Statistical Masking. Methods discussed in the “Statistical Policy Working Paper 22 (Second version, 2005), Report on Statistical Disclosure Limitation Methodology” include the following for tables of counts or frequencies and for magnitude data.

Tables of Counts or Frequencies

Sampling as a Statistical Disclosure Limitation Method
Defining Sensitive Cells
- Special Rules
- The Threshold Rule
Protecting Sensitive Cells After Tabulation
- Suppression
- Random Rounding
- Controlled Rounding
- Controlled Tabular Adjustment
Protecting Sensitive Cells Before Tabulation

Tables of Magnitude Data

Defining Sensitive Cells – Linear Sensitivity Rules
Protecting Sensitive Cells After Tabulation
Protecting Sensitive Cells Before Tabulation

Previous6.3 Assessing Potential Risk – Alternate Methods Next7. Approval Process

Last updated 10 months ago

Was this helpful?