4.4 Statistical Masking
Last updated
Was this helpful?
Last updated
Was this helpful?
If Step 3 determined that the data set has a risk that small numerators or small denominators may result in conditions that put individuals at risk of being re-identified, then the data set must be assessed to determine the need for statistical masking of those small values and complimentary values. In performing the statistical masking, the data producer must consider what level of analysis may be sacrificed in order to produce a table with lower risk. Initial considerations for statistical masking are described below. For additional methods related to statistical masking, please see .
If there are more dimensions present in the table than necessary for the vast majority of analysis, the data producer should consider reducing the number of dimensions in a single table and produce multiple tables each with a subset of the dimensions in the table that resulted in small cells. For example, if there are six dimensions of interest for study, but a table that crosses all six dimensions produces a large number of small cells, the data producer could consider producing several tables each of which crosses four dimensions. This is especially effective if there are very few analytic questions requiring a cross section of all six variables.
An alternative approach to addressing small cells in a table is to reduce the number of levels of a particular dimension. This is especially useful for dimensions with a large number of levels that can be easily aggregated to fewer levels and maintain much of their utility. Geographic variables such as state or county can often be recoded into regional variables that still serve the analytic needs of the data user. It is also the only table restructuring option for tables with only two or three dimensions which have limited opportunities for table dimension reduction.
It should be noted that these actions can be used alone or in tandem to reduce, or completely eliminate, small cells within a table.
There will be cases where not all small cells can be eliminated by reducing granularity of dimensions or the number of dimensions present in a table. In these cases it will be necessary to suppress small cells and perform complementary suppression to ensure that precise values of small cells cannot be calculated using the values of unsuppressed cells and marginal values. In the simplest case this means ensuring that each column and row of a two dimensional table has at least two suppressions. This ensures that the precise values of the suppressed cells cannot be calculated. Complementary suppressions are often selected using one of the methods listed below.
The ‘analytically least interesting’ level of a particular dimension. This is often, ‘other’, or ‘I don’t know’.
The smallest cell available for complementary suppression. This is based on minimizing the ‘information loss’.
The cell most similar to the cell needing complementary suppression, such as adjacent age groups. This can produce complementary suppression that may be easier to interpret.