9. Publicly Available Data
A critical step in reviewing data for public release is the consideration of what other data may be publicly available that could be used in combination with the newly released data to identify the individuals represented in the data. This section will highlight some specific data sets that are publicly available that may be used in combination with CHHS data that would contribute to potential increased risk.
Common kinds of data with personal information include: real estate records, individual licensing databases (MD, RN, contractors, lawyers, etc.), marriage records, news (and other) media reports, commercially available databases (data brokers, marketing), court documents, etc.
Vital Records Data
Programs should also be aware of the publicly available electronic birth and death indices from Vital Records, as specified in Health and Safety Code section 102230(b).
The following are provided in the birth record indices:
First, middle, and last name
Sex
Date of birth
Place of birth
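As a rough illustration of why these index fields matter, the sketch below counts how many rows in a candidate release share a combination of sex, date of birth, and place of birth with exactly one row in a public index. The field names, the function count_unique_matches, and all rows are hypothetical and made up for illustration; this is not a CalHHS tool or a prescribed method, only one simple way a program might check for overlap before release.

```python
from collections import Counter

# Hypothetical field names shared by the candidate release and the public index.
QUASI_IDENTIFIERS = ("sex", "date_of_birth", "place_of_birth")


def count_unique_matches(release_rows, index_rows, keys=QUASI_IDENTIFIERS):
    """Count release rows whose key combination matches exactly one index row."""
    index_counts = Counter(tuple(row[k] for k in keys) for row in index_rows)
    return sum(
        1
        for row in release_rows
        if index_counts[tuple(row[k] for k in keys)] == 1
    )


# Fabricated toy rows standing in for a public birth index.
public_index = [
    {"name": "Person A", "sex": "F", "date_of_birth": "1990-01-01", "place_of_birth": "County X"},
    {"name": "Person B", "sex": "F", "date_of_birth": "1990-01-01", "place_of_birth": "County X"},
    {"name": "Person C", "sex": "M", "date_of_birth": "1985-06-15", "place_of_birth": "County Y"},
]

# Fabricated toy rows standing in for a data set under review for release.
candidate_release = [
    {"sex": "F", "date_of_birth": "1990-01-01", "place_of_birth": "County X", "measure": "example"},
    {"sex": "M", "date_of_birth": "1985-06-15", "place_of_birth": "County Y", "measure": "example"},
]

at_risk = count_unique_matches(candidate_release, public_index)
print(at_risk)
```

In this toy data, only the second release row lines up with a single index row, so it is the one that would point to a named individual. Rows that match several index entries are harder to attribute to one person.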
Other potential sources of publicly available data to consider are informational certified copies of birth and death certificates. In California, anyone can obtain an informational certified copy of a birth or death certificate; these copies are clearly marked as un-authorized copies that cannot be used to verify identity. In practice, it is difficult to build a dataset from these copies for the following reasons:
Certified copies of birth and death certificates must be obtained on an individual basis, and you must be able to identify the record. In other words, an individual cannot simply ask for a stack of certificates for purposes of creating a dataset.
Certified copies are issued on specialized banknote paper, not in electronic format, which creates a problem of scale when trying to create a dataset.
There is a $25 fee for each certified copy of a birth certificate and $21 for a certified copy of a death certificate, which also creates a problem of scale when trying to create a dataset.
Certified copies are meant for individual use. A request for a large number of certificates may prompt vital records staff to investigate why so many certificates were requested at once.
CalHHS Open Data Portal
As additional data sets are added to the Open Data Portal, programs need to take that information into account when considering potential risk for any given data set. The CHHS Open Data Workgroup will be providing easier access both to lists of data currently on the portal and to data sets planned for addition to the portal. Although the portal is significant, with over 100 data sets, it is not an exhaustive record of public data, because the Public Records Act (PRA) allows an extremely broad range of information to be released sporadically. As a result, the public data to consider can be specified in part, but not completely. CHHS departments have a duty of due diligence in the de-identification process to consider published identifiable data, published de-identified data, and de-identified data that will soon be published.
Listed below are individual records or documents that the Department of Rehabilitation makes available to the public:
Fair Hearing Decisions include the appellant’s initials and possibly other information, depending on the issue the appellant presents for hearing, such as sex, disability, employment, education, vocational rehabilitation services, etc.; and
Monthly Operating Reports, and information drawn from them, include the names of licensees and financial information about the licensees’ operation of vending facilities in the Business Enterprises Program for the Blind. To be eligible for this program, individuals must be legally blind.
Public Census and Demographic Information
Estimates - Official population estimates of the state, counties and cities produced by the Demographic Research Unit for state planning and budgeting.
Projections - Forecasts of population, births and public school enrollment at the state and county level produced by the Demographic Research Unit.
State Census Data Center - Demographic, social, economic, migration, and housing data from the decennial censuses, the American Community Survey, the Current Population Survey, and other special and periodic surveys.
Commonly Shared Information
With the growth of social media, people frequently share information through tools such as Facebook, LinkedIn, and Twitter. While it would be impossible to take into account all information that people make public about themselves, a certain amount of information can be expected to be in the public domain because individuals frequently provide it about themselves. Examples of such information include wedding dates, birth dates, education (high school, college) and professional certifications.
Geographic Information
Geographic information is particularly suited to being combined with other geographic information, given the relatively standardized way data is coded (latitude, longitude, county, etc.). With the use of mapping tools, various information can be combined in what is called a “mashup.”
“A mashup, in web development, is a web page, or web application, that uses content from more than one source to create a single new service displayed in a single graphical interface. For example, you could combine the addresses and photographs of your library branches with a Google map to create a map mashup.[1] The term implies easy, fast integration, frequently using open application programming interfaces (open API) and data sources to produce enriched results that were not necessarily the original reason for producing the raw source data.”
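To make the idea of combining sources on a shared geographic code concrete, here is a minimal sketch that merges records from two sources by county. The function mash_up, the field names, and every value are hypothetical and fabricated for illustration; real mashups typically pull from mapping services or open APIs rather than in-memory lists.

```python
from collections import defaultdict


def mash_up(sources, key="county"):
    """Merge records from several sources into one dict per shared key value."""
    combined = defaultdict(dict)
    for source in sources:
        for record in source:
            # Copy every field except the join key into the merged entry.
            combined[record[key]].update(
                {k: v for k, v in record.items() if k != key}
            )
    return dict(combined)


# Fabricated toy sources that happen to share a "county" field.
branches = [{"county": "County X", "branch": "Main Library"}]
population = [
    {"county": "County X", "population_estimate": 1000},
    {"county": "County Y", "population_estimate": 2500},
]

merged = mash_up([branches, population])
print(merged["County X"])
```

Because both sources use the same county label, the merged entry for County X carries attributes that neither source published together, which is why standardized geographic coding deserves attention when assessing risk.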