Keeping up with health big data de-identification standards

Author: Patrick Ouellette   |   Date: March 22, 2013

As Bill Kleyman wrote on HealthITSecurity.com earlier this week, big data security is going to be a major consideration for healthcare executives in both the short and long term as that data is analyzed and dissected. A critical part of securing big data is de-identifying the sensitive patient information it contains, such as credit card information or disease history. Even Health Privacy Project Director Deven McGraw is openly concerned about the Obama administration’s proposal that the Office of Personnel Management (OPM) create a health insurance claims database, because of the patient privacy risks related to big data silos.

Back in November, the Office for Civil Rights (OCR) published guidance on methods for de-identifying health data in accordance with the HIPAA Privacy Rule. Section 164.514(a) of the Privacy Rule covers de-identified information, defined as data that doesn’t identify an individual and that gives no reasonable basis to believe it could be used to identify the individual. The Privacy Rule specifies two ways in which information can be de-identified:

1. A person with appropriate expertise can render information “not identifiable” by determining that there is only a very small risk that the information could be used, alone or in combination with other reasonably available information, by an anticipated recipient to identify the individual. This same person must also document the methods and results of the analysis that justify the determination.

2. Alternatively, the following identifiers of the individual and his/her relatives, employers or household members must be removed or coded:

(1) Names
(2) All geographic subdivisions smaller than a state, including street, city, county, precinct and zip code. The first three digits of the zip code can be used if the area formed by all zip codes sharing those digits includes more than 20,000 people. If that area contains 20,000 or fewer people, “000” must be used as the zip code.
(3) All elements of dates (except year) related to an individual, including birth date, admission date, discharge date and date of death. For individuals older than 89, year of birth cannot be used; all such ages and date elements must be aggregated into a single category of 90 and older.
(4) Telephone numbers
(5) FAX numbers
(6) Electronic mail addresses
(7) Social Security numbers
(8) Medical record numbers
(9) Health plan beneficiary numbers
(10) Account numbers
(11) Certificate/license numbers
(12) Vehicle identifiers and serial numbers, including license plates
(13) Device identifiers and serial numbers
(14) Web universal resource locators (URLs)
(15) Internet protocol (IP) address
(16) Biometric identifiers, including finger and voice prints
(17) Full face photos, and comparable images
(18) Any unique identifying number, characteristic or code
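
To make the zip code and date rules in items (2) and (3) concrete, here is a minimal Python sketch of how a single record might be generalized. It is an illustration under stated assumptions, not an OCR-endorsed implementation: the field names, the DIRECT_IDENTIFIER_FIELDS set and the contents of SMALL_ZIP3_PREFIXES are all hypothetical, and a real prefix list would have to be derived from current Census population data.

```python
from datetime import date

# Hypothetical placeholder values: 3-digit zip prefixes whose combined
# population is 20,000 or fewer. Derive the real set from Census data.
SMALL_ZIP3_PREFIXES = {"036", "059"}

# Hypothetical field names for identifiers that are dropped outright.
DIRECT_IDENTIFIER_FIELDS = {"name", "phone", "fax", "email", "ssn", "mrn"}


def generalize_zip(zip_code, small_prefixes=SMALL_ZIP3_PREFIXES):
    """Keep only the first three digits, or "000" for low-population areas."""
    prefix = zip_code.strip()[:3]
    return "000" if prefix in small_prefixes else prefix


def generalize_age(age):
    """Aggregate every age over 89 into a single "90+" category."""
    return "90+" if age >= 90 else str(age)


def deidentify(record):
    """Return a copy of the record with the sketch's rules applied."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIER_FIELDS}
    if "zip" in out:
        out["zip"] = generalize_zip(out["zip"])
    if "admission_date" in out:
        # Only the year of a date may be retained.
        out["admission_year"] = out.pop("admission_date").year
    if "age" in out:
        out["age"] = generalize_age(out["age"])
    return out


if __name__ == "__main__":
    patient = {
        "name": "Jane Doe",
        "ssn": "000-00-0000",
        "zip": "02139",
        "admission_date": date(2012, 11, 26),
        "age": 93,
        "diagnosis": "hypertension",
    }
    print(deidentify(patient))
```

The sketch covers only a handful of the 18 identifier types. Free-text fields, images and item (18), "any unique identifying number, characteristic or code," need separate handling that a field-by-field filter like this cannot provide.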

As noted on infosecisland.com, having specific, updated de-identification policies is important on a number of levels. Staying current with HIPAA regulations is critical, but actually putting those policies into practice is what really helps keep the data safe. The same goes for large data sets that are used for analytical purposes, where researchers need to work with de-identified information to protect patient privacy.

De-identification isn’t a security panacea, because de-identified data can sometimes be re-identified, and it certainly shouldn’t be thought of as a silver bullet when coming up with big data security policies. But healthcare security executives at large organizations should review OCR’s de-identification best practices from November and make sure that, at the very least, they’ve considered all of their safeguard and access options.

When the HIPAA omnibus rules regarding subcontractors and business associates (BAs) are factored into the equation, it becomes even more important for healthcare organizations to have their big data ducks in a row.
