The de-identification of data is an important part of healthcare technology, especially as the use of EHRs and HIEs becomes more prominent. The HIPAA Privacy Rule states that once data has been de-identified, covered entities can use or disclose it without any limitation. The information is no longer considered PHI, and does not fall under the same regulations and restrictions as PHI.
But why would a facility need to de-identify data? What are the potential benefits of the de-identification of data? HealthITSecurity.com decided to dissect this aspect of HIPAA regulations, and explain what the de-identification process entails and how covered entities could benefit from the practice.
What is de-identification?
De-identification is the process of removing identifiers from PHI, which helps mitigate privacy risks to individuals. The resulting medical information can then be used in areas such as research, policy assessment, and comparative effectiveness studies. As explained by the Department of Health & Human Services (HHS), the Privacy Rule has two de-identification methods:
- A formal determination by a qualified expert;
- The removal of specified individual identifiers as well as absence of actual knowledge by the covered entity that the remaining information could be used alone or in combination with other information to identify the individual.
Even so, HHS cautions that once the de-identification process has taken place, there is still a small chance that the data could be linked back to its corresponding individual.
“Regardless of the method by which de-identification is achieved, the Privacy Rule does not restrict the use or disclosure of de-identified health information, as it is no longer considered protected health information,” according to HHS.
What are the different types of de-identification?
The first type of de-identification is done through expert determination. A person “with appropriate knowledge of and experience” in rendering data unidentifiable applies the necessary methods and determines that the risk of the data being used to identify an individual is very small. From there, that individual documents the methods and results, justifying how he or she came to the determination that the data had been de-identified.
The second method is called the “Safe Harbor” method. In this approach, a covered entity (CE) is permitted to consider data de-identified if it removes 18 types of identifiers. These include:
- Telephone numbers
- Email addresses
- Social Security numbers
- Medical record numbers
The next stipulation in the Safe Harbor method is that the CE has no actual knowledge that the remaining data could be used, alone or in combination with other information, to identify an individual.
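The removal step can be illustrated with a minimal sketch. The field names here are hypothetical, and the set covers only the four identifier types listed above, not the full Safe Harbor list of 18, so this should not be read as a compliant implementation.

```python
# Hypothetical field names covering only the four identifier types named above.
# The full Safe Harbor list has 18 types; this set is deliberately incomplete.
IDENTIFIER_FIELDS = {"phone", "email", "ssn", "medical_record_number"}


def strip_identifiers(record: dict) -> dict:
    """Return a copy of the record without the listed identifier fields."""
    return {key: value for key, value in record.items()
            if key not in IDENTIFIER_FIELDS}


# Made-up sample record for illustration only.
record = {
    "phone": "555-0100",
    "email": "patient@example.com",
    "ssn": "000-00-0000",
    "medical_record_number": "MRN-0001",
    "diagnosis": "asthma",
}

print(strip_identifiers(record))
```

Removing fields by name handles only structured data. Identifiers that appear inside free-text notes would need separate handling, and the "no actual knowledge" condition is a judgment the CE has to make, not something code can check.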
“De-identified health information created following these methods is no longer protected by the Privacy Rule because it does not fall within the definition of PHI,” HHS stated. “Of course, de-identification leads to information loss which may limit the usefulness of the resulting health information in certain circumstances.”
Can you re-identify the data?
Yes. A covered entity may assign a unique code to a set of de-identified health information so that it can later re-identify the data itself. Two conditions must be met:
- Derivation: The code or other means of record identification is not derived from or related to information about the individual and is not otherwise capable of being translated so as to identify the individual; and
- Security: The covered entity does not use or disclose the code or other means of record identification for any other purpose, and does not disclose the mechanism for re-identification.
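One way to picture the derivation and security conditions is the sketch below. It assumes a hypothetical record layout and uses Python's standard `secrets` module to generate a random code that is not computed from any patient attribute. The lookup table that links codes back to patients is returned separately, on the assumption that the covered entity keeps it private.

```python
import secrets


def assign_reid_codes(records):
    """Attach a random code to each record.

    Returns (coded_records, key_table). The key_table maps each code back
    to the original medical record number and is meant to stay with the
    covered entity; only coded_records would be shared.
    """
    coded_records = []
    key_table = {}
    for rec in records:
        # Random 128-bit code: not derived from or related to patient data.
        code = secrets.token_hex(16)
        key_table[code] = rec["medical_record_number"]
        coded_records.append({"code": code, "diagnosis": rec["diagnosis"]})
    return coded_records, key_table


# Made-up sample records with hypothetical field names.
patients = [
    {"medical_record_number": "MRN-0001", "diagnosis": "asthma"},
    {"medical_record_number": "MRN-0002", "diagnosis": "diabetes"},
]
shared, private_keys = assign_reid_codes(patients)
```

A random code is used here instead of, say, a hash of the medical record number, because a hash is derived from information about the individual and would appear to conflict with the derivation condition.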
Why would a CE de-identify data?
As mentioned earlier, there are several reasons why a CE would want to de-identify information. Once personal identifiers have been removed, the data is no longer considered PHI and can therefore be used in many other situations. For example, certain types of research or comparative studies could benefit from medical information. To ensure the identity of individuals remains hidden, specific pieces of information can be removed first.
The examples below show how an individual expert could de-identify data. The first table shows PHI and the second has had some identifiers removed.
The second table shows suppressed patient values. Suppression can be used on individual records if they are deemed too risky to share, or if a particular record is found to be distinguishable. For example, an individual in a specific zip code who makes $200,000 per year could be easily identifiable, especially if the majority of other residents make significantly less.
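Suppression of distinguishable records can be sketched as follows. This is one simple interpretation, not a method prescribed by HHS: a record is dropped when its combination of selected attributes appears fewer than `min_count` times in the data set. The field names, the income bands, and the threshold are all assumptions made for illustration.

```python
from collections import Counter


def suppress_rare(records, keys, min_count=2):
    """Drop records whose combination of `keys` values is too rare."""
    def combo(rec):
        return tuple(rec[k] for k in keys)

    counts = Counter(combo(rec) for rec in records)
    return [rec for rec in records if counts[combo(rec)] >= min_count]


# Made-up data: one high earner stands out in an otherwise uniform zip code.
rows = [
    {"zip": "12345", "income_band": "under_50k", "diagnosis": "asthma"},
    {"zip": "12345", "income_band": "under_50k", "diagnosis": "flu"},
    {"zip": "12345", "income_band": "200k_plus", "diagnosis": "diabetes"},
]

kept = suppress_rare(rows, keys=("zip", "income_band"))
print(len(kept))
```

In this sample the single `200k_plus` record is removed and the two `under_50k` records are kept. Choosing which attributes to check and how small a group is acceptable is part of the expert's risk assessment.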
Other methods of removing data are generalization and perturbation. Generalization is where data is made less specific, such as by removing digits from a zip code or changing patient ages from a specific number to an age range (e.g., 25 to 35 instead of 27).
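Generalization of zip codes and ages might look like the sketch below. The function names, the number of zip digits kept, and the ten-year age bands are illustrative choices, not values taken from HHS guidance.

```python
def generalize_zip(zip_code: str, keep: int = 3) -> str:
    """Keep the first `keep` digits and mask the rest with asterisks."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


def generalize_age(age: int, width: int = 10) -> str:
    """Replace an exact age with the band of `width` years that contains it."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


print(generalize_zip("12345"))  # 123**
print(generalize_age(27))       # 20-29
```

Wider bands and fewer zip digits lower the re-identification risk but also discard more detail, which is the information loss HHS refers to.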
Perturbation replaces specific values with different but equally specific values. For example, a patient’s actual age could be 16, but after de-identification it is reported as a value within two years of that age. This approach is often used to maintain statistical properties of the original data, such as mean or variance, according to HHS.
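The age example can be sketched with a small random shift. This is a simplified illustration using Python's standard `random` module. The function name and the `max_shift` parameter are assumptions, and a real perturbation scheme would be chosen and validated by the expert.

```python
import random


def perturb_age(age: int, max_shift: int = 2, rng=None) -> int:
    """Return the age shifted by a random whole number of years.

    The shift is drawn uniformly from -max_shift..+max_shift, so the
    result is always within max_shift years of the true age. Results
    below zero are clamped to zero.
    """
    rng = rng or random
    shifted = age + rng.randint(-max_shift, max_shift)
    return max(0, shifted)


# A seeded generator makes the example repeatable.
rng = random.Random(0)
print(perturb_age(16, rng=rng))  # some value between 14 and 18
```

Because the shift is symmetric around zero, the average age across many records should stay close to the original average, which is the kind of statistical property the approach aims to preserve.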
“Using such methods, the expert will prove that the likelihood an undesirable event (e.g., future identification of an individual) will occur is very small,” HHS explained.
The future of de-identification
Health data sharing is becoming an increasingly popular topic. More companies want to further genetic research in order to find cures for diseases or new treatment methods. However, it is critical that CEs remain HIPAA compliant throughout the entire process. Whether an organization wants to assist in research or compile comparative data for its own uses, the de-identification of data is essential in keeping patient information as secure as possible.