Cybersecurity News

Search Engines May Expose Patient Health Information, ACR Warns

New search engine capabilities used by Google, Bing, and other vendors may inadvertently expose patient identifiers and other protected health information, ACR, RSNA, and SIIM warn.

By Jessica Davis

New search engine capabilities may inadvertently expose patient identifiers and other protected health information, according to a warning from the American College of Radiology (ACR), Radiological Society of North America (RSNA), and Society for Imaging Informatics in Medicine (SIIM) to radiologists and other medical professionals.

“The ability to use Optical Character Recognition (OCR) at scale allows programs to quickly re-generate explicit PHI that was originally burned into the image pixels,” researchers explained. “Search engines can then associate (‘index’) the image with that explicit PHI thereby making it discoverable.” 

“As a result, these data can be made available and linked to other text-based information,” they added. 

Healthcare providers commonly create educational presentations that contain medical images acquired during routine clinical operations. This use is allowed under patient privacy regulations like HIPAA, as long as the patient data in those images is de-identified and anonymized.

However, search engines may now index patient identifiers in those slide presentations, even if the provider believes the data has been de-identified. Advances made by Google, Bing, and other search engine developers in their content processing technology and web-crawling “increasingly enable large-scale information extraction from previously stored files.” 

As a result, the technology may extract source images from PowerPoint presentations and Adobe PDF files. The tech may also be able to recognize alphanumeric characters embedded in image pixels. 
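
To illustrate why embedded images are easy to recover: a .pptx file is a ZIP archive, and PowerPoint stores inserted pictures as ordinary image files under `ppt/media/`. The sketch below uses only the Python standard library to list those files. The function name `list_embedded_media` is hypothetical, and the code shows only the extraction step, not what any search engine actually runs.

```python
import zipfile


def list_embedded_media(pptx_file):
    """Return the names of media files stored inside a .pptx archive.

    `pptx_file` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(pptx_file) as archive:
        return sorted(
            name
            for name in archive.namelist()
            # Skip directory entries; keep only files under ppt/media/.
            if name.startswith("ppt/media/") and not name.endswith("/")
        )
```

Each file listed this way can be read out of the archive unchanged, so any text burned into its pixels remains available to OCR.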

“As such, an image with embedded patient information can be indexed by this process,” officials warned. “When explicit patient information becomes associated with images in the search engine database, it can be found on subsequent internet searches on the patient’s personal information.” 

“For example, when a patient searches her name in a search engine, images from a diagnostic imaging study performed four years ago appear,” they continued. “When she clicks on the images, she is directed to the website of a professional imaging association which stored an Adobe PDF file as part of an educational presentation.” 

Meanwhile, both the association and the file author may be unaware that the PHI was not sufficiently de-identified before the presentation was created, and that the file therefore may not preserve patient privacy.

ACR, RSNA, and SIIM are warning providers that any hosted content must be reviewed to ensure it’s appropriately de-identified, as they are “responsible for protecting their patients’ privacy in this context just as they are in routine clinical operations.”

To protect this data, providers must use only images without PHI in presentations. To ensure no patient data is included, providers should use screen capture software to capture only the image pixels within the region of interest.

Alternatively, the provider can disable patient information overlays or utilize an anonymization algorithm embedded in the picture archiving and communication system (PACS) prior to saving a screen or active window representation, the groups recommended. 

The presentation can also be created using third-party image processing software, such as Adobe Photoshop, to either crop out or obscure any protected health information before the image is inserted into a presentation. 

The groups stressed that cropping out PHI will not permanently remove it from the presentation, and that applying “black bars” or another obscuration method is not a safe or compliant way to de-identify data.

“Specific functions are available in some software to permanently delete cropped, obscured or hidden information in presentation files,” the groups warned. “As a final quality control check, it’s recommended that these ‘sanitization’ functions be run on all presentations prior to being made public.” 
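
As one way to picture a pre-publication check: PowerPoint typically records a crop as an `a:srcRect` element in the slide XML while leaving the full picture in the file. The sketch below assumes that behavior and flags slides that still carry crop rectangles. The function name `find_cropped_pictures` is hypothetical. This is a rough screening aid under those assumptions, not a substitute for the software's own sanitization functions, and it does not detect black bars or other overlays.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace used by the a:srcRect (crop rectangle) element.
DRAWINGML_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
SLIDE_PATTERN = re.compile(r"ppt/slides/slide\d+\.xml")


def find_cropped_pictures(pptx_file):
    """Map slide part names to the number of cropped pictures they contain.

    `pptx_file` may be a filesystem path or a binary file-like object.
    """
    flagged = {}
    with zipfile.ZipFile(pptx_file) as archive:
        for name in archive.namelist():
            if not SLIDE_PATTERN.fullmatch(name):
                continue
            root = ET.fromstring(archive.read(name))
            crops = [
                rect
                for rect in root.iter(f"{{{DRAWINGML_NS}}}srcRect")
                # Any nonzero edge offset (left, top, right, bottom) is a crop.
                if any(rect.get(edge, "0") != "0" for edge in "ltrb")
            ]
            if crops:
                flagged[name] = len(crops)
    return flagged
```

A non-empty result suggests the file still holds pixels outside the visible crop and should be sanitized before it is posted.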

ACR established a resource page with best practices for safely creating presentations, as well as recommended software functions to help radiologists permanently remove any sensitive content.

Best practice recommendations include workflow considerations for safely publishing medical images, MAC recommendations, PDF conversions, PowerPoint techniques, and regulatory implications. ACR explained that if some PHI has already been exposed, providers can ask the search engine vendor to review the information and to consider removing it, if it is agreed that removal is the appropriate action.

This is not the first report of search engine indexing issues. In fact, two of the biggest breaches reported in 2019 stemmed from similar incidents. In December 2018, a University of Washington Medicine patient notified the provider after an online search of his name uncovered a file containing his personal information. The breach was caused by a misconfiguration error.

And in January 2019, Inmediata Health Group officials discovered a search engine function allowed internal Inmediata webpages used for business operations to be indexed, which exposed the data of 1.5 million patients in the process.