Sep 23–26, 2019

Protecting the Healthcare Enterprise from PHI Breaches using Streaming and NLP

Jeff Zemerick (Mountain Fog)
4:35pm–5:15pm Wednesday, September 25, 2019
Location: 1E 09
Secondary topics: Health and Medicine, Privacy and Security

Who is this presentation for?

Big-data engineers, cloud architects, healthcare

The healthcare technology domain has seen explosive growth in the past decade. Hospitals, small and large, are beginning to adopt cloud technologies, and many now operate hybrid environments. These distributed environments pose unique challenges, and arguably none is more critical than the protection of Protected Health Information (PHI).

The federal regulations that govern PHI impose strict requirements on how PHI is handled, with potentially severe consequences if PHI is left unprotected or exposed. These requirements place a significant burden on the operations teams who maintain the environments, and these teams often have to manage multiple production, test, and development environments. A tool that helps prevent PHI from being transmitted across environment boundaries can help the organization meet its regulatory obligations.

In this talk, we will present an architecture for identifying and removing PHI from natural language text, along with the open source applications that implement it. We will show how streaming healthcare text can be ingested through Apache Kafka and processed by Apache Flink, where NLP methods are used to identify and remove PHI. Lastly, we will demonstrate deploying the application to the cloud.
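The architecture described above reads streaming text from Kafka and identifies and removes PHI in Flink. The sketch below is not the speaker's implementation: it is a minimal, framework-free Python illustration of the per-record redaction step, assuming simple regular expressions as a stand-in for the NLP models the talk describes. The names `PHI_PATTERNS`, `redact`, and `redact_stream` are hypothetical, and the patterns catch only a few formatted identifiers (SSNs, phone numbers, dates, email addresses); free-text PHI such as patient names would require a trained named-entity recognition model.

```python
import re
from typing import Iterable, Iterator

# Hypothetical, regex-only stand-in for the NLP models discussed in the talk.
# It only finds a few well-formatted identifiers; names, addresses, and other
# free-text PHI would need a trained named-entity recognition model.
PHI_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def redact(text: str) -> str:
    """Replace every span matched by a PHI pattern with a [LABEL] placeholder."""
    for label, pattern in PHI_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def redact_stream(records: Iterable[str]) -> Iterator[str]:
    """Redact records one at a time, mimicking a per-record streaming map."""
    for record in records:
        yield redact(record)


if __name__ == "__main__":
    note = "Patient SSN 123-45-6789, call 555-123-4567 on 3/14/2019."
    print(redact(note))  # prints "Patient SSN [SSN], call [PHONE] on [DATE]."
```

In the architecture the talk describes, a function like `redact` would run inside a Flink operator (for example, a `map` over a DataStream consuming a Kafka topic); the generator here only mimics that record-at-a-time processing without any streaming framework.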

Prerequisite knowledge

Familiarity with cloud computing and big-data concepts is helpful but not required

What you'll learn

Audience members will learn about the impacts of PHI breaches and the safeguard controls needed to process PHI. They will learn how Apache Kafka, Apache Flink, and NLP services can be used to identify and remove PHI in text, and how such an application can be deployed to a cloud environment.

Jeff Zemerick

Mountain Fog

Jeff is a software engineer and cloud architect. He is a committer and PMC member for Apache OpenNLP. Jeff currently works on cloud, big-data, and NLP projects.


Strata Data Conference