Sep 23–26, 2019

Protecting the healthcare enterprise from PHI breaches using streaming and NLP

Jeff Zemerick (Mountain Fog)
4:35pm–5:15pm Wednesday, September 25, 2019
Location: 1E 09
Average rating: ***..
(3.50, 2 ratings)

Who is this presentation for?

  • Big data engineers, cloud architects, and healthcare workers




The healthcare technology domain has seen explosive growth in the past decade. Hospitals small and large are beginning to adopt cloud technologies, and many operate in hybrid environments. These distributed environments pose unique challenges, arguably none more critical than the protection of protected health information (PHI).

The federal regulations that govern PHI impose strict requirements on how it is handled and carry potentially severe consequences if PHI is left unprotected or exposed. These requirements place a significant burden on the operations teams who maintain the environments and who often have to manage multiple production, test, and development environments. A tool that helps keep PHI from being transmitted across environment boundaries can help the organization meet its regulatory obligations.

Jeff Zemerick explores an architecture, and the open source applications that implement it, for identifying and removing PHI from natural language text. He examines how streaming healthcare text can be ingested through Apache Kafka and processed by Apache Flink, where natural language processing (NLP) methods identify and remove PHI. You'll also see a demonstration of the application being deployed to the cloud.
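
The ingest, detect, and redact flow described above can be illustrated with a minimal Python sketch. This is not the speaker's implementation: a plain list stands in for a Kafka topic, a generator stands in for a Flink map operator, and a few hypothetical regular expressions (PHI_PATTERNS) stand in for trained NLP models. The names redact and process_stream are assumptions made for illustration only.

```python
import re
from typing import Iterable, Iterator

# Hypothetical patterns standing in for a trained NLP model. Real PHI
# detection needs far broader coverage (names, addresses, dates, and so on).
PHI_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "MRN": re.compile(r"\bMRN[:\s]*\d+\b", re.IGNORECASE),
}


def redact(text: str) -> str:
    """Replace every span matched by a pattern with a bracketed label."""
    for label, pattern in PHI_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def process_stream(messages: Iterable[str]) -> Iterator[str]:
    """Apply redact() to each message as it arrives, one at a time.

    In the architecture from the talk this role would be played by a
    streaming operator reading from a message topic; here it is a generator.
    """
    for message in messages:
        yield redact(message)


if __name__ == "__main__":
    # Assumed sample input; in practice messages would come from a topic.
    incoming = [
        "Patient MRN: 448812 called from 555-867-5309.",
        "SSN 123-45-6789 on file",
    ]
    for clean in process_stream(incoming):
        print(clean)
```

Processing one message at a time keeps the redaction step stateless, which is what lets a streaming engine scale it across parallel workers. Swapping the regular expressions for a named-entity model changes only the body of redact.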

Prerequisite knowledge

  • Familiarity with cloud computing and big data concepts (useful but not required)

What you'll learn

  • Understand the impacts of PHI breaches and the safeguard controls needed to process PHI data
  • Learn how Apache Kafka, Apache Flink, and NLP can identify and remove PHI in text and how an application such as this can be deployed to a cloud environment

Jeff Zemerick

Mountain Fog

Jeff Zemerick is a software engineer, cloud architect, and consultant for Mountain Fog. He's a committer and PMC member on Apache OpenNLP. He currently works on cloud, big data, and NLP projects.
