Presented By O’Reilly and Cloudera
Make Data Work
March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

Spark NLP in action: Improving patient flow forecasting at Kaiser Permanente

David Talby (Pacific AI), Santosh Kulkarni (Kaiser Permanente)
11:50am–12:30pm Wednesday, March 7, 2018
Data science and machine learning
Location: Expo Hall 1
Secondary topics: Expo Hall

Who is this presentation for?

  • CTOs, VPs of engineering and data science, and data science leaders in healthcare

Prerequisite knowledge

  • Familiarity with Spark, machine learning, and NLP

What you'll learn

  • How Kaiser Permanente uses the open source NLP library for Apache Spark to forecast the demand for hospital beds more accurately


Applying natural language processing in practice is nontrivial for two reasons. First, human language is nuanced, fuzzy, and highly contextual, so most tasks require domain-specific models to be trained. Second, NLP is usually just one part of a bigger machine learning or information retrieval pipeline that addresses a real business use case. Putting together a complete, scalable, performant, measurable, and reproducible pipeline has traditionally required significant engineering compromises.

David Talby and Santosh Kulkarni explain how Kaiser Permanente uses the open source NLP library for Apache Spark to tackle one of the most common challenges in applying natural language processing in practice: integrating domain-specific NLP into a scalable, performant, measurable, and reproducible machine learning pipeline. They show how this improves the accuracy of forecasting the demand for hospital beds. Accurate forecasting is critical to ensuring that enough beds and nurses are available to take care of incoming patients. While some predictive features are structured, many relevant features are locked in free-text clinical notes. Along the way, David and Santosh explain how Kaiser Permanente’s systems meet the highest standards of robustness, scale, and compliance.

Topics include:

  • Extending Spark NLP with domain-specific word embeddings, named entity recognition, and assertion polarity models to match the vocabulary, grammar, and context of clinical notes
  • Using the Spark ML Pipeline APIs to define a unified, frictionless pipeline that runs NLP algorithms and then uses them as features when training the forecasting model
  • Leveraging Spark’s built-in optimizations and features to deliver a fully distributed, highly performant, serializable, and easily measurable learning pipeline
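
The idea behind the second topic, a single pipeline in which text-processing stages produce features that sit alongside structured fields, can be sketched in a few lines. The sketch below is plain Python, not Spark ML or Spark NLP code, and it does not reflect Kaiser Permanente's actual system. The vocabulary, the negation cues, the field names (age, note), and the stage functions are all hypothetical, and the one-token negation check is only a crude stand-in for a trained assertion model.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Stage = Callable[[Record], Record]

# Hypothetical vocabulary and negation cues, for illustration only.
TERMS = {"fever", "pneumonia", "fracture"}
NEGATIONS = {"no", "denies", "without"}


def tokenize(record: Record) -> Record:
    """Lowercase the free-text note and split it into punctuation-free tokens."""
    text = str(record["note"]).lower()
    tokens = [t.strip(".,;:") for t in text.split()]
    return {**record, "tokens": [t for t in tokens if t]}


def count_asserted_terms(record: Record) -> Record:
    """Count vocabulary terms not directly preceded by a negation cue."""
    tokens = record["tokens"]
    count = 0
    for i, tok in enumerate(tokens):
        negated = i > 0 and tokens[i - 1] in NEGATIONS
        if tok in TERMS and not negated:
            count += 1
    return {**record, "asserted_terms": count}


def assemble_features(record: Record) -> Record:
    """Combine a structured field with the text-derived count into one vector."""
    features = [float(record["age"]), float(record["asserted_terms"])]
    return {**record, "features": features}


def run_pipeline(stages: List[Stage], record: Record) -> Record:
    """Apply each stage in order; every stage returns an extended copy."""
    for stage in stages:
        record = stage(record)
    return record


PIPELINE: List[Stage] = [tokenize, count_asserted_terms, assemble_features]

example = {"age": 67, "note": "Patient reports fever. No fracture; denies pneumonia."}
result = run_pipeline(PIPELINE, example)
print(result["features"])  # [67.0, 1.0]
```

In a Spark setting, each of these functions would correspond to a pipeline stage, and the assembled feature vector would be passed to the forecasting model as the final stage.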

David Talby

Pacific AI

David Talby is the chief technology officer at Pacific AI, helping fast-growing companies apply big data and data science techniques to solve real-world problems in healthcare, life science, and related fields. David has extensive experience in building and operating web-scale data science and business platforms, as well as building world-class, agile, distributed teams. Previously, he led business operations for Bing Shopping in the US and Europe with Microsoft’s Bing Group and built and ran distributed teams that helped scale Amazon’s financial systems in both Seattle and the UK. David holds a PhD in computer science and master’s degrees in both computer science and business administration.


Santosh Kulkarni

Kaiser Permanente

Santosh Kulkarni is a product leader at Kaiser Permanente, where he is responsible for driving the development of its intelligence platform systems. Santosh has a deep passion for healthcare and technology and has been an active member of many of the healthcare industry’s forums, which have shaped the industry in recent years. An experienced healthcare thought leader, Santosh has advised and supported some of the top global healthcare players in defining and building next-generation healthcare products and solutions. Previously, he spent more than a decade providing strategic, product, and digital transformation consulting and services to healthcare organizations, with a focus on digital health and consumer and population health management, and was part of the initial architecture team that built Siemens’s flagship EHR platform, Soarian. Santosh holds a master’s degree in business administration and a bachelor’s degree in computer science and engineering.