Presented By
O’Reilly + Cloudera
Make Data Work
March 25-28, 2019
San Francisco, CA

Spark NLP: How Roche automates knowledge extraction from pathology and radiology reports

Yogesh Pandit (Roche), Saif Addin Ellafi (John Snow Labs), Vishakha Sharma (Roche Molecular Solutions)
4:20pm-5:00pm Wednesday, March 27, 2019
Average rating: 4.67 (3 ratings)

Who is this presentation for?

  • Data engineers, NLP and machine learning engineers, and those working in healthcare



What you'll learn

  • Discover how deep learning can be applied to NLP and how NLP can be used in healthcare to extract clinical facts from unstructured free text (such as pathology and radiology reports) to aid clinical decision support


Many critical facts required by healthcare AI applications (like patient risk prediction, cohort selection, and clinical decision support) are locked in unstructured free-text data. Recent advances in deep learning have raised the bar on achievable accuracy for tasks like named entity recognition, assertion status detection, entity resolution, deidentification, and others, using novel healthcare-specific networks and models.

Yogesh Pandit, Saif Addin Ellafi, and Vishakha Sharma share the first industrial-grade implementation of these new results and its application at scale, based on the Spark NLP library. Specifically, they discuss how Roche applies Spark NLP for healthcare to extract clinical facts from pathology and radiology reports. They then detail the design of the deep learning pipelines used to simplify training, optimization, and inference of such domain-specific models at scale.

Yogesh Pandit


Yogesh Pandit is a senior software engineer in the Analytics Group at Roche. Currently, he’s leading the NLP efforts to support the company’s NAVIFY platform, which aims to help oncology care teams review, discuss, and align on treatment decisions for the patient. Yogesh is a bioinformatician turned data engineer with experience in biomedical NLP. For the past few years, he’s been building data applications in the life sciences and healthcare space.

Saif Addin Ellafi

John Snow Labs

Saif Addin Ellafi is a software developer at John Snow Labs, where he’s the main contributor to Spark NLP. A data scientist, forever student, and an extreme sports and gaming enthusiast, Saif has wide experience in problem solving and quality assurance in the banking and finance industry.

Vishakha Sharma

Roche Molecular Solutions

Vishakha Sharma is a data scientist for diagnostic information solutions at Roche, where she leads advanced analytics initiatives such as natural language processing (NLP) and machine learning (ML) to discover key insights that improve the NAVIFY product portfolio, leading to better and more efficient patient care. Vishakha has authored 40+ peer-reviewed publications and proceedings and has given 15+ invited talks. She serves on the program committees of ACM-W, AMIA, and ACM-BCB. Her research has been funded by the NIH Big Data to Knowledge (BD2K) initiative to build NLP precision medicine software that automates molecular and clinical information extraction, categorization, and ranking of clinical evidence associated with biomarkers that predict response to cancer therapies. She holds a PhD in computer science.