Sep 23–26, 2019

Feature engineering with Spark NLP to accelerate clinical trial recruitment

Saif Addin Ellafi (John Snow Labs), Scott Hoch (BlackBox Engineering)
1:15pm–1:55pm Wednesday, September 25, 2019
Location: 3B - Expo Hall
Average rating: 4.00 (2 ratings)

Who is this presentation for?

  • Data scientists, machine learning engineers, and engineering leaders




Recruiting patients for clinical trials is a major challenge in drug development. Finding patients requires an in-depth understanding of their medical histories and current health status, yet most patient data is unstructured and spread across physician notes and pathology, imaging, genomic, and other reports. As a result, clinical trial recruitment is a slow, manual process.

Saif Addin Ellafi and Scott Hoch present a case study of how Deep 6 uses the Spark NLP natural language processing library to apply state-of-the-art deep learning and accurately extract relevant clinical facts from unstructured text. Subsequent data science pipelines then use these facts to construct patients’ medical histories.
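
As a rough illustration of the hand-off from fact extraction to history construction, the sketch below groups already-extracted facts by patient and orders them by note date. It is a minimal sketch under stated assumptions, not Deep 6's pipeline: the ClinicalFact record, its field names, the label and assertion values, and the sample facts are all hypothetical, and the extraction step itself (which the session attributes to Spark NLP) is not shown.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ClinicalFact:
    """Hypothetical record for one fact extracted from a clinical note."""
    patient_id: str
    note_date: str   # ISO 8601 date, so string order matches date order
    entity: str      # the extracted text span
    label: str       # assumed entity type, e.g. "PROBLEM" or "TREATMENT"
    assertion: str   # assumed assertion status, e.g. "present" or "absent"


def build_histories(facts: List[ClinicalFact]) -> Dict[str, List[ClinicalFact]]:
    """Group asserted-present facts by patient, oldest note first."""
    grouped = defaultdict(list)
    for fact in facts:
        # Negated mentions ("denies chest pain") are left out of the history.
        if fact.assertion == "present":
            grouped[fact.patient_id].append(fact)
    return {
        patient_id: sorted(items, key=lambda f: f.note_date)
        for patient_id, items in grouped.items()
    }


# Made-up sample facts, for illustration only.
facts = [
    ClinicalFact("p1", "2019-03-02", "metformin", "TREATMENT", "present"),
    ClinicalFact("p1", "2018-11-15", "type 2 diabetes", "PROBLEM", "present"),
    ClinicalFact("p1", "2019-03-02", "chest pain", "PROBLEM", "absent"),
    ClinicalFact("p2", "2019-01-20", "asthma", "PROBLEM", "present"),
]
histories = build_histories(facts)
```

Filtering on assertion status before grouping is one possible design choice; a real pipeline might instead keep negated or historical mentions and mark them as such.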

John Snow Labs’s NLP library for Apache Spark is an open source library that provides natural language understanding capabilities with state-of-the-art accuracy, performance, and scale. It provides deep learning-based NLP algorithms for named entity recognition, spell checking, sentiment analysis, assertion status detection, entity resolution, optical character recognition (OCR), and sentence segmentation, and it enables highly efficient training of domain-specific machine learning and deep learning NLP models.

They explain how Deep 6 uses Spark NLP to scale its training and inference pipelines to millions of patients while achieving state-of-the-art accuracy. They explore the technical challenges, the architecture of the full solution, and the lessons Deep 6 learned that you can directly apply to your next natural language understanding project.

Prerequisite knowledge

  • Familiarity with NLP, Spark, and machine learning

What you'll learn

  • Discover lessons learned and recommendations for achieving state-of-the-art NLP accuracy, performance, and scale in a real-life application

Saif Addin Ellafi

John Snow Labs

Saif Addin Ellafi is a software developer at John Snow Labs, where he’s the main contributor to Spark NLP. A data scientist, forever student, and an extreme sports and gaming enthusiast, Saif has wide experience in problem solving and quality assurance in the banking and finance industry.

Scott Hoch

BlackBox Engineering

Scott Hoch is the founder of BlackBox Engineering.
