Feature engineering with Spark NLP to accelerate clinical trial recruitment
Who is this presentation for?
Data scientists, machine learning engineers, and engineering leaders.
Recruiting patients for clinical trials is a major challenge in drug development. Finding patients requires an in-depth understanding of their medical histories and current health status, yet most patient data is unstructured and spread across physician notes and pathology, imaging, genomic, and other reports. As a result, clinical trial recruitment is a slow, manual process. This case study describes how Deep6 uses the Spark NLP platform to apply state-of-the-art deep learning to accurately extract relevant clinical facts from unstructured text. These facts are then used in subsequent data science pipelines to construct patients’ medical histories.
John Snow Labs’ NLP library for Apache Spark is an open source library that provides natural language understanding capabilities with state-of-the-art accuracy, performance, and scale. It provides deep-learning-based NLP algorithms for named entity recognition, spell checking, sentiment analysis, assertion status detection, entity resolution, OCR, and sentence segmentation, and it enables highly efficient training of domain-specific machine learning and deep learning NLP models.
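To make the terms above concrete, here is a minimal, standard-library Python sketch of the idea behind entity recognition, assertion status detection, and assembling the results into per-patient histories. It is not the Spark NLP API and not Deep6's pipeline: the lexicon, the negation cues, and the function names (`split_sentences`, `extract_facts`, `build_histories`) are hypothetical stand-ins for trained models, chosen only for illustration.

```python
import re
from collections import defaultdict

# Hypothetical mini-lexicon standing in for a trained clinical NER model.
LEXICON = {
    "diabetes": "CONDITION",
    "hypertension": "CONDITION",
    "metformin": "DRUG",
}

# Hypothetical negation cues standing in for a trained assertion-status model.
NEGATION_CUES = ("no", "denies", "without")


def split_sentences(text):
    """Naive sentence segmentation on terminal punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_facts(note):
    """Return (term, label, status) triples found in one free-text note."""
    facts = []
    for sentence in split_sentences(note):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        for i, token in enumerate(tokens):
            if token in LEXICON:
                # A cue earlier in the same sentence marks the entity as absent.
                negated = any(t in NEGATION_CUES for t in tokens[:i])
                status = "absent" if negated else "present"
                facts.append((token, LEXICON[token], status))
    return facts


def build_histories(notes):
    """Group extracted facts by patient id; notes holds (patient_id, text) pairs."""
    histories = defaultdict(list)
    for patient_id, text in notes:
        histories[patient_id].extend(extract_facts(text))
    return dict(histories)


# Made-up example notes, for illustration only.
notes = [
    ("p1", "Patient has diabetes. Started metformin last year."),
    ("p1", "Denies hypertension."),
    ("p2", "No history of diabetes."),
]
histories = build_histories(notes)
```

The point of the sketch is the shape of the data flow: text goes in, typed facts with an assertion status come out, and those facts are grouped per patient. In a real system each rule-based step would be replaced by a learned model, which is the role the abstract assigns to Spark NLP.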
We will explain how Deep6 utilizes Spark NLP to scale its training and inference pipelines to millions of patients while achieving state-of-the-art accuracy. We will cover the technical challenges, the architecture of the full solution, and lessons learned that you can directly apply to your next natural language understanding project.
Prerequisite knowledge
Basic familiarity with NLP, Spark, and machine learning is assumed.
What you'll learn
Saif Addin Ellafi
John Snow Labs
Saif Addin Ellafi is a software developer at John Snow Labs, where he is the main contributor to Spark NLP. A data scientist, forever student, and extreme sports and gaming enthusiast, Saif has extensive experience in problem solving and quality assurance in the banking and finance industry.
Scott Hoch is a lead data scientist at Deep6.ai, where he works on matching patients with clinical trials in minutes instead of months. Previously, he was VP of engineering at Duco, a solutions engineer at Gem HQ, and a data engineer at NationBuilder. He holds a master's degree in physics from Yale University.
View a complete list of Strata Data Conference contacts