Feature engineering with Spark NLP to accelerate clinical trial recruitment
Who is this presentation for?
- Data scientists, machine learning engineers, and engineering leaders
Recruiting patients for clinical trials is a major challenge in drug development. Finding eligible patients requires an in-depth understanding of their medical histories and current health status, yet most patient data is unstructured and spread across physician notes and pathology, imaging, genomic, and other reports. As a result, clinical trial recruitment is a slow, manual process.
Saif Addin Ellafi and Scott Hoch dive into a case study describing how Deep 6 uses Spark NLP, a natural language processing (NLP) library for Apache Spark, to apply state-of-the-art deep learning and accurately extract relevant clinical facts from unstructured text. Subsequent data science pipelines then use these facts to construct patients’ medical histories.
John Snow Labs’s NLP library for Apache Spark is an open source library that provides natural language understanding capabilities with state-of-the-art accuracy, performance, and scale. It provides deep learning-based NLP algorithms for named entity recognition, spell checking, sentiment analysis, assertion status detection, entity resolution, optical character recognition (OCR), and sentence segmentation, and it enables highly efficient training of domain-specific machine learning and deep learning NLP models.
Saif and Scott explain how Deep 6 uses Spark NLP to scale its training and inference pipelines to millions of patients while achieving state-of-the-art accuracy. They explore the technical challenges, the architecture of the full solution, and the lessons Deep 6 learned that you can apply directly to your next natural language understanding project.
Prerequisite knowledge
- Familiarity with NLP, Spark, and machine learning
What you'll learn
- Discover lessons learned and recommendations for achieving state-of-the-art NLP accuracy, performance, and scale in a real-life application
Saif Addin Ellafi
John Snow Labs
Saif Addin Ellafi is a software developer at John Snow Labs, where he’s the main contributor to Spark NLP. A data scientist, forever student, and extreme sports and gaming enthusiast, Saif has extensive experience in problem solving and quality assurance in the banking and finance industry.
Scott Hoch is the founder of Blackbox Engineering.
View a complete list of Strata Data Conference contacts