Many critical facts required by healthcare AI applications, such as patient risk prediction, cohort selection, and clinical decision support, are locked in unstructured free-text data. Recent advances in deep learning have raised the bar on achievable accuracy for tasks such as named entity recognition, assertion status detection, entity resolution, and deidentification, using novel healthcare-specific networks and models.
Yogesh Pandit, Saif Addin Ellafi, and Vishakha Sharma share the first industrial-grade implementation of these new results and its application at scale, based on the Spark NLP library. Specifically, they discuss how Roche applies Spark NLP for healthcare to extract clinical facts from pathology and radiology reports. They then detail the design of the deep learning pipelines used to simplify training, optimization, and inference of such domain-specific models at scale.
Yogesh Pandit is a senior software engineer in the Analytics Group at Roche. Currently, he’s leading the NLP efforts to support the company’s NAVIFY platform, which aims to help oncology care teams review, discuss, and align on treatment decisions for patients. Yogesh is a bioinformatician turned data engineer with experience in biomedical NLP. For the past few years, he’s been building data applications in the life sciences and healthcare space.
Saif Addin Ellafi is a software developer at John Snow Labs, where he’s the main contributor to Spark NLP. A data scientist, forever student, and an extreme sports and gaming enthusiast, Saif has wide experience in problem solving and quality assurance in the banking and finance industry.
Vishakha Sharma is a data scientist for diagnostic information solutions at Roche, where she leads advanced analytics initiatives such as natural language processing (NLP) and machine learning (ML) to discover key insights that improve the NAVIFY product portfolio, leading to better and more efficient patient care. Vishakha has authored 40+ peer-reviewed publications and proceedings and has given 15+ invited talks. She serves on the program committees of ACM-W, AMIA, and ACM-BCB. Her research has been funded by the NIH Big Data to Knowledge (BD2K) initiative to build NLP precision medicine software that automates molecular and clinical information extraction, categorization, and ranking of clinical evidence associated with biomarkers that predict response to cancer therapies. She holds a PhD in computer science.
©2019, O'Reilly Media, Inc.