Accurately answering clinical and billing questions by reading patient records, which can run to a hundred pages or more, is a challenge even for human domain experts. While traditional rule-based or expression-matching techniques work for simple fields in templated documents, it’s much harder to infer facts from implied statements, from the absence of certain statements, or from a combination of other facts. Answering such questions with very high accuracy requires state-of-the-art deep learning techniques applied to NLP.
Spark NLP, John Snow Labs’s NLP library for Apache Spark, is an open source library that natively extends Spark ML to provide natural language understanding capabilities at a performance and scale that were previously out of reach. It provides advanced NLP algorithms such as named entity recognition, fact extraction, spell checking, sentiment analysis, assertion status detection, and entity resolution, and it enables highly efficient training of domain-specific machine learning and deep learning NLP models, a prerequisite for high-accuracy question answering.
David Talby, Alberto Andreotti, Stacy Ashworth, and Tawny Nichols explain how Spark NLP augments the SelectData Data Science Platform to extract fuzzy, implied, and complex facts from home health patient records, covering the technical challenges, the architecture of the full solution, and lessons learned that you can directly apply to your next natural language understanding project.
David Talby is chief technology officer at Pacific AI, helping fast-growing companies apply big data and data science techniques to solve real-world problems in healthcare, life science, and related fields. David has extensive experience in building and operating web-scale data science and business platforms, as well as building world-class, agile, distributed teams. Previously, he led business operations for Bing Shopping in the US and Europe in Microsoft’s Bing Group, and at Amazon he built and ran distributed teams in Seattle and the UK that helped scale the company’s financial systems. David holds a PhD in computer science and master’s degrees in both computer science and business administration.
Alberto Andreotti is a senior data scientist on the Spark NLP team at John Snow Labs, where he implements state-of-the-art NLP algorithms on top of Spark. He has a decade of experience specializing in machine learning, both as a consultant and at companies including Motorola, Intel, and Samsung. Alberto has written a great deal of low-level code in C/C++ and was an early Scala enthusiast and developer. A lifelong learner, he holds degrees in engineering and computer science and is working on a third in AI. Alberto was born in Argentina. He enjoys the outdoors, particularly hiking and camping in the mountains of Argentina.
Stacy Ashworth is a registered nurse and chief clinical officer at SelectData. Stacy’s professional interests lie in the use of technology to improve the quality of care through better decision making. An accomplished speaker, she contributed to the healthcare informatics and technology track of the 2016 Business and Health Administration Association meeting, presenting research evaluating glucose monitoring technologies for cost-effective, high-quality control and management of diabetes. She holds a master’s degree in healthcare administration with an emphasis in informatics. Postacute care, geriatrics, and coding may be her passions, but her love is firmly centered on her family of two lively teenagers, a spouse, and a couple of schnauzers to keep things interesting.
Tawny Nichols is chief information officer at SelectData, where she is responsible for new product development, clinical tools, and all technology-related needs. She also leads SelectData’s innovation of data-driven business models. Tawny has over 15 years’ experience supporting the homecare industry. She is currently pursuing an MS in healthcare informatics at the University of San Diego.
©2018, O'Reilly Media, Inc.