Presented By O’Reilly and Cloudera

San Francisco • London • New York

Make Data Work

September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

Spark NLP in action: How SelectData uses AI to better understand home health patients

David Talby (Pacific AI), Alberto Andreotti (John Snow Labs), Stacy Ashworth (SelectData), Tawny Nichols (SelectData)

1:10pm–1:50pm Thursday, 09/13/2018

Data science and machine learning
Location: 1A 06/07 Level: Intermediate

Secondary topics: Health and Medicine, Text and Language processing and analysis

Average rating: 3.00 (4 ratings)

Who is this presentation for?

Data scientists, NLP engineers, and software architects

Prerequisite knowledge

Familiarity with machine learning and Spark

What you'll learn

Understand how natural language understanding can be applied to patient records and how deep learning can be applied to NLP
Explore Spark NLP, an open source NLP library for Apache Spark

Description

Accurately answering clinical and billing questions by reading patient records, which can be a hundred or more pages long, is a challenge even for human domain experts. While traditional rule-based or expression-matching techniques work for simple fields in templated documents, it’s harder to infer facts based on implied statements, the absence of certain statements, or a combination of other facts. Answering such questions at a very high level of accuracy requires state-of-the-art deep learning techniques applied to NLP.
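As a rough illustration of that gap, the sketch below uses only Python's standard `re` module. The sample notes, field labels, and helper names (`extract_field`, `mentions_condition`) are hypothetical and are not taken from SelectData's records or from Spark NLP. It shows a regular expression reading a templated field correctly, while a naive keyword match flags a condition that the narrative text explicitly denies.

```python
import re

# Hypothetical templated document: a fixed "Label: value" layout.
TEMPLATED_NOTE = "Patient Name: Jane Doe\nPrimary Diagnosis: Type 2 diabetes\n"

# Hypothetical free-text narrative in which the condition is negated.
NARRATIVE_NOTE = "Patient denies any history of diabetes. Lives alone and uses a walker."


def extract_field(text, label):
    """Return the value that follows 'label:' on its own line, or None if absent."""
    match = re.search(rf"^{re.escape(label)}:\s*(.+)$", text, flags=re.MULTILINE)
    return match.group(1).strip() if match else None


def mentions_condition(text, condition):
    """Naive keyword match: True whenever the word appears, regardless of context."""
    pattern = rf"\b{re.escape(condition)}\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


# Expression matching handles the templated field.
primary_diagnosis = extract_field(TEMPLATED_NOTE, "Primary Diagnosis")

# The same style of matching ignores the negation ("denies") in the narrative,
# so it reports the condition as present even though the text says otherwise.
naive_flag = mentions_condition(NARRATIVE_NOTE, "diabetes")

print(primary_diagnosis)
print(naive_flag)
```

Handling the second case requires modeling the context around the mention, such as negation or implied statements, which fixed patterns do not capture.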

Spark NLP, John Snow Labs’s NLP library for Apache Spark, is an open source library that natively extends Spark ML to provide natural language understanding capabilities with performance and scale that were previously impossible. It provides advanced NLP algorithms like named entity recognition, fact extraction, spell checking, sentiment analysis, assertion status detection, and entity resolution, and it enables highly efficient training of domain-specific machine learning and deep learning NLP models, a prerequisite for high-accuracy question answering.

David Talby, Alberto Andreotti, Stacy Ashworth, and Tawny Nichols explain how Spark NLP augments the SelectData Data Science Platform to extract fuzzy, implied, and complex facts from home health patient records, covering the technical challenges, the architecture of the full solution, and lessons learned that you can directly apply to your next natural language understanding project.

David Talby

Pacific AI

David Talby is chief technology officer at Pacific AI, helping fast-growing companies apply big data and data science techniques to solve real-world problems in healthcare, life science, and related fields. David has extensive experience in building and operating web-scale data science and business platforms, as well as building world-class, agile, distributed teams. Previously, he led business operations for Bing Shopping in the US and Europe with Microsoft’s Bing Group and built and ran distributed teams that helped scale Amazon’s financial systems in both Seattle and the UK. David holds a PhD in computer science and master’s degrees in both computer science and business administration.

Alberto Andreotti

John Snow Labs

Alberto Andreotti is a senior data scientist on the Spark NLP team at John Snow Labs, where he implements state-of-the-art NLP algorithms on top of Spark. He has a decade of experience working for companies including Motorola, Intel, and Samsung and as a consultant, specializing in the field of machine learning. Alberto has written lots of low-level code in C/C++ and was an early Scala enthusiast and developer. A lifelong learner, he holds degrees in engineering and computer science and is working on a third in AI. Alberto was born in Argentina. He enjoys the outdoors, particularly hiking and camping in the mountains of Argentina.

Stacy Ashworth

SelectData

Stacy Ashworth is a registered nurse and chief clinical officer at SelectData. Stacy’s professional interests lie in the use of technology to improve the quality of care through better decision making. An accomplished speaker, she has served as a contributor to the healthcare informatics and technology track of the 2016 Business and Health Administration Association meeting, performing research regarding the evaluation of glucose monitoring technologies for cost-effective and quality control/management of diabetes. She holds a master’s degree in healthcare administration with an emphasis in informatics. Postacute care, geriatrics, and coding may be her passions, but her love is firmly centered on her family of two lively teenagers, a spouse, and a couple of schnauzers to keep things interesting.

Tawny Nichols

SelectData

Tawny Nichols is chief information officer at SelectData, where she is responsible for new product development, clinical tools, and all technology-related needs. She also leads SelectData’s innovation of data-driven business models. Tawny has over 15 years’ experience supporting the homecare industry. She is currently pursuing an MS in healthcare informatics at the University of San Diego.
