Presented By O'Reilly and Cloudera
Make Data Work
September 26–27, 2016: Training
September 27–29, 2016: Tutorials & Conference
New York, NY

Semantic natural language understanding with Spark Streaming, UIMA, and machine-learned ontologies

David Talby (Pacific AI), Claudiu Branzan (Accenture)
1:15pm–1:55pm Thursday, 09/29/2016
Data science & advanced analytics
Location: Hall 1C Level: Intermediate
Tags: ai
Average rating: 4.00 (1 rating)

Prerequisite knowledge

  • Basic familiarity with natural language processing, machine-learning, and Spark concepts
What you'll learn

  • Explore a reference architecture and implementation for semantic natural language understanding at scale
Description

    A text-mining system must go well beyond indexing and search to appear truly intelligent. First, it should understand language beyond keyword matching. (For example, distinguishing between “Jane has the flu,” “Jane may have the flu,” “Jane is concerned about the flu,” “Jane’s sister has the flu, but she doesn’t,” and “Jane had the flu when she was 9” is of critical importance.) This is a natural language processing problem. Second, it should “read between the lines” and make likely inferences even if they’re not explicitly written. (For example, if Jane has had a fever, a headache, fatigue, and a runny nose for three days, not as part of an ongoing condition, then she likely has the flu.) This is a semi-supervised machine-learning problem. Third, it should automatically learn the right contextual inferences to make. (For example, learning on its own that fatigue is sometimes a flu symptom, only because it appears in many diagnosed patients, without a human ever explicitly stating that rule.) This is an association-mining problem, which can be tackled via deep learning or via more guided machine-learning techniques.
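
    To make the first point concrete, here is a minimal sketch of rule-based assertion detection applied to the five example sentences. It is not the speakers' implementation: the cue lists, the label names, and the function assertion_status are assumptions made for illustration, and a real system would need much richer lexicons and proper scope handling.

```python
import re

# Illustrative cue patterns only (assumed, not taken from the talk).
# Rules are checked in order against the text that precedes the target term.
CUE_RULES = [
    ("family_member", r"\b(sister|brother|mother|father|son|daughter)\b"),
    ("absent", r"\b(no|not|denies|without)\b|n['’]t\b"),
    ("possible", r"\b(may|might|possibly|suspected)\b"),
    ("concern_only", r"\b(concerned|worried|afraid)\b"),
    ("historical", r"\b(had|history of)\b"),
]


def assertion_status(sentence, target="flu"):
    """Return a coarse label for how `target` is asserted in `sentence`."""
    mention = re.search(r"\b" + re.escape(target) + r"\b", sentence, re.IGNORECASE)
    if mention is None:
        return "not_mentioned"
    # Only look at the words before the mention; this is a crude stand-in
    # for real scope detection.
    prefix = sentence[:mention.start()]
    for label, pattern in CUE_RULES:
        if re.search(pattern, prefix, re.IGNORECASE):
            return label
    return "present"


EXAMPLES = [
    "Jane has the flu",
    "Jane may have the flu",
    "Jane is concerned about the flu",
    "Jane’s sister has the flu, but she doesn’t",
    "Jane had the flu when she was 9",
]

if __name__ == "__main__":
    for sentence in EXAMPLES:
        print(f"{assertion_status(sentence):14} {sentence}")
```

    A keyword search for "flu" would treat all five sentences alike. Even this crude prefix check separates them, which is why the assertion step has to come before any inference is drawn from a mention.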

    David Talby and Claudiu Branzan lead a live demo of an end-to-end system that makes nontrivial clinical inferences from free-text patient records and provides real-time inferencing at scale. The architecture is built out of open source big data components: Kafka and Spark Streaming for real-time data ingestion and processing, Spark for modeling, and Elasticsearch for enabling low-latency access to results. The data science components include a UIMA pipeline with custom annotators, machine-learning models for implicit inferences, and dynamic ontologies for representing and learning new relationships between concepts. Source code will be made available after the talk to enable you to hack away on your own.
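
    The flow of data through such an architecture can be sketched in plain Python. Everything here is a stand-in chosen for illustration, not the demo's code: micro_batches plays the role of Kafka and Spark Streaming ingestion, annotate the role of a UIMA annotator, infer the role of a learned model (here just a symptom-count threshold), and a dictionary the role of the Elasticsearch index. All names and the toy symptom lexicon are assumptions.

```python
from itertools import islice

# Toy lexicon (assumed); a real pipeline would map text to ontology concepts.
FLU_SYMPTOMS = {"fever", "headache", "fatigue", "runny nose"}


def micro_batches(records, size):
    """Group an incoming stream of records into small batches."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def annotate(text):
    """Stand-in annotator: return the symptom terms found in the text.

    Plain substring matching ignores negation, so "no fever" still counts.
    """
    lowered = text.lower()
    return {symptom for symptom in FLU_SYMPTOMS if symptom in lowered}


def infer(symptoms, threshold=3):
    """Stand-in model: flag likely flu when enough symptoms co-occur."""
    return "likely flu" if len(symptoms) >= threshold else "no inference"


def run_pipeline(records, batch_size=2):
    """Ingest, annotate, infer, and index each (patient_id, note) record."""
    index = {}  # stand-in for a low-latency search index
    for batch in micro_batches(records, batch_size):
        for patient_id, note in batch:
            symptoms = annotate(note)
            index[patient_id] = {
                "symptoms": sorted(symptoms),
                "inference": infer(symptoms),
            }
    return index


if __name__ == "__main__":
    notes = [
        ("p1", "Fever, headache, fatigue and a runny nose for three days."),
        ("p2", "Mild headache after work."),
        ("p3", "Reports fatigue and fever since Monday."),
    ]
    for patient_id, result in run_pipeline(notes).items():
        print(patient_id, result)
```

    Keeping each stage as a separate function mirrors the separation in the described system, where ingestion, annotation, modeling, and serving are handled by different components and can be scaled or replaced independently.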

    David Talby

    Pacific AI

    David Talby is chief technology officer at Pacific AI, helping fast-growing companies apply big data and data science techniques to solve real-world problems in healthcare, life science, and related fields. David has extensive experience in building and operating web-scale data science and business platforms, as well as building world-class, agile, distributed teams. Previously, he led business operations for Bing Shopping in the US and Europe with Microsoft’s Bing Group, and built and ran distributed teams in both Seattle and the UK that helped scale Amazon’s financial systems. David holds a PhD in computer science and master’s degrees in both computer science and business administration.

    Claudiu Branzan

    Accenture

    Claudiu Branzan is an analytics senior manager in the Applied Intelligence Group at Accenture, based in Seattle, where he draws on more than 10 years of expertise in data science, machine learning, and AI to promote the use of these technologies in building smarter solutions to complex problems. Previously, Claudiu held highly technical client-facing leadership roles in companies using big data and advanced analytics to offer solutions for clients in healthcare, high-tech, telecom, and payments verticals.

    Comments

    Claudiu Branzan
    09/30/2016 2:49pm EDT

    The notebooks and some links are available here:
    https://github.com/Atigeo/nlp_demo

    Claudiu Branzan
    09/30/2016 2:45pm EDT

    If you liked the session, please rate it! Thank you!