Presented By O'Reilly and Cloudera
Make Data Work
September 26–27, 2016: Training
September 27–29, 2016: Tutorials & Conference
New York, NY

Unlocking unstructured text data with summarization

Mike Lee Williams (Cloudera Fast Forward Labs)
5:25pm–6:05pm Wednesday, 09/28/2016
Data science & advanced analytics
Location: 3D 10
Level: Non-technical
Tags: ai
Average rating: 4.80 (10 ratings)

What you'll learn

  • Learn how to automatically summarize documents in three different ways
  • Understand the development and product trade-offs involved in choosing between these approaches
  • Explore the latest developments in computer understanding of human language

Description

    We’ve seen significant progress over the last half-decade in infrastructure for using data effectively. But this progress hasn’t benefited all types of data equally. Unstructured text, in particular, has been slower to yield to the kinds of analysis that many businesses are starting to take for granted. Rather than being limited by how much text we can collect, we are now constrained by the tools, time, and techniques needed to make good use of it. But we are beginning to gain the ability to do remarkable things with unstructured text data.

    Michael Williams explores text summarization (taking text in and returning a shorter document that contains the same information), covering both single-document and multidocument summarization. Michael demonstrates approaches to the summarization problem that range from extremely simple algorithms dating back to the 1950s to the latest recurrent neural networks, explains how to choose between them, and shows a working prototype product for each.
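    To make "extremely simple" concrete, here is a minimal sketch of an extractive summarizer in the spirit of the early word-frequency approaches: score each sentence by how often its content words appear across the whole document, then keep the top-scoring sentences in their original order. This is an illustrative simplification (H. P. Luhn's 1958 method, for instance, scores clusters of significant words rather than whole sentences) and is not the code shown in the talk. The `summarize` helper, the naive sentence splitter, and the small stopword list are all hypothetical assumptions.

```python
import re
from collections import Counter

# Small, hand-picked stopword list (an assumption; real systems use larger lists).
STOPWORDS = {
    "a", "an", "the", "and", "or", "but", "of", "to", "in", "on", "for",
    "is", "are", "was", "were", "it", "this", "that", "with", "as", "by",
    "be", "at", "from", "we", "they", "has", "have", "its",
}


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s.strip() for s in parts if s.strip()]


def tokenize(sentence):
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def summarize(text, num_sentences=2):
    """Hypothetical helper: return the num_sentences highest-scoring sentences
    in their original order. A sentence's score is the average document-wide
    frequency of its non-stopword words."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score (highest first), keep the top ones,
    # then restore document order so the summary reads naturally.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])
    return [sentences[i] for i in keep]


if __name__ == "__main__":
    doc = (
        "Text summarization shortens a document. "
        "Extractive summarization selects sentences from the document. "
        "The weather was pleasant. "
        "Summarization of a long document keeps the most informative sentences."
    )
    for sentence in summarize(doc, num_sentences=3):
        print(sentence)
```

    On the sample passage, the off-topic sentence about the weather is the one dropped, because its words are rare in the document. Approaches like this need no training data, which is much of their appeal; the trade-off is that they can only copy sentences verbatim rather than rephrase them.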

    Summarizing tens or hundreds of thousands of articles at once represents an entirely new capability. But this capability is part of something bigger: it’s a gateway to quantified representations of text. The breakthroughs made possible by applying sentence embeddings and recurrent neural networks to the semantic meaning of text are poised to transform the ways computers process language.

    Mike Lee Williams

    Cloudera Fast Forward Labs

    Mike Lee Williams is a research engineer at Cloudera Fast Forward Labs, where he builds prototypes that bring the latest ideas in machine learning and AI to life and helps Cloudera’s customers understand how to make use of these new technologies. Mike holds a PhD in astrophysics from Oxford.