Presented By O’Reilly and Cloudera


Make Data Work

21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

Schedule: Text and Language processing and analysis sessions

9:00–12:30 Tuesday, 22 May 2018

Introduction to natural language processing with Python

Data science and machine learning, Emerging technologies and case studies
Location: Capital Suite 10
Level: Beginner

Barbara Fusinska (Google)

Average rating: 4.33 (3 ratings)

Natural language processing techniques help address tasks like text classification, information extraction, and content generation. Barbara Fusinska offers an overview of natural language processing and walks you through building a bag-of-words representation with Python and its machine learning libraries, then using that representation for text classification.
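For readers unfamiliar with the bag-of-words idea named in this abstract, the following is a minimal sketch rather than material from the tutorial itself. It assumes a tiny hand-made training set, a whitespace tokenizer, and a multinomial naive Bayes classifier with add-one smoothing written with the standard library only; the names `tokenize`, `bag_of_words`, `train_naive_bayes`, and `classify` are hypothetical, and the tutorial may well use a library such as scikit-learn instead.

```python
from collections import Counter
import math


def tokenize(text):
    # Lowercase, split on whitespace, and strip simple trailing punctuation.
    words = (w.strip(".,!?").lower() for w in text.split())
    return [w for w in words if w]


def bag_of_words(text, vocabulary):
    # Represent a text as a vector of word counts, one slot per vocabulary word.
    # Words outside the vocabulary are ignored.
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


def train_naive_bayes(examples, vocabulary):
    # Multinomial naive Bayes with add-one (Laplace) smoothing.
    labels = sorted({label for _, label in examples})
    doc_counts = Counter(label for _, label in examples)
    model = {}
    for label in labels:
        totals = [0] * len(vocabulary)
        for text, text_label in examples:
            if text_label == label:
                vec = bag_of_words(text, vocabulary)
                totals = [a + b for a, b in zip(totals, vec)]
        denom = sum(totals) + len(vocabulary)
        log_prior = math.log(doc_counts[label] / len(examples))
        log_likelihoods = [math.log((c + 1) / denom) for c in totals]
        model[label] = (log_prior, log_likelihoods)
    return model


def classify(text, model, vocabulary):
    # Pick the label with the highest log-probability score.
    vec = bag_of_words(text, vocabulary)
    scores = {
        label: prior + sum(n * ll for n, ll in zip(vec, lls))
        for label, (prior, lls) in model.items()
    }
    return max(scores, key=scores.get)


# Illustrative training data (invented for this sketch).
train = [
    ("the match ended with a late goal", "sport"),
    ("the team won the cup final", "sport"),
    ("the bank raised interest rates", "finance"),
    ("shares fell as the market closed", "finance"),
]
vocabulary = sorted({w for text, _ in train for w in tokenize(text)})
model = train_naive_bayes(train, vocabulary)

print(classify("the team scored a goal", model, vocabulary))
print(classify("interest rates fell", model, vocabulary))
```

The vector produced by `bag_of_words` discards word order entirely, which is the defining trade-off of the representation: it is cheap to build and often adequate for topic-style classification, but it cannot distinguish sentences that use the same words in a different order.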

13:30–17:00 Tuesday, 22 May 2018

Natural language understanding at scale with spaCy and Spark NLP

Data science and machine learning
Location: Capital Suite 13
Level: Intermediate

David Talby (Pacific AI), Claudiu Branzan (Accenture)

Average rating: 4.33 (3 ratings)

Natural language processing is a key component in many data science systems. David Talby and Claudiu Branzan lead a hands-on tutorial on scalable NLP using spaCy for building annotation pipelines, Spark NLP for building distributed natural language machine-learned pipelines, and Spark ML and TensorFlow for using deep learning to build and apply word embeddings.

16:35–17:15 Wednesday, 23 May 2018

Improving DevOps and QA efficiency using machine learning and NLP methods

Data engineering and architecture, Data-driven business management, Streaming systems and real-time applications
Location: S11B
Level: Intermediate

Ran Taig (Dell), Omer Sagi (Dell)

Average rating: 2.00 (1 rating)

DevOps and QA engineers spend a significant amount of time investigating recurring issues. These issues are often represented by large configuration and log files, so checking whether two issues are duplicates can be very tedious. Ran Taig and Omer Sagi outline a solution that leverages NLP and machine learning algorithms to automatically identify duplicate issues.
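The abstract does not say which algorithms the speakers use. As one simple illustration of the general idea, the sketch below flags likely duplicate issues by comparing token-count vectors of their log text with cosine similarity. The issue IDs, log lines, the `find_duplicates` helper, and the 0.8 threshold are all assumptions made for this example, not details from the talk.

```python
from collections import Counter
from itertools import combinations
import math
import re


def token_counts(text):
    # Split log text into lowercase alphanumeric tokens and count them.
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine_similarity(a, b):
    # Cosine of the angle between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_duplicates(issues, threshold=0.8):
    # Return (id_a, id_b, similarity) for every pair at or above the threshold.
    vectors = {issue_id: token_counts(text) for issue_id, text in issues.items()}
    pairs = []
    for id_a, id_b in combinations(sorted(vectors), 2):
        score = cosine_similarity(vectors[id_a], vectors[id_b])
        if score >= threshold:
            pairs.append((id_a, id_b, score))
    return pairs


# Hypothetical issues, invented for this sketch.
issues = {
    "ISSUE-1": "ERROR disk full on /var/log node db01 write failed",
    "ISSUE-2": "ERROR disk full on /var/log node db02 write failed",
    "ISSUE-3": "WARN connection timeout to cache service retrying",
}

duplicates = find_duplicates(issues)
for id_a, id_b, score in duplicates:
    print(id_a, id_b, round(score, 2))
```

Comparing every pair is quadratic in the number of issues, so a real system handling large log archives would likely need indexing or approximate nearest-neighbor search; that is beyond what this sketch attempts.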

16:35–17:15 Wednesday, 23 May 2018

Narrative extraction: Analyzing the world’s narratives through natural language understanding

Data science and machine learning
Location: Capital Suite 10/11
Level: Non-technical

Naveed Ghaffar (Narrative Economics), Rashed Iqbal (UCLA)

Average rating: 3.67 (3 ratings)

Narratives are significant vectors of rapid change in culture, economic behavior, and the Zeitgeist of a society. Narrative economics studies the impact of popular human-interest stories on economic fluctuations. Naveed Ghaffar and Rashed Iqbal outline a framework that uses natural language understanding to extract and analyze narratives in human communication.

17:25–18:05 Wednesday, 23 May 2018

Using LSTMs to aid professional translators

Data science and machine learning, Emerging technologies and case studies
Location: Capital Suite 13
Level: Intermediate

Darren Cook (QQ Trend)

Darren Cook demonstrates how to use LSTMs, state-of-the-art tokenizers, dictionaries, and other data sources to tackle translation, focusing on one of the most difficult language pairs: Japanese to English.

14:05–14:45 Thursday, 24 May 2018

Spark NLP in action: Intelligent, high-accuracy fact extraction from long financial documents

Data science and machine learning, Expo Hall
Location: Expo Hall
Level: Intermediate

David Talby (Pacific AI), Saif Addin Ellafi (John Snow Labs), Paul Parau (UiPath)

Average rating: 4.50 (4 ratings)

Spark NLP natively extends Spark ML to provide natural language understanding capabilities with performance and scale that were not previously possible. David Talby, Saif Addin Ellafi, and Paul Parau explain how Spark NLP was used to augment the Recognos smart data extraction platform in order to automatically infer fuzzy, implied, and complex facts from long financial documents.

Sponsorship Opportunities

For exhibition and sponsorship opportunities, email strataconf@oreilly.com

Partner Opportunities

For information on trade opportunities with O'Reilly conferences, email partners@oreilly.com

Contact Us

View a complete list of Strata Data Conference contacts

©2018, O’Reilly UK Ltd • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com