Presented By
O’Reilly + Cloudera

Make Data Work

March 25-28, 2019
San Francisco, CA

Schedule: Text and language processing and analysis sessions

9:00am–12:30pm Tuesday, March 26, 2019

Natural language understanding at scale with Spark NLP

Data Science, Machine Learning & AI
Location: 2009

David Talby (Pacific AI), Alex Thomas (John Snow Labs), Claudiu Branzan (Accenture)

Average rating: 4.75 (8 ratings)

David Talby, Alex Thomas, and Claudiu Branzan lead a hands-on introduction to scalable NLP using the high-performance, highly scalable open source Spark NLP library. You'll spend about half your time coding as you work through four sections, each with an end-to-end working codebase that you can change and improve.

9:00am–5:00pm Tuesday, March 26, 2019

Data Case Studies

Location: 2022

Alex Kudriashova (Astro Digital), Jonathan Francis (Starbucks), JoLynn Lavin (General Mills), Robin Way (Corios), June Andrews (GE), Kyungtaak Noh (SK Telecom), Taposh DuttaRoy (Kaiser Permanente), Sabrina Dahlgren (Kaiser Permanente), Craig Rowley (Columbia Sportswear), Ambal Balakrishnan (IBM), Benjamin Glicksberg (UCSF), Patrick Lucey (Stats Perform), Rhonda Textor (True Fit)

Hear practical insights from household brands and global companies: the challenges they tackled, approaches they took, and the benefits—and drawbacks—of their solutions.

11:00am–11:40am Wednesday, March 27, 2019

Building high-performance text classifiers on a limited labeling budget

Data Science, Machine Learning & AI
Location: 2010

Robert Horton (Microsoft), Mario Inchiosa (Microsoft), Ali Zaidi (Microsoft)

Average rating: 4.70 (10 ratings)

Robert Horton, Mario Inchiosa, and Ali Zaidi demonstrate how to use three cutting-edge machine learning techniques—transfer learning from pretrained language models, active learning to make more effective use of a limited labeling budget, and hyperparameter tuning to maximize model performance—to up your modeling game.

11:50am–12:30pm Wednesday, March 27, 2019

Applied machine learning in finance

Data Science, Machine Learning & AI
Location: 2009

Chakri Cherukuri (Bloomberg LP)

Average rating: 4.33 (3 ratings)

Quantitative finance is a rich field in which both sell-side and buy-side institutions employ advanced mathematical and statistical techniques. Chakri Cherukuri explains how machine learning and deep learning techniques are being used in quantitative finance and details how these models work under the hood.

11:50am–12:30pm Wednesday, March 27, 2019

NLP from scratch: Solving the cold start problem for natural language processing

Data Science, Machine Learning & AI
Location: 2010

Michael Johnson (Lockheed Martin), Norris Heintzelman (Lockheed Martin)

Average rating: 4.60 (15 ratings)

How do you train a machine learning model with no training data? Michael Johnson and Norris Heintzelman share their journey implementing multiple solutions for bootstrapping training data in the NLP domain, covering topics including weak supervision, building an active learning framework, and annotation adjudication for named-entity recognition.

2:40pm–3:20pm Wednesday, March 27, 2019

From an archived data field to GO-JEK’s world-class product feature for customer experience

Data Science, Machine Learning & AI
Location: 2009

Divya Choudhary (University of Southern California)

Average rating: 4.50 (2 ratings)

Divya Choudhary explains how GO-JEK uses the informal chat messages and notes that customers, writing in a local language, send to their drivers while waiting for a ride to arrive. From these messages, GO-JEK extracts unparalleled information about pickup points and their names (which sometimes even Google Maps doesn't know) and uses it to create a world-class customer pickup experience feature.

2:40pm–3:20pm Wednesday, March 27, 2019

Natural language understanding in task-oriented conversational AI

Data Science, Machine Learning & AI, Expo Hall
Location: Expo Hall

Sonal Gupta (Facebook)

Average rating: 4.40 (5 ratings)

Sonal Gupta explores practical approaches to building conversational AI systems for task-oriented queries and details a way to achieve more advanced compositional understanding, using hierarchical representations that can capture cross-domain queries.

4:20pm–5:00pm Wednesday, March 27, 2019

Spark NLP: How Roche automates knowledge extraction from pathology and radiology reports

Data Science, Machine Learning & AI
Location: 2009

Yogesh Pandit (Roche), Saif Addin Ellafi (John Snow Labs), Vishakha Sharma (Roche Molecular Solutions)

Average rating: 4.67 (3 ratings)

Yogesh Pandit, Saif Addin Ellafi, and Vishakha Sharma discuss how Roche applies Spark NLP for healthcare to extract clinical facts from pathology and radiology reports. They then detail the design of the deep learning pipelines used to simplify training, optimization, and inference of such domain-specific models at scale.

4:20pm–5:00pm Wednesday, March 27, 2019

Toward deep and representation learning for talent search at LinkedIn

Data Science, Machine Learning & AI, Expo Hall
Location: Expo Hall

Gungor Polatkan (LinkedIn)

Average rating: 4.33 (3 ratings)

Talent search systems at LinkedIn strive to match potential candidates to a recruiter's hiring needs, as expressed in a search query. Gungor Polatkan shares the results of the company's deployment of deep learning models on a real-world production system serving 500M+ users through LinkedIn Recruiter.

3:50pm–4:30pm Thursday, March 28, 2019

The Paradise Papers and West Africa Leaks: Behind the scenes with the ICIJ

Business Analytics and Visualization, Strata Business Summit
Location: 2018

Pierre Romera (International Consortium of Investigative Journalists (ICIJ))

Average rating: 4.67 (6 ratings)

The ICIJ was the team behind the Panama Papers and Paradise Papers. Pierre Romera offers a behind-the-scenes look into the ICIJ's process and explores the challenges of handling 1.4 TB of data (in many different formats)—and making it available securely to journalists all over the world.

4:40pm–5:20pm Thursday, March 28, 2019

Bringing data to life: Combining machine learning and art to tell a data story

Case studies
Location: 2007

Nancy Rausch (SAS)

Average rating: 4.80 (5 ratings)

For data to be meaningful, it needs to be presented in a way that people can relate to. Nancy Rausch explains how she combined streaming data from a solar array and machine learning techniques to create a live-action art piece—an approach that helped bring the data to life in a fun and compelling way.

Sponsorship Opportunities

For exhibition and sponsorship opportunities, email strataconf@oreilly.com

Partner Opportunities

For information on trade opportunities with O'Reilly conferences, email partners@oreilly.com

©2019, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com