Presented By
O’Reilly + Cloudera

Make Data Work

March 25-28, 2019
San Francisco, CA

How to train your model (and catch label leakage)

Till Bergmann (Salesforce)

2:40pm–3:20pm Thursday, March 28, 2019

Data Science, Machine Learning & AI
Location: 2010

Secondary topics: AI and Data technologies in the cloud, AI and machine learning in the enterprise, Automation in data science and big data

Average rating: 3.67 (6 ratings)

Who is this presentation for?

Data scientists and product managers for ML/DS products

Level

Intermediate

Prerequisite knowledge

A basic understanding of machine learning models and messy data

What you'll learn

Understand why label leakage is an enormous problem in enterprise machine learning
Learn how Salesforce solves this problem using open source libraries

Description

A pervasive but often overlooked problem in predictive modeling on real-life data is data leakage, or label leakage: features that carry information about the label that would not be available when a prediction is actually made. At enterprise companies that provide ML as a service to other businesses, such as Salesforce, this problem takes on monstrous proportions because the data is populated by diverse and often unknown business processes, making it very hard for data scientists to distinguish cause from effect.

Till Bergmann explains how Salesforce, which needs to churn out thousands of customer-specific models for any given use case, tackled this problem. The automated approaches are part of Salesforce's recently open-sourced, Spark-based library TransmogrifAI and extend the boundaries of what typically falls in the domain of automated machine learning.
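
To make the idea of label leakage concrete, here is a minimal sketch of one simple automated check: flag any feature whose correlation with the label is suspiciously close to perfect. This is an illustration only, not TransmogrifAI's API or the method presented in the talk. The field names (`has_close_date`, `num_emails`, `won`), the helper names, and the 0.95 threshold are all hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Returns 0.0 when either sequence is constant (correlation undefined).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def flag_leaky_features(rows, label, threshold=0.95):
    """Return names of features whose |correlation| with the label >= threshold.

    `rows` is a list of dicts with numeric values; `threshold` is an
    assumed cutoff chosen for illustration, not a recommended default.
    """
    labels = [float(r[label]) for r in rows]
    flagged = []
    for name in rows[0]:
        if name == label:
            continue
        values = [float(r[name]) for r in rows]
        if abs(pearson(values, labels)) >= threshold:
            flagged.append(name)
    return flagged


# Hypothetical sales data: `has_close_date` is filled in by a business
# process only after a deal is won, so it is an effect of the label.
rows = [
    {"num_emails": 3, "has_close_date": 1, "won": 1},
    {"num_emails": 5, "has_close_date": 0, "won": 0},
    {"num_emails": 2, "has_close_date": 1, "won": 1},
    {"num_emails": 4, "has_close_date": 0, "won": 0},
    {"num_emails": 6, "has_close_date": 1, "won": 1},
    {"num_emails": 1, "has_close_date": 0, "won": 0},
]

suspects = flag_leaky_features(rows, label="won")
print(suspects)
```

In this toy data the check flags `has_close_date` and leaves `num_emails` alone. A near-perfect correlation is only a hint that a field may be populated after the outcome is known. A real pipeline would also need to handle categorical and text fields and weaker forms of leakage.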

Till Bergmann

Salesforce

Till Bergmann is a senior data scientist at Salesforce Einstein, building platforms to make it easier to integrate machine learning into Salesforce products, with a focus on automating many of the laborious steps in the machine learning pipeline. He holds a PhD in cognitive science from the University of California, Merced, where he studied the collaboration patterns of academics using NLP techniques.

©2019, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com