Presented By
O’Reilly + Intel AI
Put AI to Work
April 15-18, 2019
New York, NY

How to train your model (and catch label leakage)

Till Bergmann (Salesforce), Leah McGuire (Salesforce)
1:50pm-2:30pm Thursday, April 18, 2019
Case Studies, Machine Learning
Location: Sutton South

Who is this presentation for?

Data scientists, machine learning engineers, and product owners of ML products



Prerequisite knowledge

Familiarity with real-world (often messy) data and a basic understanding of machine learning methods

What you'll learn

Techniques for identifying and handling label leakage and messy data.


A pervasive but often overlooked problem in predictive modeling on real-life data is data or label leakage. At enterprise companies such as Salesforce that provide ML-as-a-service to other businesses, this problem takes on monstrous proportions: the data is populated by diverse and often unknown business processes, which makes it very hard for data scientists to distinguish cause from effect. In this talk, we will describe how we tackled this problem at Salesforce scale, where we need to produce thousands of personalized, customer-specific machine learning models for any given use case. The automated approaches we describe are part of our recently open-sourced Spark-based library TransmogrifAI, and they extend the boundaries of what typically falls in the domain of "automated machine learning."


Till Bergmann


Till Bergmann is a Senior Data Scientist at Salesforce Einstein, building platforms to make it easier to integrate machine learning into Salesforce products, with a focus on automating many of the laborious steps in the machine learning pipeline. Before Salesforce, he obtained a PhD in Cognitive Science at the University of California, Merced, where he studied collaboration patterns of academics using NLP techniques.


Leah McGuire


Leah McGuire is a Principal Member of Technical Staff at Salesforce Einstein, building platforms to enable the integration of machine learning into Salesforce products. Before joining Salesforce, Leah was a Senior Data Scientist on the data products team at LinkedIn, working on personalization, entity resolution, and relevance for a variety of LinkedIn data products. She completed a PhD and a Postdoctoral Fellowship in Computational Neuroscience at the University of California, San Francisco, and at the University of California, Berkeley, where she studied the neural encoding and integration of sensory signals.
