Presented By O'Reilly and Cloudera
Make Data Work
22–23 May 2017: Training
23–25 May 2017: Tutorials & Conference
London, UK

Meta-data science: When all the world's data scientists are just not enough

Leah McGuire (Salesforce)
12:05–12:45 Thursday, 25 May 2017
Secondary topics: Cloud
Level: Intermediate
Average rating: 4.00 (2 ratings)

Who is this presentation for?

  • Data scientists, data engineers, and anyone interested in the challenges of AI for enterprise

Prerequisite knowledge

  • A basic understanding of the machine-learning process

What you'll learn

  • Explore Salesforce's Einstein, a homegrown Spark ML-based machine-learning platform, and learn which parts of the typical machine-learning pipeline are easier to automate and which are harder


Because of privacy concerns and the nature of SaaS businesses, platforms such as CRM systems often have to provide intelligent, data-driven features built from many distinct per-customer machine-learned models. For Salesforce, this means building hundreds of thousands of models for any given data-driven application, each tuned for a different customer.

Leah McGuire offers an overview of Salesforce’s Einstein, a homegrown Spark ML-based machine-learning platform. Einstein’s automated feature engineering yields much faster modeling turnaround and higher accuracy than general-purpose modeling libraries such as scikit-learn. Its automatic hyperparameter optimization, feature selection, and model selection produce a well-tuned model for each specific customer. It includes modular workflows and transformations that complement systems like Spark ML and KeystoneML, and it scales to training thousands of models per day.
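
To make the idea of automated per-customer model selection concrete, here is a minimal sketch in plain Python. It is not Einstein's code or API, and it does not use Spark ML. Every name in it (knn_predict, select_model, fit_all_customers, the customer IDs, and the toy data) is hypothetical. A one-feature k-nearest-neighbour classifier stands in for a real model family, and a small grid over k with a holdout split stands in for hyperparameter optimization and model selection.

```python
"""Per-customer model selection sketch (illustrative only)."""
from collections import Counter


def knn_predict(train, x, k):
    """Predict the majority label among the k training points nearest to x."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def holdout_accuracy(train, valid, k):
    """Fraction of validation rows that k-NN over `train` labels correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in valid)
    return hits / len(valid)


def select_model(rows, k_grid=(1, 3, 5), valid_every=4):
    """Pick the k with the best holdout accuracy for one customer's data."""
    # Every `valid_every`-th row is held out; the rest is used for training.
    valid = rows[::valid_every]
    train = [r for i, r in enumerate(rows) if i % valid_every != 0]
    scores = {k: holdout_accuracy(train, valid, k)
              for k in k_grid if k <= len(train)}
    # Highest accuracy wins; ties go to the smaller (simpler) k.
    best_k = max(scores, key=lambda k: (scores[k], -k))
    return best_k, scores[best_k]


def fit_all_customers(data_by_customer):
    """Run the same automated search independently for each customer."""
    return {cid: select_model(rows) for cid, rows in data_by_customer.items()}


# Hypothetical toy data: (feature, label) pairs for two customers.
data_by_customer = {
    "customer_a": [(x / 10, int(x >= 10)) for x in range(20)],
    "customer_b": [(x / 10, int(x % 2 == 0)) for x in range(20)],
}
models = fit_all_customers(data_by_customer)
for customer_id, (best_k, accuracy) in models.items():
    print(customer_id, best_k, accuracy)
```

The point of the sketch is the shape of the loop: one shared, fully automated search is applied to each customer's data in isolation, so no customer's data informs another customer's model. At the scale the abstract describes, that loop would have to run as a distributed job rather than a Python dictionary comprehension.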


Leah McGuire


Leah McGuire is a lead member of the technical staff at Salesforce Einstein, where she builds platforms to enable the integration of machine learning into Salesforce products. Previously, Leah was a senior data scientist on the data products team at LinkedIn, where she worked on personalization, entity resolution, and relevance for a variety of LinkedIn data products. Before that, she completed a postdoctoral fellowship at the University of California, Berkeley. She holds a PhD in computational neuroscience from the University of California, San Francisco, where she studied the neural encoding and integration of sensory signals.
