Presented By O'Reilly and Cloudera
Make Data Work
22–23 May 2017: Training
23–25 May 2017: Tutorials & Conference
London, UK

Meta-data science: When all the world's data scientists are just not enough

Leah McGuire (Salesforce)
12:05–12:45 Thursday, 25 May 2017
Secondary topics: Cloud
Level: Intermediate
Average rating: 4.00 (2 ratings)

Who is this presentation for?

  • Data scientists, data engineers, and anyone interested in the challenges of AI for enterprise

Prerequisite knowledge

  • A basic understanding of the machine-learning process

What you'll learn

  • Explore Salesforce's Einstein, a homegrown Spark ML-based machine-learning platform, and learn which parts of the typical machine-learning pipeline are easier to automate and which are harder


Because of privacy concerns and the nature of SaaS businesses, platforms such as CRM systems often have to build intelligent, data-driven features from many separate machine-learned models, one per customer. For Salesforce, this means building hundreds of thousands of models for any given data-driven application, each tuned to a distinct customer.

Leah McGuire offers an overview of Salesforce’s Einstein, a homegrown Spark ML-based machine-learning platform. Einstein’s automated feature engineering yields much quicker modeling turnarounds and higher accuracy than general-purpose modeling libraries such as scikit-learn. Its automatic hyperparameter optimization, feature selection, and model selection produce a strong model tailored to each customer. It also includes modular workflows and transformations that complement systems like Spark ML and KeystoneML, and it scales to training thousands of models per day.
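
As a rough illustration of per-customer model selection, the sketch below fits one small model per customer and picks a regularization strength for each by holdout validation. It is a toy under stated assumptions, not Einstein's implementation: the one-feature ridge regression, the synthetic data, and every name in it (`fit_ridge_1d`, `select_model`, `train_per_customer`, `make_customer`) are hypothetical and stand in for a real Spark ML pipeline.

```python
import random
from statistics import mean


def fit_ridge_1d(xs, ys, lam):
    """Closed-form ridge fit through the origin: w = sum(x*y) / (sum(x^2) + lam)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def mse(w, xs, ys):
    """Mean squared error of the prediction w * x."""
    return mean((w * x - y) ** 2 for x, y in zip(xs, ys))


def select_model(xs, ys, lambdas, holdout=0.25):
    """Try each candidate lambda and keep the one with the lowest holdout error."""
    split = int(len(xs) * (1 - holdout))
    train_x, val_x = xs[:split], xs[split:]
    train_y, val_y = ys[:split], ys[split:]
    best = None
    for lam in lambdas:
        w = fit_ridge_1d(train_x, train_y, lam)
        err = mse(w, val_x, val_y)
        if best is None or err < best["val_mse"]:
            best = {"lambda": lam, "weight": w, "val_mse": err}
    return best


def train_per_customer(datasets, lambdas=(0.0, 0.1, 1.0, 10.0)):
    """Run the same automated selection independently for every customer."""
    return {cid: select_model(xs, ys, lambdas) for cid, (xs, ys) in datasets.items()}


def make_customer(slope, n, noise, seed):
    """Synthetic data for one hypothetical customer (assumption, for illustration only)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [slope * x + rng.gauss(0, noise) for x in xs]
    return xs, ys


datasets = {
    "customer_a": make_customer(slope=2.0, n=200, noise=0.1, seed=1),
    "customer_b": make_customer(slope=-0.5, n=40, noise=0.5, seed=2),
}
models = train_per_customer(datasets)
for cid, model in models.items():
    print(cid, model)
```

Each customer ends up with its own fitted weight and its own chosen hyperparameter, even though the selection code is shared. That separation of shared automation from per-customer models is the pattern the abstract describes, here at toy scale.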


Leah McGuire


Leah McGuire is a principal member of the technical staff at Salesforce Einstein, where she builds platforms to enable the integration of machine learning into Salesforce products. Previously, Leah was a senior data scientist on the data products team at LinkedIn, where she worked on personalization, entity resolution, and relevance for a variety of LinkedIn data products. She also completed a postdoctoral fellowship at the University of California, Berkeley. She holds a PhD in computational neuroscience from the University of California, San Francisco, where she studied the neural encoding and integration of sensory signals.