Presented By O’Reilly and Cloudera
Make Data Work
21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

Managing data science in the enterprise

Nick Elprin (Domino Data Lab)
13:30–17:00 Tuesday, 22 May 2018
Strata Business Summit
Location: Capital Suite 8
Level: Intermediate

Who is this presentation for?

Data science leaders (e.g., managers, directors, VPs) and anyone who is planning to manage data science within an enterprise

Prerequisite knowledge

A core understanding of data science. The tutorial is for attendees who are considering managing, or are already managing, a data science capability.

Materials or downloads needed in advance

There are no special hardware/installation requirements for attendees.

What you'll learn

- *How to select the right data science project* - Many organizations start with the data and look for something “interesting” rather than building a deep understanding of the existing business process and then pinpointing the decision point that can be augmented or automated.
- *How to organize data science within the enterprise* - There are tradeoffs between centralized vs federated models (or a hybrid approach with something like a center of excellence).
- *Why rapid prototyping and design sprints aren’t just for software developers* - Leading organizations put prototyping ahead of the data collection process to ensure that stakeholder feedback is captured, increasing the probability of adoption. Some organizations even create synthetic data and naive baseline models to show how the model would impact existing business processes.
- *Why order of magnitude ROI math should be on every hiring checklist* - The ability to estimate the potential business impact of a change in a statistical measure is one of the best predictors of success for a data science team.
- *The difference between “pure research” and “applied templates”* - 80% of data scientists think they’re doing the former, but realistically, the vast majority are applying well-known templates to novel business cases. Knowing which is which and how to manage them differently improves morale and output.
- *Define a stakeholder-centric project management process* - The most common failure mode is that data science delivers results that either arrive too late or don’t fit into how the business works today, so the results gather dust. Share insights early and often.
- *Building for the scale that really matters* - Many organizations optimize for scale of data, but ultimately are overwhelmed by the scale of the growing data science team and its business stakeholders. Team throughput grinds to a crawl as information loss compounds with the number of interactions in a single project, let alone across a portfolio of hundreds or thousands of projects.

Description

Abstract

- How to select the right data science project – Many organizations start with the data and look for something “interesting” rather than building a deep understanding of the existing business process and then pinpointing the decision point that can be augmented or automated.

- How to organize data science within the enterprise – There are tradeoffs between centralized vs federated models (or a hybrid approach with something like a center of excellence).

- Why rapid prototyping and design sprints aren’t just for software developers – Leading organizations put prototyping ahead of the data collection process to ensure that stakeholder feedback is captured, increasing the probability of adoption. Some organizations even create synthetic data and naive baseline models to show how the model would impact existing business processes.

- Why order of magnitude ROI math should be on every hiring checklist – The ability to estimate the potential business impact of a change in a statistical measure is one of the best predictors of success for a data science team.

- The difference between “pure research” and “applied templates” – 80% of data scientists think they’re doing the former, but realistically, the vast majority are applying well-known templates to novel business cases. Knowing which is which and how to manage them differently improves morale and output.

- Define a stakeholder-centric project management process – The most common failure mode is that data science delivers results that either arrive too late or don’t fit into how the business works today, so the results gather dust. Share insights early and often.

- Building for the scale that really matters – Many organizations optimize for scale of data, but ultimately are overwhelmed by the scale of the growing data science team and its business stakeholders. Team throughput grinds to a crawl as information loss compounds with the number of interactions in a single project, let alone across a portfolio of hundreds or thousands of projects.

- Why time to iterate is the most important metric – Many organizations consider model deployment to be a moonshot, when it really should be laps around a racetrack. Having minimal obstacles to testing real results (without sacrificing rigorous review and checks) is another great predictor of data science success. Facebook and Google deploy new models in minutes, whereas large financial services companies can take 18 months.

- Why delivered is not done – Many organizations have such a hard time deploying a model into production that the data scientists breathe a sigh of relief and move on to the next project. Yet, this neglects the critical process of monitoring to ensure the model performs as expected and is used appropriately.

- Measure everything, including yourself – Ironically, data scientists live in the world of measurement yet rarely turn that lens on themselves. Tracking patterns in aggregate workflows helps create modular templates and guide investment in internal tooling and people to alleviate bottlenecks.

- Risk and change management aren’t just for consultants – Data science projects don’t usually fail because of the math, but rather because of the humans who use the math. Establish training, provide pre-determined feedback channels, and measure usage and engagement to ensure success.
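
As an illustration of the "order of magnitude ROI math" item above, here is a minimal sketch of the kind of back-of-the-envelope estimate it describes. It is not taken from the session materials: the function names, the formula (extra successes times value per success), and all of the numbers are assumptions chosen purely for illustration.

```python
import math


def annual_impact(decisions_per_year, rate_lift, value_per_success):
    """Rough yearly value of improving a success rate by `rate_lift`.

    All inputs are hypothetical estimates, e.g. a conversion rate that
    moves by 0.2 percentage points means rate_lift = 0.002.
    """
    extra_successes = decisions_per_year * rate_lift
    return extra_successes * value_per_success


def order_of_magnitude(amount):
    """Power of ten of a positive amount (80_000 is on the order of 10**4)."""
    return math.floor(math.log10(amount))


def simple_roi(impact, annual_cost):
    """Impact per unit of cost; a value above 1 means the work pays for itself."""
    return impact / annual_cost


# Illustrative (made-up) inputs: one million scored decisions per year,
# a 0.2 percentage-point lift, 40 currency units per extra success,
# and 20,000 currency units per year to build and run the model.
impact = annual_impact(1_000_000, 0.002, 40.0)
magnitude = order_of_magnitude(impact)
roi = simple_roi(impact, 20_000)
```

The point of such a sketch is the exponent rather than the exact figure: if the estimated impact is one or more orders of magnitude below the cost of the project, no amount of model tuning is likely to change the decision.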


Nick Elprin

Domino Data Lab

Nick Elprin is the CEO and cofounder of Domino Data Lab, a data science platform that accelerates the development and deployment of models while enabling best practices like collaboration and reproducibility. Before starting Domino, Nick built tools for quantitative researchers at Bridgewater, one of the world’s largest hedge funds. He has over a decade of experience working with data scientists at advanced enterprises. He has a BA and MS in computer science from Harvard.
