Presented By O’Reilly and Cloudera
Make Data Work
21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

A Roadmap for Open Data Science

Thomas Dinsmore (Cloudera)
14:05–14:45 Thursday, 24 May 2018

Who is this presentation for?

Analytics and Data Science Leaders: CDOs, CAOs, CTOs; VPs of Data, Analytics or Data Science; Directors of Analytics or Data Science; Data Science Team Leaders; Chief Architects

Prerequisite knowledge

Basic understanding of commercial and open source data science tools. Basic understanding of enterprise software provisioning and deployment.

What you'll learn

Every organization seeking to leverage machine learning should develop a culture of open data science. However, this is difficult to do. There are proven best practices that organizations can adopt to ease the transition and ensure success.


Data science transforms the organization. Working data scientists prefer to use open source software, such as Python, R, and Apache Spark, for many reasons:

- Comprehensive functionality
- Flexibility and extensibility
- Transparency
- Innovation

Open source software can scale to support the needs of large enterprises at an acceptable cost. However, many organizations have a large footprint of legacy analytics software. Executives in these organizations struggle to manage the growing cost of provisioning this software and to encourage users to adopt open source tooling.

Migration to open data science is challenging for several reasons:

- Existing users of legacy software often have strong personal preferences, and resist switching
- Programs written with legacy software must be rebuilt in new tools
- Data may be siloed within the legacy platform

Complicating matters, commercial software vendors use community-building techniques to cultivate loyalty among end users.

Nevertheless, we see organizations successfully transition to a culture of open data science. This makes it possible for us to identify best practices that others can use. They include:

- Understanding the needs of users
- Aligning software (commercial or open source) to actual user needs
- Avoiding duplication and overlicensing
- Evaluating options for code migration and rebuilding
- Eliminating data silos
- Training and retraining users effectively

We close the presentation with a discussion of keys to success in building an open data science culture. These include executive leadership, cost transparency, and clear metrics of user adoption and success with open data science tools.

Thomas Dinsmore


Thomas W. Dinsmore is director of product marketing for Cloudera Data Science. Previously, he served as a knowledge expert on the strategic analytics team at the Boston Consulting Group; director of product management for Revolution Analytics; analytics solution architect at IBM Big Data Solutions; and a consultant at SAS, PricewaterhouseCoopers, and Oliver Wyman. Thomas has led or contributed to analytic solutions for more than five hundred clients across vertical markets and around the world, including AT&T, Banco Santander, Citibank, Dell, J.C.Penney, Monsanto, Morgan Stanley, Office Depot, Sony, Staples, United Health Group, UBS, and Vodafone. His international experience includes work for clients in the United States, Puerto Rico, Canada, Mexico, Venezuela, Brazil, Chile, the United Kingdom, Belgium, Spain, Italy, Turkey, Israel, Malaysia, and Singapore.
