Presented By O'Reilly and Cloudera
Make Data Work
22–23 May 2017: Training
23–25 May 2017: Tutorials & Conference
London, UK

Making self-service data science a reality

Matt Brandwein (Cloudera), Tristan Zajonc (Cloudera)
14:55–15:35 Thursday, 25 May 2017
Data science and advanced analytics
Location: Capital Suite 13
Level: Beginner
Average rating: 3.00 (1 rating)

Who is this presentation for?

  • Data scientists, data engineers, and enterprise architects

Prerequisite knowledge

  • Practical experience as a data scientist, preferably using Python or R, or as an architect charged with supporting data scientists on a Hadoop-based platform

What you'll learn

  • Discover common problems to look out for when scaling data science on Hadoop
  • Learn best practices and new tools to overcome those challenges


There’s a lot of hype around data science in the enterprise. With Apache Hadoop and Apache Spark, it seems it should be easy for data scientists to use massive amounts of new data and compute to deliver better machine-learning models faster. But in reality, most data science still runs on a laptop, not on an enterprise data platform. The problem is the mismatch between typical enterprise requirements for a shared environment—security, governance, meeting job SLAs—and the practical needs of a data scientist, such as the ability to use popular R and Python packages, the freedom to customize the environment, and integration with versioning and scheduling tools.

As a result, enterprise data science and data platform teams often end up segregated, and both lose: models built on small data can still take months to deploy, while the resulting data silos increase both costs and security risks. Meeting this challenge is complex and requires a novel full-stack approach, one that can meet the needs of both idiosyncratic data scientists and the platform teams who support them.

Matt Brandwein and Tristan Zajonc explore the common, specific, real-world technical challenges facing both audiences and discuss relevant improvements coming to the Hadoop ecosystem. Along the way, they cover best practices for configuring a data science environment and introduce new tools designed to make self-service data science a reality.

Matt Brandwein


Matt Brandwein is director of product management at Cloudera, driving the platform’s experience for data scientists and data engineers. Before that, Matt led Cloudera’s product marketing team, with roles spanning product, solution, and partner marketing. Previously, he built enterprise search and data discovery products at Endeca/Oracle. Matt holds degrees in computer science and mathematics from the University of Massachusetts Amherst.

Tristan Zajonc


Tristan Zajonc is a senior engineering manager at Cloudera. Previously, he was cofounder and CEO of Sense, a visiting fellow at Harvard’s Institute for Quantitative Social Science, and a consultant at the World Bank. Tristan holds a PhD in public policy and an MPA in international development from Harvard and a BA in economics from Pomona College.
