There’s a lot of hype around data science in the enterprise. With Apache Hadoop and Apache Spark, it seems it should be easy for data scientists to use massive amounts of new data and compute to deliver better machine-learning models faster. But in reality, most data science still runs on a laptop, not on an enterprise data platform. The problem is the mismatch between typical enterprise requirements for a shared environment—security, governance, meeting job SLAs—and the practical needs of a data scientist, such as the ability to use popular R and Python packages, the freedom to customize the environment, and integration with versioning and scheduling tools.
As a result, enterprise data science teams and data platform teams often end up working in isolation from each other, and both lose: models built on small data can still take months to deploy, while the resulting data silos increase both costs and security risks. Meeting this challenge is complex and requires a novel full-stack approach, one that serves both idiosyncratic data scientists and the platform teams who support them.
Matt Brandwein and Tristan Zajonc explore the common, specific, real-world technical challenges facing both audiences and discuss relevant improvements coming to the Hadoop ecosystem. Along the way, they cover best practices for configuring a data science environment and introduce new tools designed to make self-service data science a reality.
Matt Brandwein is director of product management at Cloudera, driving the platform’s experience for data scientists and data engineers. Before that, Matt led Cloudera’s product marketing team, with roles spanning product, solution, and partner marketing. Previously, he built enterprise search and data discovery products at Endeca/Oracle. Matt holds degrees in computer science and mathematics from the University of Massachusetts Amherst.
Tristan Zajonc is a senior engineering manager at Cloudera. Previously, he was cofounder and CEO of Sense, a visiting fellow at Harvard’s Institute for Quantitative Social Science, and a consultant at the World Bank. Tristan holds a PhD in public policy and an MPA in international development from Harvard and a BA in economics from Pomona College.
©2017, O'Reilly Media, Inc. • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.