Presented By O'Reilly and Cloudera
Make Data Work
22–23 May 2017: Training
23–25 May 2017: Tutorials & Conference
London, UK

Data science governance: What and how

Andy Petrella (Kensu)
16:35–17:15 Thursday, 25 May 2017
Enterprise adoption
Location: Capital Suite 15/16
Level: Non-technical

Who is this presentation for?

  • Data scientists, managers, and C-level leaders

Prerequisite knowledge

  • A general understanding of data science workflows and the major pain points

What you'll learn

  • Understand how data science can be made more productive with a data-science-on-data-science approach, which combines behavioral data with static and runtime metadata of processes to reduce boilerplate and pain points


Data science for enterprise use cases causes the number of intermediate datasets to explode. One of the upcoming challenges is therefore finding a way through these ever-growing data sources. Andy Petrella proposes a data-science-on-data-science approach, using behavioral data combined with static and runtime metadata of processes.

Andy explores the well-known problems of doing data science, like finding the right data, connecting to it, and figuring out the content, provenance, and all the contextual information you need before reading a dataset. Because the amount of data keeps increasing, data science platforms now use distributed technologies, which makes this process more expensive.

Andy emphasizes the links between people in an enterprise, data, and processes and dives into how to collect information in an organic manner (dynamically and implicitly), using a combination of a notebook and a harvester system.

Andy Petrella


Andy Petrella is the CEO of Kensu, where he also gets his hands dirty in Adalog’s code. Andy is a mathematician turned distributed computing entrepreneur. Besides being a Scala/Spark trainer, Andy has participated in many projects built using Spark, Cassandra, and other distributed technologies in various fields, including geospatial analysis, the IoT, and automotive and smart cities projects. Andy is the creator of the Spark Notebook, the only reactive and fully Scala notebook for Apache Spark. In 2015, Andy cofounded Data Fellas with Xavier Tordoir around their product, the Agile Data Science Toolkit, which facilitates the productization of data science projects and guarantees their maintainability and sustainability over time. Andy is also a member of the program committee for the O’Reilly Strata, Scala eXchange, Data Science eXchange, and Devoxx events.