Presented By O'Reilly and Cloudera
December 5–6, 2016: Training
December 6–8, 2016: Tutorials & Conference

Computable content: Notebooks, containers, and data-centric organizational learning

Paco Nathan
4:15pm–4:55pm Wednesday, December 7, 2016
Becoming a data-centric company
Location: 310/311 Level: Non-technical
Average rating: 2.00 (1 rating)

What you'll learn

  • Learn how to leverage notebooks, containers, and video together for improved learning across an organization (for example, understanding complex but crucial analytics)


Computable content, first described by Lorena Barba in a 2015 lecture at the UC Berkeley Institute for Data Science, leverages Jupyter notebooks to make learning materials more powerful by integrating compute engines, data sources, and related resources. O’Reilly Media extended this approach to create the new Oriole online tutorial medium, which publishes notebooks from authors along with video timelines. (A free public tutorial, Regex Golf, by Peter Norvig demonstrates what’s possible with this technology integration.) Each user session launches a Docker container on a Mesos cluster, giving every learner a fully personalized compute environment. The user experience is entirely browser-based, and the platform is instrumented for data collection and analytics so that it can also serve as an assessment platform.
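
The one-container-per-session idea described above can be sketched roughly as follows. This is only an illustrative model under stated assumptions, not the Oriole implementation: the `SessionLauncher` class, the image name, and the container naming scheme are all hypothetical, and the code only records container specs in memory rather than calling Docker or Mesos.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class SessionLauncher:
    """Hypothetical in-memory registry: one container spec per user session."""

    image: str  # hypothetical container image name
    sessions: dict = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def launch(self, user_id: str) -> dict:
        """Return the user's container spec, creating one on first use."""
        if user_id in self.sessions:
            # A returning user keeps the same personalized environment.
            return self.sessions[user_id]
        spec = {
            "name": f"notebook-{next(self._ids)}",  # hypothetical naming scheme
            "image": self.image,
            "user": user_id,
        }
        # A real system would ask the cluster scheduler to start this here.
        self.sessions[user_id] = spec
        return spec

    def stop(self, user_id: str) -> bool:
        """Forget the user's container; return True if one existed."""
        return self.sessions.pop(user_id, None) is not None


launcher = SessionLauncher(image="example/notebook-kernel")
first = launcher.launch("alice")
second = launcher.launch("bob")
again = launcher.launch("alice")
```

In this sketch, `first` and `second` describe different containers, while `again` is the same spec as `first`, which reflects the isolation between learners that the abstract describes.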

Project Jupyter supports more than 50 different compute environments. By leveraging Docker, additional frameworks (such as Dato) and data services can be added. By leveraging HTML on the frontend, JavaScript and other browser-based technologies can also be added to the mix. Vital portions of this software architecture have been released as Thebe on GitHub.

Paco Nathan explores the system architecture, shares project experiences, and considers the impact of notebooks on sharing and learning across a data-centric organization. How do notebooks help teams share and learn? What impact might notebooks have on developer collaboration, which is currently focused on IDEs?

Topics include:

  • The system architecture based on Jupyter as middleware, plus Thebe, Docker, Mesos, Nginx, etc.
  • Data analytics and project experiences based on delivering computable content at scale
  • The supporting theory for this pedagogical approach, including Knuth’s literate programming
  • Media production techniques that use the video as subtext

Paco Nathan

Paco Nathan is known as a “player/coach” with core expertise in data science, natural language processing, machine learning, and cloud computing. He has 35+ years of experience in the tech industry, at companies ranging from Bell Labs to early-stage startups. His recent roles include director of the Learning Group at O’Reilly and director of community evangelism at Databricks and Apache Spark. Paco is the cochair of the Rev conference and an advisor for Amplify Partners, Deep Learning Analytics, Recognai, and Primer. He was named one of the “top 30 people in big data and analytics” in 2015 by Innovation Enterprise.