Presented By O'Reilly and Cloudera
Make Data Work
22–23 May 2017: Training
23–25 May 2017: Tutorials & Conference
London, UK

Computable content: Notebooks, containers, and data-centric organizational learning

Paco Nathan (derwen.ai)
12:05–12:45 Thursday, 25 May 2017
Level: Beginner
Average rating: 4.33 (3 ratings)

Who is this presentation for?

  • Data scientists and decision makers within an organization

What you'll learn

  • How to use notebooks, containers, and video together to improve learning across an organization (for example, to understand complex but crucial analytics)

Description

Computable content, first described by Lorena Barba at a 2015 lecture at the UC Berkeley Institute for Data Science, leverages Jupyter notebooks to make learning materials more powerful by integrating compute engines, data sources, etc. O’Reilly Media extended this approach to create the new Oriole Online Tutorial medium, publishing notebooks from authors along with video timelines. (A free public tutorial, Regex Golf, by Peter Norvig demonstrates what’s possible with this technology integration.) Each user session launches a Docker container on a Mesos cluster for fully personalized compute environments. The UX is entirely browser based. It is also instrumented for data collection and analytics for use as an assessment platform.
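The one-container-per-session idea described above can be illustrated with a small sketch. This is not the Oriole implementation, which the abstract says ran on a Mesos cluster. It is a hypothetical, single-host model that only builds the `docker run` command line each session would need and never executes it. The class names, the `jupyter/base-notebook` image, the starting port 9000, and the container port 8888 are all assumptions made for illustration.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class SessionContainer:
    """One isolated notebook environment for one user session (hypothetical)."""
    session_id: str
    image: str
    host_port: int
    name: str = field(init=False)

    def __post_init__(self):
        # Derive a unique container name from the session identifier.
        self.name = f"notebook-{self.session_id}"

    def docker_argv(self):
        # Build (but do not run) a detached, self-removing container command
        # that maps a per-session host port to the assumed notebook port 8888.
        return [
            "docker", "run", "--rm", "-d",
            "--name", self.name,
            "-p", f"{self.host_port}:8888",
            self.image,
        ]


class SessionLauncher:
    """Hands out exactly one container spec per user session (hypothetical)."""

    def __init__(self, image="jupyter/base-notebook", first_port=9000):
        self.image = image          # assumed image name
        self.next_port = first_port # assumed port range start
        self.sessions = {}

    def launch(self, session_id=None):
        # Reuse the existing container spec if the session is already known.
        session_id = session_id or uuid.uuid4().hex[:12]
        if session_id in self.sessions:
            return self.sessions[session_id]
        container = SessionContainer(session_id, self.image, self.next_port)
        self.next_port += 1
        self.sessions[session_id] = container
        return container


if __name__ == "__main__":
    launcher = SessionLauncher()
    for user in ("alice", "bob"):
        print(" ".join(launcher.launch(user).docker_argv()))
```

In a cluster setting, the same per-session spec would be handed to a scheduler rather than to a local Docker daemon, and a reverse proxy would route each browser session to its own container.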

Project Jupyter supports more than 50 different compute environments. By leveraging Docker, additional frameworks (such as Dato) and data services can be added. By leveraging HTML on the frontend, JavaScript and other browser-based technologies can also be added to the mix. Vital portions of this software architecture have been released as Thebe on GitHub.

Paco Nathan explores the system architecture, shares project experiences, and considers the impact of notebooks on sharing and learning across a data-centric organization. How do notebooks help teams share and learn? What impact might notebooks have on developer collaboration that is currently focused on IDEs?

Topics include:

  • The system architecture based on Jupyter as middleware, plus Thebe, Docker, Mesos, NGINX, etc.
  • Data analytics and project experiences based on delivering computable content at scale
  • The supporting theory for this pedagogical approach, including Knuth’s literate programming
  • Media production techniques that use the video as subtext

Paco Nathan

derwen.ai

Paco Nathan is known as a “player/coach” with core expertise in data science, natural language processing, machine learning, and cloud computing. He has 35+ years of experience in the tech industry, at companies ranging from Bell Labs to early-stage startups. His recent roles include director of the Learning Group at O’Reilly Media and director of community evangelism at Databricks and Apache Spark. Paco is the cochair of JupyterCon and an advisor for Amplify Partners, Deep Learning Analytics, and Recognai. He was named one of the top 30 people in big data and analytics in 2015 by Innovation Enterprise.