Presented By O'Reilly and Cloudera
Make Data Work
22–23 May 2017: Training
23–25 May 2017: Tutorials & Conference
London, UK

Computable content: Notebooks, containers, and data-centric organizational learning

Paco Nathan (O'Reilly Media)
12:05–12:45 Thursday, 25 May 2017
Level: Beginner
Average rating: 4.33 (3 ratings)

Who is this presentation for?

  • Data scientists and decision makers within an organization

What you'll learn

  • How to use notebooks, containers, and video together to improve learning across an organization (for example, to help people understand complex but crucial analytics)

Description

Computable content, first described by Lorena Barba in a 2015 lecture at the UC Berkeley Institute for Data Science, leverages Jupyter notebooks to make learning materials more powerful by integrating compute engines, data sources, etc. O’Reilly Media extended this approach to create the new Oriole Online Tutorial medium, publishing notebooks from authors along with video timelines. (A free public tutorial, Regex Golf, by Peter Norvig demonstrates what’s possible with this technology integration.) Each user session launches a Docker container on a Mesos cluster, giving every user a fully personalized compute environment. The UX is entirely browser-based. The platform is also instrumented for data collection and analytics so that it can serve as an assessment platform.

Project Jupyter supports more than 50 different compute environments. By leveraging Docker, additional frameworks (such as Dato) and data services can be added. By leveraging HTML on the frontend, JavaScript and other browser-based technologies can also be added to the mix. Vital portions of this software architecture have been released as Thebe on GitHub.

Paco Nathan explores the system architecture, shares project experiences, and considers the impact of notebooks on sharing and learning across a data-centric organization. How do notebooks help teams share and learn? What impact might notebooks have on developer collaboration that is currently focused on IDEs?

Topics include:

  • The system architecture based on Jupyter as middleware, plus Thebe, Docker, Mesos, NGINX, etc.
  • Data analytics and project experiences based on delivering computable content at scale
  • The supporting theory for this pedagogical approach, including Knuth’s literate programming
  • Media production techniques that use the video as subtext

Paco Nathan

O'Reilly Media

Paco Nathan leads the Learning Group at O’Reilly Media. Known as a “player/coach” data scientist, Paco led innovative data teams building ML apps at scale for several years and more recently was evangelist for Apache Spark, Apache Mesos, and Cascading. Paco has expertise in machine learning, distributed systems, functional programming, and cloud computing with 30+ years of tech-industry experience, ranging from Bell Labs to early-stage startups. Paco is an advisor for Amplify Partners and was cited in 2015 as one of the top 30 people in big data and analytics by Innovation Enterprise. He is the author of Just Enough Math, Intro to Apache Spark, and Enterprise Data Workflows with Cascading.
