Brought to you by NumFOCUS Foundation and O’Reilly Media
The official Jupyter Conference
Aug 21-22, 2018: Training
Aug 22-24, 2018: Tutorials & Conference
New York, NY

Scheduled notebooks: A means for manageable and traceable code execution

Matthew Seal (Netflix)
1:50pm–2:30pm Thursday, August 23, 2018
Extensions and customization, Jupyter subprojects, Usage and application
Location: Sutton Center/Sutton South
Level: Intermediate

Who is this presentation for?

  • Notebook developers, platform and systems engineers, and data scientists

Prerequisite knowledge

  • A basic understanding of Python, notebooks, and cron (useful but not required)
  • Experience working with established cloud ecosystems (useful but not required)

What you'll learn

  • How to use and extend papermill, a tool that simplifies scheduled code execution, along with some helpful patterns for scaling complex systems

Description

Matthew Seal explores notebooks as a unifying mechanism for developing, tracking, and debugging small units of work that need to be managed and scheduled. He demonstrates how papermill, an nteract tool, can be used to execute notebooks as immutable pieces of code, explains how this tooling makes notebooks a solid choice for templates in scheduled processes, and shares how Netflix uses this pattern to colocate tasks written by users ranging from nonprogrammers to professional system maintainers.
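
As a rough illustration of the "notebook as immutable template" idea, here is a minimal sketch, not the setup shown in the talk. It assumes papermill is installed and relies on its `execute_notebook` function; the helper names `build_output_path` and `run_scheduled`, the directory layout, and the timestamp naming scheme are all hypothetical choices made for this example.

```python
from datetime import datetime, timezone
from pathlib import Path


def build_output_path(template_path, output_dir, run_time):
    """Return a per-run output path, e.g. runs/report_20180823T135000Z.ipynb.

    The naming scheme is an assumption for this sketch, not a papermill convention.
    """
    stem = Path(template_path).stem
    stamp = run_time.strftime("%Y%m%dT%H%M%SZ")
    return Path(output_dir) / f"{stem}_{stamp}.ipynb"


def run_scheduled(template_path, output_dir, parameters, run_time=None):
    """Execute the template into a fresh output notebook.

    The template file is only read; each run writes its own copy, so every
    execution leaves behind a separate record with its inputs and outputs.
    """
    import papermill as pm  # third-party dependency, imported lazily

    run_time = run_time or datetime.now(timezone.utc)
    output_path = build_output_path(template_path, output_dir, run_time)
    output_path.parent.mkdir(parents=True, exist_ok=True)
    pm.execute_notebook(str(template_path), str(output_path), parameters=parameters)
    return output_path
```

A scheduler (cron, for instance) could call `run_scheduled("templates/report.ipynb", "runs", {"region": "us"})` on each tick; the path and the `region` parameter here are placeholders. papermill injects the supplied values after the template cell tagged `parameters`.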

This technology choice and its application stem from a desire to help solve a fundamental problem found in many large code ecosystems. As development environments grow and expand to include more tools, more languages, and more flexibility, it often becomes increasingly difficult to maintain a few simple interfaces that can take advantage of these systems. The task of executing a piece of code within such an ecosystem changes from a single point of entry to many dissimilar and constrained entry points. Learning each of these can be tedious and is a major barrier to entry for new users.

Presenting notebooks as traceable units, each tied to a point-in-time execution, is meant to help alleviate this pain. Matthew details how Netflix keeps the working environments for local development and scheduled tasks similar, all without leaving a Jupyter client. When an error occurs in scheduled work, you can debug the problem in the same way you’d debug a local problem. You’ll see examples of this pattern, in which failed notebooks are pulled from a scheduler and the problems are fixed without needing to interact with the intervening technologies.
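
To make the debugging workflow a little more concrete, the sketch below scans an executed notebook for cells that recorded an error. It is an illustration only and not the tooling used at Netflix: it assumes the output notebook follows the nbformat v4 JSON layout, where a failed code cell carries an output whose `output_type` is `"error"`, and the function names `load_notebook` and `find_failed_cells` are hypothetical.

```python
import json


def load_notebook(path):
    """Read a notebook file (plain JSON on disk) into a dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def find_failed_cells(notebook):
    """Return (cell_index, error_name, error_value) for each recorded error.

    Assumes the nbformat v4 layout: a top-level "cells" list whose code cells
    hold an "outputs" list, with errors stored as output_type == "error".
    """
    failures = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        for output in cell.get("outputs", []):
            if output.get("output_type") == "error":
                failures.append((index, output.get("ename"), output.get("evalue")))
    return failures
```

Because the executed notebook keeps its inputs next to the traceback, the same file can then be opened in a Jupyter client and rerun cell by cell, as you would with a local notebook.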

Matthew Seal

Netflix

Matthew Seal is a senior software engineer at Netflix, where he works on scaling data platform solutions. Based in the Bay Area of California, Matthew attended Stanford University for undergraduate and graduate school. He stayed in the area, working at startups and spending a long stretch of time working at OpenGov.