Brought to you by NumFOCUS Foundation and O’Reilly Media
The official Jupyter Conference
Aug 21-22, 2018: Training
Aug 22-24, 2018: Tutorials & Conference
New York, NY

Explorations in reproducible analysis with Nodebook

Kevin Zielnicki (Stitch Fix)
2:40pm–3:20pm Friday, August 24, 2018

Who is this presentation for?

  • Data scientists, researchers, and Python developers

Prerequisite knowledge

  • Familiarity with the Jupyter Notebook

What you'll learn

  • How to do reproducible analysis with Nodebook

Description

Tools like the Jupyter Notebook provide an excellent platform for quickly iterating on an analysis by interleaving code, text, and output. However, the flexibility of the notebook environment can also lend itself to code that, over the course of an analysis, becomes increasingly unwieldy and difficult to rerun or meaningfully build upon.

While the notebook model allows users to develop code and share results quickly, the prioritization of quick exploration can make the analyses difficult to reproduce. This is typically fixed in a final “clean-up” phase where a notebook is pared down and rerun to make sure it is logically consistent. However, this takes extra effort, and many analysis artifacts will never reach this state. To help address this problem before it happens, we can build tools to make reproducible analysis the most natural option.

As a step toward encouraging reproducibility, Kevin Zielnicki offers an overview of Nodebook, an extension to the Jupyter Notebook that imposes constraints on the notebook model in exchange for greater consistency while keeping the exploratory interactivity that makes the notebook model so useful. Nodebook does this by maintaining a chain of cell execution in logical rather than temporal order. This contrasts with the standard notebook model, in which cells affect the global notebook state in order of execution independently of their logical position in the notebook. By enforcing logical consistency with each cell execution, reproducibility is no longer delayed to a final clean-up but rather maintained throughout the analysis.
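The difference between temporal and logical ordering can be illustrated with a small toy model. This is only a sketch of the idea described above, not Nodebook's actual implementation: the names `run_temporal` and `CellChain` are hypothetical, and the state snapshots rely on `copy.deepcopy`, which is an assumption made purely for illustration.

```python
import copy


def run_temporal(cells, execution_order):
    """Standard notebook model: one shared namespace, mutated in execution order."""
    namespace = {}
    for index in execution_order:
        exec(cells[index], namespace)
    namespace.pop("__builtins__", None)  # drop the entry that exec() adds
    return namespace


class CellChain:
    """Hypothetical chain model: each cell starts from the state left by the cell above it."""

    def __init__(self, cells):
        self.cells = list(cells)
        self.states = [None] * len(self.cells)  # snapshot taken after each cell

    def run(self, index):
        # Make sure the cell above has a valid snapshot before running this one.
        if index > 0 and self.states[index - 1] is None:
            self.run(index - 1)
        # Start from a copy of the previous cell's state, never from global state.
        start = {} if index == 0 else copy.deepcopy(self.states[index - 1])
        exec(self.cells[index], start)
        start.pop("__builtins__", None)  # keep snapshots free of the builtins entry
        self.states[index] = start
        # Snapshots of later cells are now stale, so invalidate them.
        for later in range(index + 1, len(self.cells)):
            self.states[later] = None
        return start


cells = ["x = 1", "x = x + 1", "y = x * 10"]
order = [0, 1, 1, 2]  # the second cell is accidentally run twice

temporal = run_temporal(cells, order)

chain = CellChain(cells)
for index in order:
    logical = chain.run(index)

print(temporal["y"], logical["y"])
```

In the shared-namespace model, rerunning the second cell increments `x` twice, so the result depends on the execution history. In the chain model, the second cell always starts from the state after the first cell, so the result matches a clean top-to-bottom run.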

Kevin Zielnicki

Stitch Fix

Kevin Zielnicki is a data scientist on the styling algorithms team at Stitch Fix. Kevin holds a PhD in physics in the field of quantum information processing, but he now enjoys working with data that can be observed without changing its value.