
Science at the Speed of Thought: Enhancing Jupyter to Enable Interactive "Human-in-the-loop" Supercomputing

Moderated by: Matt Henderson and Shreyas Cholia

Who is this presentation for?

Researchers, data scientists, and research software engineers

Prerequisite knowledge

A basic knowledge of Jupyter notebooks, kernels, and widgets is sufficient.

What you'll learn

The current challenges for HPC systems, the state of the art in the Jupyter ecosystem with respect to those challenges, and how interactive HPC could be integrated into a future Jupyter ecosystem.

Description

High Performance Computing (HPC) systems and workflows process and analyze data produced by large-scale experiments and simulations, such as first-principles materials structure calculations, supernova simulations, and mass spectrometry image analysis. These workflows largely rely on non-interactive, asynchronous batch execution that can use thousands of cores and run for hours to days. Historically, these systems have been designed to optimize for raw performance rather than for human utility and ease of interactive use.

Simplifying and accelerating the mode of experimentation, and aligning it with how scientists think and operate, is key to enhancing their productivity. This includes easy job submission and resubmission, introspection of jobs and their contents as they run, and easy ways of intercepting and manipulating data inputs and outputs for analysis and for chaining operations into pipelines. Introducing interactivity to scientific HPC applications and workflows provides a key missing human-in-the-loop capability: the ability to inspect the state of an execution in real time.

The Jupyter architecture (kernels, notebooks, widgets, etc.) provides a solid foundation for addressing these challenges and is already familiar to many scientists, but it is missing certain key ingredients. Our work has centered on extending the Jupyter platform to provide an interactive HPC experience for scientists. Our vision is an interactive Jupyter-based system that makes working on your laptop and working on a supercomputer a seamless experience yielding the best of both worlds, effectively “bringing supercomputing to your laptop.”

We will motivate and demonstrate our work with real use cases from major science projects that need human-in-the-loop interaction with their large jobs and workflows. We will illustrate the power of coupling Jupyter with HPC systems and discuss how our system addresses some of the problems scientists face today, including:
  • Interacting with a large HPC job in real time, sampling/querying the job for specific information or data slices through a Jupyter notebook
  • Seamlessly moving data back and forth between Jupyter infrastructure and the HPC system
  • Managing notebooks with both synchronous local operations and asynchronous tasks that run on the supercomputing cluster
  • Interacting with workflow managers, batch systems, and job schedulers (a minimal sketch of this kind of scheduler interaction follows the list)
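
To make the scheduler-interaction idea concrete, the following is a minimal sketch of driving a batch job from a notebook cell, assuming a Slurm scheduler and using only the Python standard library. The batch script analysis.sbatch, the output file analysis.out, and the helper functions are hypothetical placeholders for illustration; this is not the presenters' system.

    # Sketch: submit a Slurm batch job from a notebook, poll its state,
    # and peek at a slice of its output while it runs.
    import subprocess
    import time
    from pathlib import Path

    def submit_job(script="analysis.sbatch"):
        """Submit a batch script with sbatch and return the job id."""
        out = subprocess.run(["sbatch", "--parsable", script],
                             capture_output=True, text=True, check=True)
        return out.stdout.strip().split(";")[0]

    def job_state(job_id):
        """Query squeue for the job's current state (e.g. PENDING, RUNNING)."""
        out = subprocess.run(["squeue", "-j", job_id, "-h", "-o", "%T"],
                             capture_output=True, text=True)
        # squeue prints nothing once the job has left the queue
        return out.stdout.strip() or "COMPLETED"

    def peek_output(path="analysis.out", n_lines=5):
        """Pull back a small slice of the job's output for inspection."""
        p = Path(path)
        return p.read_text().splitlines()[-n_lines:] if p.exists() else []

    job_id = submit_job()
    while job_state(job_id) in ("PENDING", "RUNNING"):
        print(job_id, job_state(job_id), peek_output())
        time.sleep(30)  # a widget or asynchronous task could replace this blocking poll

In practice, the blocking poll at the end would be replaced by an asynchronous task or a Jupyter widget so that the notebook stays responsive for local work while the job runs on the cluster.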