Brought to you by the NumFOCUS Foundation and O'Reilly Media, Inc.
The official Jupyter Conference
August 22-23, 2017: Training
August 23-25, 2017: Tutorials & Conference
New York, NY

How JupyterHub tamed big science: Experiences deploying Jupyter at a supercomputing center

Shreyas Cholia (Lawrence Berkeley National Laboratory), Rollin Thomas (Lawrence Berkeley National Laboratory), Shane Canon (Lawrence Berkeley National Laboratory)
11:05am–11:45am Thursday, August 24, 2017
JupyterHub deployments
Location: Nassau
Level: Intermediate

Who is this presentation for?

  • System administrators, scientists, and systems engineers

Prerequisite knowledge

  • A basic understanding of scientific computing, HPC, or cluster computing

What you'll learn

  • How NERSC leverages JupyterHub to enable notebook services for data-intensive supercomputing on the Cray XC40 Cori system

Description

Extracting scientific insights from data increasingly demands a richer, more interactive experience than high-performance computing systems have traditionally provided. Shreyas Cholia, Rollin Thomas, and Shane Canon share their experience leveraging JupyterHub to enable notebook services for data-intensive supercomputing on the Cray XC40 Cori system at the National Energy Research Scientific Computing Center (NERSC).

Shreyas, Rollin, and Shane explain the motivation behind using Jupyter for supercomputing, describe their implementation strategy and the process behind the development of that strategy, and discuss lessons learned along the way. They also describe alternative configurations for Jupyter on Cori and outline the benefits and drawbacks of each.

The baseline setup incorporates a JupyterHub frontend web service running inside a Docker container (for portability and scaling) that manages user authentication and proxies subsequent Jupyter requests to the Cori system. Shreyas, Rollin, and Shane have developed a custom authenticator for JupyterHub called the GSI (Grid Security Infrastructure) Authenticator that allows users to acquire a grid certificate upon login. The service then uses a special spawner they developed (SSH Spawner), which spins up a Jupyter notebook on Cori via SSH using the GSI credentials. Once launched, the Jupyter notebook connects back to the hub over a websocket, and the hub proxies all future user requests to the Cori node via this websocket connection.

Users interact with their notebooks running on Cori, launching preinstalled or custom kernels to analyze and visualize their data over a familiar web interface. A suite of SLURM “magic” commands developed at NERSC allows users to submit batch jobs from notebooks. The new authenticator, modified spawner, and magic commands have been contributed back to the open source Jupyter community.
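The wiring for this kind of setup lives in the hub's configuration file. The sketch below is illustrative only: the gsiauthenticator and sshspawner module paths, class names, and host values are assumptions standing in for the actual NERSC code, which follows the same pattern of swapping a custom authenticator and spawner into JupyterHub.

```python
# jupyterhub_config.py -- illustrative sketch; module paths, class names,
# and host values are assumptions, not the exact NERSC configuration.
c = get_config()

# Authenticate against site credentials and acquire a grid certificate
# at login (the GSI Authenticator described above).
c.JupyterHub.authenticator_class = 'gsiauthenticator.GSIAuthenticator'

# Spawn the user's notebook server on Cori over SSH, reusing the GSI
# credentials obtained at login.
c.JupyterHub.spawner_class = 'sshspawner.SSHSpawner'
c.SSHSpawner.remote_host = 'cori.nersc.gov'  # placeholder target host
```

Inside a running notebook, the SLURM magics then drive the batch system inline, e.g. `%load_ext slurm_magic` followed by `%sbatch myjob.sh` or `%squeue -u $USER` (command names per NERSC's open source slurm-magic package).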

As the number of Jupyter users on Cori grows, Shreyas, Rollin, and Shane expect severe resource limitations in a single-node deployment. The architecture they developed allows Jupyter notebooks to be spawned either on the dedicated Jupyter node or on Cori compute nodes directly. The dedicated-node setup provides users with immediate access to Jupyter at NERSC for smaller-scale analytics tasks, while the compute-node alternative provides them with more resources if they are willing to wait a bit in the queue. Launching notebooks on compute nodes is accomplished through the batch queue system using a customized SLURM-based BatchSpawner interface. This capability opens up Cori compute resources through Jupyter, including Cori data features like the burst buffer, and enables interactive analytics and visualization using thousands of cores on datasets that cannot fit into a single node’s memory footprint.
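For the compute-node path, a minimal configuration sketch might look like the following, assuming the open source batchspawner package's SlurmSpawner. NERSC's interface is a customized variant, so the trait names, resource values, and the burst buffer (Cray DataWarp) directive shown here are illustrative placeholders rather than the production configuration.

```python
# jupyterhub_config.py (compute-node option) -- a minimal sketch assuming
# the open source batchspawner package; NERSC runs a customized variant,
# so trait names and values here are illustrative.
c = get_config()
c.JupyterHub.spawner_class = 'batchspawner.SlurmSpawner'

# Resources requested for the notebook's batch job (placeholder values).
c.SlurmSpawner.req_partition = 'regular'
c.SlurmSpawner.req_runtime = '04:00:00'

# Template for the sbatch script that launches the single-user server.
# The #DW line requests a burst buffer allocation (Cray DataWarp syntax).
c.SlurmSpawner.batch_script = """#!/bin/bash
#SBATCH --partition={partition}
#SBATCH --time={runtime}
#SBATCH --nodes=1
#DW jobdw capacity=100GB access_mode=striped type=scratch
{cmd}
"""
```

The user experience is unchanged either way; the two spawner paths differ only in where the notebook process lands and how long the user waits for it.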

Beyond making Cori more accessible to more scientists, Jupyter allows NERSC to deliver interactive software packages and specialized kernels for tasks such as scalable analytics with Spark, real-time volume rendering and visualization with yt, and complex data analysis workflows with dask and ipyparallel. Shreyas, Rollin, and Shane demonstrate these frameworks in action and address specific challenges faced in deploying them on the Cray XC40 system.
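As a taste of what that interactivity looks like in practice, the sketch below uses ipyparallel's standard client API to fan a reduction out across a set of engines. How the controller and engines are launched (e.g., inside a SLURM allocation) is elided here and is site-specific.

```python
# In a notebook: distribute a reduction across ipyparallel engines.
# Assumes a controller and engines are already running (e.g. started
# inside a SLURM allocation); cluster setup is site-specific.
import ipyparallel as ipp

rc = ipp.Client()      # connect to the running controller
view = rc[:]           # DirectView over all engines
view.block = True      # wait for results synchronously

view.scatter('x', list(range(1000000)))  # partition the data across engines
view.execute('partial = sum(x)')         # each engine reduces its shard
total = sum(view['partial'])             # pull per-engine sums and combine
```

The dask and Spark workflows follow the same general shape: the notebook stays on one node while the framework's workers occupy the rest of the allocation.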

Making Jupyter work seamlessly on Cori has required collaboration among data architects, systems engineers, security and network specialists, the core Jupyter team, and the extended Jupyter developer community. By documenting their experiences and plans and contributing back their code, Shreyas, Rollin, and Shane hope to promote interactive supercomputing to a broader audience and make it easier to adopt. Indeed, they envision a day when a good fraction of NERSC users rely exclusively on Jupyter or similar frameworks for data analysis and never use a traditional login shell at all.

Shreyas Cholia

Lawrence Berkeley National Laboratory

Shreyas Cholia leads the Usable Software Systems group at Lawrence Berkeley National Laboratory (LBNL), which focuses on making scientific computing more transparent and usable. He is particularly interested in how web APIs and tools can facilitate this. Shreyas also leads the science gateway, web, and grid efforts at the National Energy Research Scientific Computing Center (NERSC) at LBNL. His current work includes a project that enables Jupyter to interact with supercomputing resources, and NEWT, a REST API for high-performance computing. He holds a degree from Rice University, where he studied computer science and cognitive sciences.

Rollin Thomas

Lawrence Berkeley National Laboratory

Rollin Thomas is a big data architect in the Data and Analytics Services group at Lawrence Berkeley National Laboratory. Previously, he was a staff scientist in the Computational Research Division. Rollin has worked on numerical simulations of supernova atmospheres, observation and analysis of supernova spectroscopy data, and data management for supernova cosmology experiments. He has served as a member of the Nearby Supernova Factory, is a builder on the Dark Energy Survey, and is a full member of the Large Synoptic Survey Telescope Dark Energy Science Collaboration. Rollin holds a BS in physics from Purdue University and a PhD in astrophysics from the University of Oklahoma.

Shane Canon

Lawrence Berkeley National Laboratory

Shane Canon is a project engineer in the Data and Analytics Services group at NERSC at Lawrence Berkeley National Laboratory, where he focuses on enabling data-intensive applications on HPC platforms and engaging with bioinformatics applications. Shane has held a number of positions at NERSC, including leading the Technology Integration group, where he focused on the Magellan Project and other strategic initiatives; leading the Data Systems group; and serving as a system administrator for the PDSF cluster, where he gained experience in cluster administration, batch systems, parallel filesystems, and the Linux kernel. He was also a group leader at Oak Ridge National Laboratory, where he architected the 10-petabyte Spider filesystem. Shane is involved in a number of projects outside NERSC, including serving as the production lead on the KBase project, which is developing a platform to enable predictive biology. He holds a PhD in physics from Duke University and a BS in physics from Auburn University.