Brought to you by NumFOCUS Foundation and O’Reilly Media Inc.
The official Jupyter Conference
August 22-23, 2017: Training
August 23-25, 2017: Tutorials & Conference
New York, NY

Deploying JupyterHub for students and researchers

Min Ragan-Kelley (Simula Research Laboratory), Carol Willing (Cal Poly San Luis Obispo), Yuvi Panda (Data Science Education Program (UC Berkeley)), Ryan Lovett (Department of Statistics, UC Berkeley)
9:00am–12:30pm Wednesday, August 23, 2017
JupyterHub deployments
Location: Concourse E Level: Intermediate

Who is this presentation for?

  • Instructors, system administrators, and IT support staff

Prerequisite knowledge

  • Experience deploying software on servers, including familiarity with running basic commands in a terminal

Materials or downloads needed in advance

  • A laptop with access to a Linux server or virtual machine
  • A Google account

What you'll learn

  • Explore JupyterHub’s architecture
  • Learn how to assemble a deployment exactly the way you want using Kubernetes

Description

JupyterHub, a multiuser server for Jupyter notebooks, enables you to offer a notebook server to everyone in a group—which is particularly useful when teaching a course, as students no longer need to install software on their laptops. Min Ragan-Kelley, Carol Willing, Yuvi Panda, and Ryan Lovett get you started deploying and customizing JupyterHub for your needs.

Min, Carol, Yuvi, and Ryan explore JupyterHub’s architecture and how to assemble a deployment exactly the way you want it. They then walk you through a complete deployment of JupyterHub with Kubernetes, following best practices learned from the campus-wide deployment at UC Berkeley. These include integrating with campus authentication (via Google OAuth), monitoring status and collecting data with statsd, and automating as much of the deployment and maintenance process as possible. Along the way, they demonstrate how to manage development, testing, and production systems to improve the reliability and scalability of the deployment, show how to customize components such as authenticators and spawners, and give examples of services that can be managed by or interact with the hub and its users. You’ll then create a simple, functioning JupyterHub deployment of your own.

Topics include:

  • Authenticators that allow JupyterHub to integrate with any existing authentication system, such as GitHub or Google OAuth, PAM, and LDAP
  • Spawners, which start each user’s notebook server on the system of your choice, such as Docker, Kubernetes, or local processes
  • Proxies
  • Services
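
As a rough illustration of how an authenticator and a spawner are selected, a jupyterhub_config.py might look like the sketch below. It is not taken from the tutorial materials. It assumes the oauthenticator and kubespawner packages are installed, the callback URL is a placeholder, and the SimpleNamespace fallback exists only so the file can be run outside JupyterHub.

```python
# jupyterhub_config.py (illustrative sketch, not the presenters' configuration)
from types import SimpleNamespace

try:
    # JupyterHub injects get_config() when it loads this file.
    c = get_config()  # noqa: F821
except NameError:
    # Stand-in so the sketch can be run on its own for a quick check.
    c = SimpleNamespace(
        JupyterHub=SimpleNamespace(),
        GoogleOAuthenticator=SimpleNamespace(),
    )

# Authenticator: decides who may log in.
# Assumes the `oauthenticator` package is installed.
c.JupyterHub.authenticator_class = "oauthenticator.google.GoogleOAuthenticator"
# Placeholder URL; a real deployment uses its own public hostname.
c.GoogleOAuthenticator.oauth_callback_url = (
    "https://hub.example.edu/hub/oauth_callback"
)

# Spawner: decides where each user's notebook server runs.
# Assumes the `kubespawner` package is installed.
c.JupyterHub.spawner_class = "kubespawner.KubeSpawner"
```

Because each component is chosen by a single setting, swapping one out (for example, a different spawner) does not require touching the others.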

Outline

JupyterHub overview

  • Getting started with JupyterHub
  • Composing a deployment
    • What parts can I choose from? (Authenticator, spawner, proxy, and, if applicable, single-user server image)
    • How can I put them together?
  • Composing the simplest possible working deployment
  • Swapping out various parts from the simplest deployment
    • Using and configuring a different authenticator
    • Using and configuring a different spawner

Putting together a deployment of JupyterHub with Kubernetes

  • Background: Getting started with Kubernetes
  • Basic concepts
    • A mental model for working with Kubernetes
    • Pods, deployments, and services
    • Persistent volumes and cloud providers
    • Where to learn more (lecture series, books, wonderful tutorials, etc.)
  • Safely deploying JupyterHub by using standard release engineering practices
    • Deploying and upgrading a JupyterHub with Helm
  • Building your own user images
  • How to debug your deployment when things inevitably go wrong

Appendix: More customization

  • Subclassing in jupyterhub_config.py for fun and profit
  • Writing your own authenticator
  • Writing your own spawner
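
A minimal sketch of what writing your own authenticator can look like, assuming a recent JupyterHub release in which authenticate is an async method that returns the username on success and None on failure. RosterAuthenticator, its roster, and the shared password are hypothetical names and values invented for illustration, and the scheme is for demonstration only, not something to run in production. The ImportError fallback is a stand-in so the sketch runs without JupyterHub installed.

```python
import asyncio
import hmac

try:
    from jupyterhub.auth import Authenticator
except ImportError:
    # Stand-in so the sketch runs without JupyterHub; not the real base class.
    class Authenticator:
        pass


class RosterAuthenticator(Authenticator):
    """Hypothetical authenticator: admit roster members who know a shared password."""

    # Illustrative values; a real deployment would load these from configuration.
    roster = {"ada", "grace"}
    shared_password = "change-me"

    async def authenticate(self, handler, data):
        # `data` carries the login form fields submitted by the user.
        username = data.get("username", "")
        password = data.get("password", "")
        # Constant-time comparison avoids leaking how much of the password matched.
        password_ok = hmac.compare_digest(
            password.encode(), self.shared_password.encode()
        )
        if username in self.roster and password_ok:
            return username  # the name the hub will use for this user
        return None  # None rejects the login


if __name__ == "__main__":
    auth = RosterAuthenticator()
    form = {"username": "ada", "password": "change-me"}
    print(asyncio.run(auth.authenticate(None, form)))
```

Defined inside jupyterhub_config.py, such a class could then be selected with c.JupyterHub.authenticator_class = RosterAuthenticator.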

Min Ragan-Kelley

Simula Research Laboratory

Min Ragan-Kelley is a postdoctoral fellow at Simula Research Lab in Oslo, Norway, where he focuses on developing JupyterHub, Binder, and related technologies and supporting deployments of Jupyter in science and education around the world. Min has been contributing to IPython and Jupyter since 2006 (full-time since 2013).

Carol Willing

Cal Poly San Luis Obispo

Carol Willing is a research software engineer at Cal Poly San Luis Obispo working full-time on Project Jupyter. She is also a Python Software Foundation fellow and former director, a Jupyter Steering Council member, a geek in residence at FabLab San Diego (where she teaches wearable electronics and software development), and an independent developer of open hardware and software. She co-organizes PyLadies San Diego and San Diego Python, contributes to open source community projects, including OpenHatch, CPython, Jupyter, and AnitaB.org’s open source projects, and is an active member of the MIT Enterprise Forum in San Diego. She enjoys sharing her passion for electronics, software, problem solving, and the arts. Previously, Carol worked in software engineering management, product and project management, sales, and the nonprofit sector. She holds an MS in management with an emphasis on applied economics and high-tech marketing from MIT and a BSE in electrical engineering from Duke University.

Yuvi Panda

Data Science Education Program (UC Berkeley)

Yuvi Panda is infrastructure lead for the Data Science Education Program at UC Berkeley, where he works on scaling JupyterHub for use by thousands of students. A programmer and DevOps engineer, he wants to make it easy for people who don’t traditionally consider themselves programmers to do things with code and builds tools (Quarry, PAWS, etc.) to sidestep the list of historical accidents that constitute the “command-line tax” that people have to pay before doing productive things with computing. He’s a core member of the JupyterHub team and works on mybinder.org as well. Yuvi is also a Wikimedian, since you can check out of Wikimedia, but you can never leave.

Ryan Lovett

Department of Statistics, UC Berkeley

Ryan Lovett manages research and instructional computing for the Department of Statistics at UC Berkeley and is a member of the Data Science Education Program’s infrastructure team. He is most often a sysadmin, though he also enjoys programming and consulting with faculty and students.

Comments

Andrey Petrin | ANALYST
08/24/2017 2:06am EDT

Hi! Unfortunately I was unable to attend your session yesterday. Would you be so kind as to upload the slides and materials you provided during the session?