Presented By O'Reilly and Cloudera
Make Data Work
Dec 4–5, 2017: Training
Dec 5–7, 2017: Tutorials & Conference
Singapore

Deploying a scalable JupyterHub environment for running Jupyter notebooks

Graham Dumpleton (Red Hat)
5:05pm–5:45pm Thursday, December 7, 2017

Who is this presentation for?

  • Educators, researchers, system administrators, and IT operations support staff

Prerequisite knowledge

  • Experience or familiarity with the complexities of installing web applications at scale

What you'll learn

  • Explore JupyterHub and learn how to deploy it with OpenShift

Description

The JupyterHub application can be used to create a centralized, web-based environment in which users log in and get their own instance of a Jupyter notebook without installing any software on their local computer. This ensures that all users have access to the same environment, and each instance can be prepopulated with notebooks or data files.

Running a Jupyter notebook, or even a JupyterHub instance, on a single computer is easy enough, but a single machine cannot handle hundreds of users. JupyterHub provides a pluggable system for spawning Jupyter notebooks, and plugins exist for distributing notebook instances across multiple machines, but setting up and maintaining the dedicated infrastructure for these can be complicated.

Graham Dumpleton demonstrates how to use JupyterHub with OpenShift, an enterprise distribution of Kubernetes and a general-purpose environment designed for deploying web applications at scale across a cluster of machines, to run a highly scalable environment for hosting Jupyter notebooks in education and business. Along the way, Graham offers an overview of Kubernetes and OpenShift and discusses the advantages of using them over building out a system yourself from scratch.

Topics include:

  • A review of existing methods for deploying JupyterHub on Kubernetes
  • Using features of OpenShift to simplify the deployment of JupyterHub
  • Challenges when scaling up JupyterHub for a large number of users
  • Setting limits on the amount of resources that a Jupyter notebook can use
  • Using persistent storage to ensure any work done by users is not lost
  • Using external authentication providers to control access to JupyterHub
  • Securing Jupyter notebook instances to limit what a user can access
  • Methods for preparing the Jupyter Notebook images run by JupyterHub
  • Self-service deployment and administration of a JupyterHub instance
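
As a rough illustration of the resource-limit, persistent-storage, and authentication topics above, the following is a minimal sketch of a jupyterhub_config.py. It is not taken from the talk. It assumes the KubeSpawner and OAuthenticator plugins are installed, and every value (limits, claim names, the /home/jovyan mount path, the choice of GitHub as provider) is a placeholder. Option names follow recent KubeSpawner releases and may differ in older versions.

```python
# Sketch of a jupyterhub_config.py. All values are illustrative assumptions,
# not settings from the talk.


def configure(c):
    """Apply example settings to a JupyterHub config object."""
    # Spawn each user's notebook server as a Kubernetes pod
    # (assumes the kubespawner package is installed).
    c.JupyterHub.spawner_class = "kubespawner.KubeSpawner"

    # Cap the resources one notebook server may use, and reserve a minimum.
    c.KubeSpawner.cpu_limit = 1.0
    c.KubeSpawner.mem_limit = "1G"
    c.KubeSpawner.cpu_guarantee = 0.25
    c.KubeSpawner.mem_guarantee = "512M"

    # Give each user a persistent volume claim so work survives pod restarts.
    c.KubeSpawner.storage_pvc_ensure = True
    c.KubeSpawner.storage_capacity = "1Gi"
    c.KubeSpawner.pvc_name_template = "claim-{username}"
    c.KubeSpawner.volumes = [
        {
            "name": "volume-{username}",
            "persistentVolumeClaim": {"claimName": "claim-{username}"},
        }
    ]
    # Assumed home directory; it depends on the notebook image in use.
    c.KubeSpawner.volume_mounts = [
        {"name": "volume-{username}", "mountPath": "/home/jovyan"}
    ]

    # Delegate login to an external provider (GitHub is only an example;
    # real use also needs OAuth client credentials and a callback URL).
    c.JupyterHub.authenticator_class = "oauthenticator.github.GitHubOAuthenticator"
    return c


# JupyterHub injects get_config() when it loads the config file.
if "get_config" in globals():
    configure(get_config())
```

Keeping the settings in a function is only a convenience for reading and testing the sketch; a real config file would more commonly assign to c directly at the top level.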

Graham Dumpleton

Red Hat

Graham Dumpleton is a developer advocate for OpenShift at Red Hat. Graham is the author of mod_wsgi, a popular module for hosting Python web applications with the Apache HTTPD web server. He has a keen interest in Docker and platform-as-a-service (PaaS) technologies. Graham is a fellow of the Python Software Foundation and an emeritus member of the Apache Software Foundation.