Brought to you by the NumFOCUS Foundation and O’Reilly Media
The official Jupyter Conference
Aug 21-22, 2018: Training
Aug 22-24, 2018: Tutorials & Conference
New York, NY

Scalable Data Science: Using Jupyter with BeakerX and GPUs on DC/OS

Who is this presentation for?

Data scientists, developers, and architects interested in scalable ways to deploy Jupyter notebooks.

Prerequisite knowledge

Basic understanding of Jupyter notebooks is recommended.

What you'll learn

JupyterLab provides a complete, easy-to-set-up workspace for data scientists. DC/OS provides a platform for provisioning Jupyter notebooks in a self-service, multi-tenant environment.

Description

JupyterLab notebooks, especially with the BeakerX extensions, provide a complete environment for data scientists to develop their models, but there are infrastructure challenges that must first be overcome:

  • How can notebooks be provisioned in a self-service manner with different resource requirements (for example, a data scientist requesting a notebook with access to two GPUs and SSD-based storage)?
  • How can a large and expensive compute cluster be used most efficiently when it is shared among multiple data scientists and production workloads?
  • How can we move models from the exploratory analytics phase into a production-grade training and serving phase?

In this talk, we will discuss a potential solution to these challenges: a platform for provisioning Jupyter notebooks in a self-service, multi-tenant environment. By using Apache Mesos and DC/OS as the underlying cluster manager, we can keep the overall system very flexible (e.g., allowing simple provisioning of a range of big data frameworks such as Apache Spark, Apache Flink, or Apache Cassandra) while still making very efficient use of the underlying infrastructure.