Brought to you by NumFOCUS Foundation and O’Reilly Media Inc.
The official Jupyter Conference
August 22-23, 2017: Training
August 23-25, 2017: Tutorials & Conference
New York, NY

Dockerize and Kerberize JupyterHub Notebook for Spark, Yarn and HDFS

Moderated by: Joy Chakraborty

Who is this presentation for?

Data engineers, data architects, data scientists, and DevOps engineers

Prerequisite knowledge

Some prior knowledge of Jupyter technologies and basic knowledge of HDFS and Docker will help, but is not required.

What you'll learn

Setting up Docker for complex integration
Working with Kerberos and a KDC server
Setting up JupyterHub for Spark running in a Yarn cluster
Securing a multi-user Jupyter notebook
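As context for the JupyterHub-for-Spark topic above, the notebook-to-cluster path is typically wired up through Sparkmagic's configuration file, which points the kernels at a Livy endpoint. A minimal sketch is shown below; the URL is a placeholder, and the `"Kerberos"` auth value assumes a Sparkmagic build with Kerberos support enabled:

```json
{
  "kernel_python_credentials": {
    "url": "http://livy.example.com:8998",
    "auth": "Kerberos"
  },
  "session_configs": {
    "driver_memory": "1g",
    "executor_cores": 2
  }
}
```

In a typical setup this lives at `~/.sparkmagic/config.json` inside each user's notebook container, so that per-user Docker isolation and per-user Kerberos credentials line up.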

Description

This presentation will provide technical design and development insights for setting up a Kerberized (secured) JupyterHub notebook for HDFS and Yarn (running Hive, Spark, etc.). Joy will show how Bloomberg set up a Kerberos-based notebook for its data science community using Docker, by integrating JupyterHub, Sparkmagic, and Livy. Sparkmagic provides Spark kernels for R, Scala, and Python; Livy is a promising open source project that allows Spark jobs to be submitted over an HTTP-based REST interface. The presentation will highlight the capabilities of JupyterHub, Sparkmagic, and Livy, along with the gaps and the development work required to make the notebook function with a Kerberized HDFS/Yarn cluster running Hive, Spark, and other services. It will also cover how Docker minimizes the complex networking and isolation challenges that are essential to such a project. No prior knowledge of any of these technologies is required to understand this presentation.
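To make the Livy piece of this architecture concrete, the sketch below builds the JSON body for Livy's `POST /batches` endpoint, which submits a Spark application to the Yarn cluster. The endpoint URL, jar path, and class name are illustrative placeholders, and the Kerberos (SPNEGO) authentication step is only indicated in a comment, since it depends on cluster-specific credentials:

```python
import json

# Placeholder Livy endpoint; in a Kerberized deployment the real URL and
# principals come from your cluster configuration.
LIVY_URL = "http://livy.example.com:8998"

def build_batch_payload(app_jar, main_class, args=None):
    """Build the JSON body for Livy's POST /batches endpoint."""
    return {
        "file": app_jar,          # HDFS path to the application jar
        "className": main_class,  # Spark application entry point
        "args": args or [],       # arguments passed to the application
    }

payload = build_batch_payload(
    "hdfs:///apps/spark-examples.jar",
    "org.apache.spark.examples.SparkPi",
    ["100"],
)

# In practice you would POST this with SPNEGO/Kerberos auth, e.g. using the
# requests library together with requests-kerberos (assumed, not run here):
#   requests.post(f"{LIVY_URL}/batches", json=payload,
#                 auth=HTTPKerberosAuth())
print(json.dumps(payload))
```

Sparkmagic performs an equivalent exchange behind the scenes for interactive sessions, which is what lets the notebook user stay inside Jupyter while the job runs on Yarn.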