Brought to you by NumFOCUS Foundation and O’Reilly Media
The official Jupyter Conference
Aug 21-22, 2018: Training
Aug 22-24, 2018: Tutorials & Conference
New York, NY

PayPal Notebooks: Data science and machine learning at scale, powered by Jupyter

Romit Mehta (PayPal), Praveen Kanamarlapudi (PayPal)
4:10pm–4:50pm Friday, August 24, 2018

Who is this presentation for?

  • Data infrastructure teams, data engineers, data scientists, and analysts

Prerequisite knowledge

  • A basic understanding of the Jupyter ecosystem

What you'll learn

  • Discover how PayPal made Jupyter a first-class citizen in its data and analytics ecosystem
  • Explore the features PayPal implemented alongside Jupyter to simplify the user experience, keep the product 100% compliant with infosec guidelines, and shorten time to market for data scientists and analysts

Description

PayPal Notebooks, powered by Jupyter, is a major new ecosystem for data analytics and exploration at PayPal, with kernels, magics, and utilities for analytics and engineering. Hundreds of PayPal’s data scientists, analysts, and developers use Jupyter to access data spread across filesystem, relational, document, and key-value stores, enabling complex analytics and an easy way to build, train, and deploy machine learning models. Romit Mehta and Praveen Kanamarlapudi explain how PayPal built its Jupyter infrastructure and powerful extensions.

Topics include:

  • High availability: A grid of highly available servers ensures customers get a notebook instance immediately
  • PPMagics (PayPal Magics): An extension containing all PayPal magics to connect to any storage using any compute
  • Parameterized notebooks: An extension to make notebooks more interactive
  • Notebook scheduling: An integration with Airflow to schedule notebooks
  • GitHub integration: An integration with PayPal’s enterprise GitHub repository to share and collaborate on notebooks
  • One-touch kernels: A simplified way to access any compute or storage
  • Unified Data API: An integration with PayPal’s unified data API to access almost any storage engine directly within notebooks
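
As a rough illustration of the parameterized notebooks topic above, the sketch below injects parameter values into a notebook's JSON structure. It is not PayPal's extension: the function name inject_parameters and the sample notebook are hypothetical, and the "parameters" / "injected-parameters" cell tags follow the convention used by open source tools such as papermill.

```python
import copy


def inject_parameters(notebook, params):
    """Return a copy of `notebook` with a code cell that assigns `params`.

    The new cell is placed right after the first cell tagged "parameters",
    or at the top of the notebook if no such cell exists.
    """
    nb = copy.deepcopy(notebook)  # leave the caller's notebook untouched
    source = "\n".join(f"{name} = {value!r}" for name, value in params.items())
    new_cell = {
        "cell_type": "code",
        "metadata": {"tags": ["injected-parameters"]},
        "source": source,
        "outputs": [],
        "execution_count": None,
    }
    index = 0
    for i, cell in enumerate(nb["cells"]):
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            index = i + 1  # insert directly after the defaults cell
            break
    nb["cells"].insert(index, new_cell)
    return nb


# Hypothetical notebook: one cell of default parameters, one analysis cell.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "code", "metadata": {"tags": ["parameters"]},
         "source": "region = 'US'\nlimit = 10",
         "outputs": [], "execution_count": None},
        {"cell_type": "code", "metadata": {},
         "source": "print(region, limit)",
         "outputs": [], "execution_count": None},
    ],
}

result = inject_parameters(notebook, {"region": "EU", "limit": 100})
```

Because the injected cell runs after the defaults cell, its assignments override the defaults when the notebook is executed top to bottom. A scheduler could then run the same notebook with different values on each run.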

Romit Mehta

PayPal

Romit Mehta is a product manager at PayPal focusing on core big data and analytics platform products, which include a compute framework, a data platform, and a notebooks platform. In this role, Romit works to simplify application development on big data technologies like Spark, improve analysts’ and data scientists’ agility, and ease their access to data spread across a multitude of data stores via friendly technologies like SQL and notebooks. In his 19-year career, Romit has built data and analytics solutions for a wide variety of companies across the networking, semiconductor, telecom, security, and fintech industries. Outside of data products, Romit spends his time with his wife Kosha and their two wonderful kids, Annika and Vedant.

Praveen Kanamarlapudi

PayPal

Praveen Kanamarlapudi is a senior software engineer on the core data platform team at PayPal, where he builds scalable and distributed platforms, including a highly available Jupyter platform used by hundreds of the company’s data scientists, analysts, and developers. He’s also a contributor to Livy and Sparkmagic.