Presented By
O’Reilly + Cloudera
Make Data Work
March 25-28, 2019
San Francisco, CA

ML and AI at scale at PayPal

Subhadra Tatavarti (PayPal), Chen Kovacs (PayPal)
11:00am–11:40am Thursday, March 28, 2019
Average rating: ****.
(4.12, 8 ratings)

Who is this presentation for?

  • Architects, engineers, data scientists, and data engineers



Prerequisite knowledge

  • A basic understanding of data science, AI, data analytics, SQL, Spark, PySpark, and the Jupyter Notebook

What you'll learn

  • Explore PayPal's unified, interoperable ecosystem of products, which increases data discoverability, reduces data latency, broadens data access (with standardized access patterns) through a data framework layer, and provides a powerful interactive, context-based development environment for data applications that also reduces time to market (TTM)


The PayPal data ecosystem is large: 250+ PB of data spread across a polyglot ecosystem of data stores, serving transactions in 200+ countries and supported by some of the largest installations of Oracle, Hortonworks, and Aerospike clusters. Given this scale and complexity, three things became imperative: discovering and accessing the right datasets, securing this data at the desired latency, and providing a frictionless development environment for data analysts and data scientists.

Subhadra Tatavarti and Chen Kovacs explain how PayPal’s data platform team is helping solve this problem holistically by combining two things: a self-service data integration platform and PayPal’s customized Jupyter notebooks environment, PayPal Notebooks (PPNotebooks). The platform consists of a data integration layer that moves data at scale; a data framework layer built on Gimel, a single unified API for accessing data in any supported data store; and a frictionless IDE environment that brings all these together. PayPal Notebooks takes the versatility and power of Jupyter and enhances it for enterprises with features like one-click access to any Hadoop environment, built-in scheduling using Apache Airflow, collaboration and sharing through seamless integration with GitHub, and native publishing to Tableau.
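
To make the idea of a single unified API over many data stores more concrete, here is a minimal sketch in Python. It is not Gimel's actual API: the class `UnifiedDataAPI`, its methods, the store types, and the dataset names are all hypothetical, and the "stores" are in-memory stand-ins. It only illustrates the pattern of resolving a logical dataset name to a store-specific reader behind one `read()` call.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Reader = Callable[[str], List[Row]]


class UnifiedDataAPI:
    """Hypothetical facade: one read() call, many backing stores."""

    def __init__(self) -> None:
        self._readers: Dict[str, Reader] = {}            # store type -> reader
        self._catalog: Dict[str, Tuple[str, str]] = {}   # dataset -> (store type, location)

    def register_store(self, store_type: str, reader: Reader) -> None:
        """Plug in a reader function for one kind of data store."""
        self._readers[store_type] = reader

    def register_dataset(self, name: str, store_type: str, location: str) -> None:
        """Map a logical dataset name to a physical location in a store."""
        if store_type not in self._readers:
            raise ValueError(f"unsupported store type: {store_type}")
        self._catalog[name] = (store_type, location)

    def read(self, name: str) -> List[Row]:
        """Callers use the logical name only; the store is resolved here."""
        store_type, location = self._catalog[name]
        return self._readers[store_type](location)


# In-memory stand-ins for real stores (assumptions for illustration only).
FAKE_RDBMS = {"payments.txn": [{"id": 1, "amount": 25.0}]}
FAKE_KV = {"user:profile": [{"id": 1, "country": "US"}]}

api = UnifiedDataAPI()
api.register_store("rdbms", lambda table: FAKE_RDBMS[table])
api.register_store("kv", lambda key: FAKE_KV[key])
api.register_dataset("transactions", "rdbms", "payments.txn")
api.register_dataset("profiles", "kv", "user:profile")

# The same call shape works regardless of where the data lives.
transactions = api.read("transactions")
profiles = api.read("profiles")
```

In a design like this, supporting a new data store means registering one more reader, while code that calls `read()` stays unchanged.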


Subhadra Tatavarti


Subhadra Tatavarti leads strategy and product for data platforms and infrastructure at PayPal. Her team manages and propels the data platforms that power PayPal’s core customers, processing over 250 PB of data, and builds products that cater to over 5,000 PayPal developers, analysts, and data scientists—with the goal to not just enable this community but also drive efficiency, reduce friction, and reduce time to market, which in turn drives PayPal’s growth. Subhadra is an experienced leader of large organization-wide transformations that drive innovation and accelerate business delivery.

Chen Kovacs