Presented by
O’Reilly + Cloudera
Make Data Work
29 April–2 May 2019
London, UK

Your 10 billion rides are arriving now: Scaling Apache Spark for data pipelines and intelligent systems at Uber

Felix Cheung (Uber)
17:25–18:05 Wednesday, 1 May 2019
Average rating: 4.42 (12 ratings)

Who is this presentation for?

Data engineers and data scientists

Level

Intermediate

Prerequisite knowledge

A basic understanding of big data platforms and Apache Spark

What you'll learn

  • Learn how Uber built its data platform with Apache Spark at enormous scale
  • Explore intelligent system use cases built on Apache Spark at Uber
  • Understand the challenges of running Apache Spark at this scale

Description

Did you know that your Uber rides are powered by Apache Spark? Join Felix Cheung to learn how Uber is building its data platform with Apache Spark at enormous scale and discover the unique challenges the company faced and overcame.

Felix begins with the origins of Uber’s big data platform, explaining how it was first built out. He then traces how the data stack has evolved to keep pace with explosive growth over the last few years and examines the latest overall architecture, diving into the current internal service and tooling offerings, including a few pipeline-as-a-service implementations. Throughout, he highlights the role Apache Spark plays in the platform.

Felix then walks you through a few intelligent systems built on this data platform, exploring the design choices made to enable distributed machine learning at scale and adapt it to Uber’s distinctive setting. He concludes by analyzing unique challenges with reliability, resource utilization, and observability at this volume and shares lessons learned from building a data platform at such enormous scale.


Felix Cheung

Uber

Felix Cheung is an engineer at Uber and a PMC member and committer for Apache Spark. Felix started his journey in the big data space about five years ago with what was then state-of-the-art MapReduce. Since then, he’s (re-)built Hadoop clusters from bare metal more times than he would like, created a Hadoop distro from two dozen or so projects, and juggled hundreds to thousands of cores in the cloud or in data centers. He built a few interesting apps with Apache Spark and ended up contributing to the project. In addition to building stuff, he frequently presents at conferences, meetups, and workshops. He was also a teaching assistant for the first set of edX MOOCs on Apache Spark.