Presented By O’Reilly and Cloudera
Make Data Work
September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

Your 10 billion rides are arriving now: Scaling Apache Spark for data pipelines and intelligent systems at Uber

Felix Cheung (Uber)
11:20am–12:00pm Wednesday, 09/12/2018
Data engineering and architecture
Location: 1A 10
Level: Intermediate
Secondary topics: Data Integration and Data Pipelines, Transportation and Logistics

Who is this presentation for?

  • Data engineers and data scientists

Prerequisite knowledge

  • A basic understanding of big data platforms and Apache Spark

What you'll learn

  • Learn how Uber built its data platform with Apache Spark at enormous scale
  • Explore intelligent system use cases built on Apache Spark at Uber
  • Understand the challenges of running open source big data systems at this scale

Description

Did you know that your Uber rides are powered by Apache Spark? Join Felix Cheung to learn how Uber is building its data platform with Apache Spark at enormous scale and discover the unique challenges the company faced and overcame.

Felix starts with the origins of Uber's emerging big data platform. He looks at how the data stack evolved to keep pace with the company's explosive growth over the last few years. He then walks through the current overall architecture, including the internal services and tooling now on offer and several pipeline-as-a-service implementations. Throughout, he highlights the role Apache Spark plays in the platform.

Next, Felix walks you through a few intelligent systems built on this data platform. He explores the design choices made to support distributed machine learning at scale and to adapt it to Uber's distinctive setting. Felix concludes by analyzing several challenges in reliability, resource utilization, and observability that arise at this volume, and he shares lessons learned from building a data platform this large. Along the way, he details best practices for developing, testing, validating, and deploying to production in a multi-data-center environment. He also explains how Uber balances business reality against the ideals of free and open source software (FOSS) to overcome the hurdles of engaging the open source community.


Felix Cheung

Uber

Felix Cheung is an engineer at Uber and a PMC member and committer for Apache Spark. He started his journey in the big data space about five years ago with MapReduce, the state of the art at the time. Since then, he has (re-)built Hadoop clusters from bare metal more times than he would like, created a Hadoop distro from two dozen or so projects, and juggled hundreds to thousands of cores in the cloud and in data centers. He built a few interesting apps with Apache Spark and ended up contributing to the project. In addition to building things, he frequently presents at conferences, meetups, and workshops. He was also a teaching assistant for the first set of edX MOOCs on Apache Spark.


Comments

Felix Cheung | ENGINEER
09/22/2018 11:03pm EDT

(Please contact me.)

Shruti Modi | SENIOR MANAGER DATA PLATFORM
09/12/2018 7:25am EDT

Hi, can I have access to this presentation?