Presented by
O’Reilly + Cloudera
Make Data Work
29 April–2 May 2019
London, UK

Your 10 billion rides are arriving now: Scaling Apache Spark for data pipelines and intelligent systems at Uber

Felix Cheung (Uber)
17:25–18:05 Wednesday, 1 May 2019
Average rating: 4.42 (12 ratings)

Who is this presentation for?

Data engineers and data scientists

Level

Intermediate

Prerequisite knowledge

A basic understanding of big data platforms and Apache Spark

What you'll learn

  • Learn how Uber built its data platform with Apache Spark at enormous scale
  • Explore intelligent system use cases built on Apache Spark at Uber
  • Understand the challenges of running Apache Spark at this scale

Description

Did you know that your Uber rides are powered by Apache Spark? Join Felix Cheung to learn how Uber is building its data platform with Apache Spark at enormous scale and discover the unique challenges the company faced and overcame.

Felix begins with the origins of Uber’s big data platform, explaining how it was first built out. He then traces how the data stack has evolved to keep pace with explosive growth over the last few years and examines the latest overall architecture, diving into the current internal service and tooling offerings, including a few pipeline-as-a-service implementations. Throughout, he highlights the role Apache Spark plays in the platform.

Felix then walks you through a few intelligent systems built on this data platform, exploring the design choices made to enable distributed machine learning at scale and adapt it to Uber’s distinctive setting. He concludes by analyzing unique challenges with reliability, resource utilization, and observability at this volume and shares lessons learned from building a data platform at such enormous scale.


Felix Cheung

Uber

Felix Cheung is an engineer at Uber and a PMC member and committer for Apache Spark. Felix started his journey in the big data space about five years ago with what was then state-of-the-art MapReduce. Since then, he’s (re-)built Hadoop clusters from bare metal more times than he would like, created a Hadoop distro from two dozen or so projects, and juggled hundreds to thousands of cores in the cloud or in data centers. He built a few interesting apps with Apache Spark and ended up contributing to the project. In addition to building stuff, he frequently presents at conferences, meetups, and workshops. He was also a teaching assistant for the first set of edX MOOCs on Apache Spark.