Presented by O’Reilly and Cloudera

San Francisco • London • New York

Make Data Work

September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

Your 10 billion rides are arriving now: Scaling Apache Spark for data pipelines and intelligent systems at Uber

Felix Cheung (Uber)

11:20am–12:00pm Wednesday, 09/12/2018

Data engineering and architecture
Location: 1A 10
Level: Intermediate

Secondary topics: Data Integration and Data Pipelines, Transportation and Logistics

Average rating: 4.60 (5 ratings)

Who is this presentation for?

Data engineers and data scientists

Prerequisite knowledge

A basic understanding of big data platforms and Apache Spark

What you'll learn

Learn how Uber built its data platform with Apache Spark at enormous scale
Explore intelligent system use cases built on Apache Spark at Uber
Understand the challenges of running open source big data systems at this scale

Description

Did you know that your Uber rides are powered by Apache Spark? Join Felix Cheung to learn how Uber is building its data platform with Apache Spark at enormous scale and discover the unique challenges the company faced and overcame.

Felix starts at the beginning, explaining how Uber built out its emerging big data platform. He looks at how the data stack evolved to keep pace with the company’s explosive growth over the last few years, then examines the current overall architecture, diving into the internal services and tooling Uber offers today, including several pipeline-as-a-service implementations. Throughout, he highlights the role Apache Spark plays in the platform.

Felix then walks you through a few intelligent systems built on this data platform, exploring the design choices made to enable distributed machine learning at scale and adapt it to Uber’s distinctive setting. He concludes by analyzing the particular challenges of reliability, resource utilization, and observability at this volume and shares lessons learned building a data platform at such scale. Along the way, Felix details best practices for developing, testing, validating, and deploying to production in a multi-data-center environment and explains how Uber balances business reality with the ideals of free and open source software (FOSS) to overcome the hurdles of engaging with the open source community.

Felix Cheung

Uber

Felix Cheung is an engineer at Uber and a PMC member and committer for Apache Spark. Felix started his journey in the big data space about five years ago with the then state-of-the-art MapReduce. Since then, he’s (re-)built Hadoop clusters from bare metal more times than he would like, created a Hadoop distro from two dozen or so projects, and juggled hundreds to thousands of cores in the cloud or in data centers. He built a few interesting apps with Apache Spark and ended up contributing to the project. In addition to building stuff, he frequently presents at conferences, meetups, and workshops. He was also a teaching assistant for the first set of edX MOOCs on Apache Spark.

Comments

Felix Cheung | ENGINEER

09/22/2018 11:03pm EDT

(Please contact me.)

Shruti Modi | SENIOR MANAGER DATA PLATFORM

09/12/2018 7:25am EDT

Hi, can I have access to this presentation?

Sponsorship Opportunities

For exhibition and sponsorship opportunities, email strataconf@oreilly.com

Partner Opportunities

For information on trade opportunities with O'Reilly conferences, email partners@oreilly.com

Contact Us

View a complete list of Strata Data Conference contacts

©2018, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com