Presented By
O’Reilly + Cloudera
Make Data Work
March 25-28, 2019
San Francisco, CA

Real-time analytics at Uber: Bring SQL into everything

Zhenxiao Luo (Twitter)
2:40pm-3:20pm Wednesday, March 27, 2019
Average rating: 4.09 (11 ratings)

Who is this presentation for?

  • Software engineers and data analysts

Level

Intermediate

Prerequisite knowledge

  • A basic understanding of big data

What you'll learn

  • Explore how Uber uses big data, machine learning, and SQL analytics

Description

From determining the most convenient rider pickup points to predicting the fastest routes, Uber uses data-driven analytics to create seamless trip experiences.

Uber’s analysts and engineers wanted to run real-time analytics with deep learning models, but copying data from one source to another is expensive.

Zhenxiao Luo explains how Uber supports real-time analytics with deep learning on the fly, without any data copying. He starts with the company’s big data infrastructure, specifically Hadoop, Spark, and Presto, and discusses how Uber uses Presto as an interactive SQL engine on top of the Hadoop Distributed File System (HDFS), Pinot, MySQL, and Elasticsearch, which it deploys as storage solutions. He then details how Uber built a Presto Elasticsearch connector from scratch to support real-time analytics on heterogeneous data. He concludes by sharing the company’s production experience and roadmap.
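The central idea here, querying each storage system in place through a connector rather than first copying its data into one store, can be illustrated with a toy sketch. This is not Presto's connector SPI or Uber's Elasticsearch connector. The `InMemoryConnector` class, its `scan` method, and the `hash_join` helper are hypothetical names, and the in-memory lists are made-up sample rows standing in for backends such as Elasticsearch or MySQL. Each connector filters rows at the source (predicate pushdown), and the engine joins the two streams itself.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional

Row = Dict[str, Any]


@dataclass
class InMemoryConnector:
    """Hypothetical stand-in for one storage backend exposed to a SQL engine."""
    rows: List[Row]

    def scan(self, predicate: Optional[Callable[[Row], bool]] = None) -> Iterator[Row]:
        # Apply the filter at the source ("predicate pushdown"), so only
        # matching rows are handed to the engine; nothing is bulk-copied.
        for row in self.rows:
            if predicate is None or predicate(row):
                yield row


def hash_join(left: Iterable[Row], right: Iterable[Row], key: str) -> List[Row]:
    # Build a hash table on the right input, then stream the left input through it.
    table: Dict[Any, List[Row]] = {}
    for row in right:
        table.setdefault(row[key], []).append(row)
    joined: List[Row] = []
    for row in left:
        for match in table.get(row[key], []):
            joined.append({**row, **match})
    return joined


# Illustrative sample data only (hypothetical, not real Uber data).
trips = InMemoryConnector([
    {"city_id": 1, "trip_id": "a", "fare": 12.5},
    {"city_id": 2, "trip_id": "b", "fare": 30.0},
    {"city_id": 1, "trip_id": "c", "fare": 8.0},
])
cities = InMemoryConnector([
    {"city_id": 1, "city": "San Francisco"},
    {"city_id": 2, "city": "New York"},
])

# Roughly: SELECT * FROM trips JOIN cities USING (city_id) WHERE fare > 10
result = hash_join(trips.scan(lambda r: r["fare"] > 10), cities.scan(), key="city_id")
for row in result:
    print(row)
```

A real engine would also push down projections and limits, split scans across workers, and plan the join order, but the shape is the same: each source answers only the part of the query it can evaluate locally.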


Zhenxiao Luo

Twitter

Zhenxiao Luo leads the Interactive Query Engines team at Twitter, where he focuses on Druid, Presto, Spark, and Hive. Before joining Twitter, Zhenxiao ran the Interactive Analytics team at Uber. He has big data experience at Netflix, Facebook, Cloudera, and Vertica. Zhenxiao is a committer and Technical Steering Committee (TSC) member of Presto. He holds a master’s degree from the University of Wisconsin-Madison and a bachelor’s degree from Fudan University.