From determining the most convenient rider pickup points to predicting the fastest routes, Uber uses data-driven analytics to create seamless trip experiences.
Uber’s analysts and engineers wanted to run real-time analytics with deep learning models, but copying data from one source to another is expensive.
Zhenxiao Luo explains how Uber supports real-time analytics with deep learning on the fly, without any data copying. He starts with the company’s big data infrastructure, specifically Hadoop, Spark, and Presto, and discusses how Uber uses Presto as an interactive SQL engine and deploys Hadoop Distributed File System (HDFS), Pinot, MySQL, and Elasticsearch as storage solutions. He then details how Uber built a Presto Elasticsearch connector from scratch to support real-time analytics on heterogeneous data. He concludes by sharing the company’s production experience and roadmap.
Zhenxiao Luo is an engineering manager at Uber, where he runs the interactive analytics team. Previously, he led the development and operations of Presto at Netflix and worked on big data and Hadoop-related projects at Facebook, Cloudera, and Vertica. He holds a master’s degree from the University of Wisconsin-Madison and a bachelor’s degree from Fudan University.
©2019, O'Reilly Media, Inc.