Uber’s geospatial data is increasing exponentially as the company grows. As a result, its big data systems must also grow in scalability, reliability, and performance to support business decisions, user recommendations, and experiments for geospatial data. Zhenxiao Luo and Wei Yan explain how Uber runs geospatial analysis efficiently in its big data systems, including Hadoop, Hive, and Presto.
Zhenxiao and Wei start with an overview of Uber’s big data infrastructure before explaining how Uber models geospatial data and outlining its data ingestion pipeline. They then discuss techniques for improving geospatial query performance and their experiences applying them, focusing on geospatial data processing in big data systems, including Hadoop and Presto. Zhenxiao and Wei conclude by sharing Uber’s use cases and roadmap.
Zhenxiao Luo leads the Interactive Query Engines team at Twitter, where he focuses on Druid, Presto, Spark, and Hive. Before joining Twitter, Zhenxiao ran the Interactive Analytics team at Uber. He has big data experience at Netflix, Facebook, Cloudera, and Vertica. Zhenxiao is a committer and Technical Steering Committee (TSC) member of Presto. He holds a master’s degree from the University of Wisconsin-Madison and a bachelor’s degree from Fudan University.
Wei Yan is a senior engineer at Uber, where he builds data processing and querying systems that scale along with Uber’s hypergrowth.
©2017, O'Reilly Media, Inc.