Presented by O'Reilly and Cloudera
Make Data Work

Columnar storage at Uber

This will be presented in Chinese

Zhenxiao Luo (Twitter)
14:50–15:30 Saturday, 2017-07-15
Data engineering and architecture
Location: Function Room 6A+B
Level: Non-technical

What you'll learn

Learn columnar storage concepts, techniques, and query optimizations

Description




As Uber continues to grow, its big data systems must also grow in scalability, reliability, and performance to help Uber make business decisions, give user recommendations, and analyze experiments across all data sources. Zhenxiao Luo shares his experience running columnar storage in production at Uber and discusses query optimization techniques in SQL engines.

Uber’s Hadoop warehouse uses columnar storage with Parquet as the default file format, Presto as its interactive query engine, and Hive and Spark as the batch engines. Zhenxiao explains how Uber developed a number of performance optimizations for columnar storage in all of these query engines, including nested column pruning, predicate pushdown, dictionary pushdown, columnar reads, and lazy reads. Together these give customers a more than 5x performance improvement in all query engines.

Zhenxiao Luo


Zhenxiao Luo leads the Interactive Query Engines team at Twitter, where he focuses on Druid, Presto, Spark, and Hive. Before joining Twitter, Zhenxiao ran the Interactive Analytics team at Uber. He has big data experience at Netflix, Facebook, Cloudera, and Vertica. Zhenxiao is a committer and Technical Steering Committee (TSC) member of Presto. He holds a master’s degree from the University of Wisconsin-Madison and a bachelor’s degree from Fudan University.


