Presented by O'Reilly and Cloudera
Make Data Work

Unified SQL for big data on Hadoop

This will be presented in Chinese

Xuefu Zhang (Uber)
16:20–17:00 Saturday, 2017-07-15
Data engineering and architecture
Location: Function Room 6A+B Level: Intermediate
Average rating: *.... (1.00, 1 rating)

Prerequisite Knowledge

A basic understanding of SQL, Hadoop, and big data concepts

What you'll learn

Explore U-SQL, which was developed internally by engineers at Uber and is envisioned as the future of SQL platforms

Description

Uber relies heavily on big data to learn about its drivers and rides and to make daily business decisions. It uses many SQL tools, including Apache Hive and Presto, as well as legacy data warehouse systems such as Vertica. These tools offer similar but not identical SQL syntax, so users frequently face challenges when switching among them. More importantly, SQL users don’t know which tool is right for their particular use case. Each engine has different strengths and weaknesses and fits different workloads. If the engine is poorly chosen, both query performance and resource efficiency can suffer.

Xuefu Zhang offers an overview of U-SQL, which was developed internally by engineers at Uber and is envisioned as the future of SQL platforms. U-SQL enables automatic parsing, translation, optimization, and routing for user queries written in any supported query language and provides a unified SQL interface for SQL users who might not be familiar with the underlying SQL engines.
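
The abstract does not describe how U-SQL works internally, so the following is only a toy sketch of the "translate and route" idea, not Uber's implementation. Every name in it (`route`, `to_dialect`, `RoutedQuery`, `BATCH_SCAN_BYTES`) is hypothetical, and the routing rule (small scans go to Presto, large scans go to Hive) is an assumption made for illustration. The one dialect difference it handles is real: Hive quotes identifiers with backticks, while Presto uses double quotes.

```python
from dataclasses import dataclass

# Hypothetical cutoff: scans larger than this are treated as batch work.
BATCH_SCAN_BYTES = 1 << 40  # 1 TiB, an arbitrary illustrative value


@dataclass(frozen=True)
class RoutedQuery:
    engine: str  # "presto" or "hive"
    sql: str     # query text rewritten for that engine


def to_dialect(sql: str, engine: str) -> str:
    """Toy translation step that only handles identifier quoting.

    Assumes the input uses Hive-style backticks and has no backticks
    inside string literals. A real translator would parse the query.
    """
    if engine == "presto":
        return sql.replace("`", '"')
    return sql


def route(sql: str, estimated_scan_bytes: int) -> RoutedQuery:
    """Pick an engine from a size estimate, then translate for it."""
    # Assumed heuristic: large scans go to Hive, everything else to Presto.
    engine = "hive" if estimated_scan_bytes > BATCH_SCAN_BYTES else "presto"
    return RoutedQuery(engine=engine, sql=to_dialect(sql, engine))


if __name__ == "__main__":
    query = "SELECT `city`, COUNT(*) FROM trips GROUP BY `city`"
    print(route(query, 5 * 10**9))   # small scan
    print(route(query, 5 * 10**12))  # large scan
```

A production system would need a real SQL parser and cost estimates from table statistics; the string replacement and fixed threshold here only stand in for those steps.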

Xuefu Zhang


Xuefu Zhang is a software engineer at Uber, where he is the tech team lead for SQL on Hadoop. A veteran of the open source community, Xuefu spends most of his time on Apache Hive and Pig. Previously, he was the tech lead for Hive at Cloudera and led a global effort for the Hive on Spark project, worked on the Hadoop team at Yahoo, and spent his early career at Informatica gaining important experience in enterprise software development, especially in ETL and data warehousing. Xuefu is an Apache member and a PMC member for Hive, Sentry, and Pig.


