Presented By O’Reilly and Cloudera
Make Data Work
March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

Streaming SQL to unify batch and stream processing: Theory and practice with Apache Flink at Uber

Fabian Hueske (data Artisans), Shuyi Chen (Uber)
5:10pm–5:50pm Wednesday, March 7, 2018

Who is this presentation for?

  • Data engineers and IT architects

Prerequisite knowledge

  • A basic understanding of SQL and stream processing

What you'll learn

  • Learn how SQL can be interpreted in the streaming world, including its semantics and applications, and explore the practical benefits and challenges of implementing streaming SQL across teams in a large company

Description

SQL is the lingua franca for querying and processing data. To this day, it provides nonprogrammers with a powerful tool for analyzing and manipulating data. But with the emergence of stream processing as a core technology for data infrastructures, can you still use SQL and bring real-time data analysis to a broader audience?

The answer is yes, you can. SQL fits into the streaming world very well and forms an intuitive and powerful abstraction for streaming analytics. More importantly, you can use SQL as an abstraction to unify batch and streaming data processing. By viewing streams as dynamic tables, you can obtain consistent results from SQL queries evaluated over static tables and streams alike, and you can use SQL to build materialized views as a data integration tool.
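
The dynamic-table idea can be illustrated with a minimal sketch in plain Python, assuming a hypothetical query SELECT user_id, COUNT(*) FROM clicks GROUP BY user_id. This is not Flink's API or implementation: the function names and the ("+", row) / ("-", row) changelog encoding are assumptions made for illustration. The query is evaluated once over a static table and once incrementally over a stream, where each new event retracts the previous result row for its key and inserts an updated one. Applying that changelog yields a materialized view that matches the batch result at every point in the stream.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical changelog encoding: ("+", row) inserts a result row,
# ("-", row) retracts a previously emitted result row.
Change = Tuple[str, Tuple[str, int]]


def batch_count(table: Iterable[str]) -> Dict[str, int]:
    """Evaluate the GROUP BY count once over a static (bounded) table."""
    return dict(Counter(table))


def continuous_count(stream: Iterable[str]) -> List[Change]:
    """Evaluate the same query incrementally, one event at a time.

    Each event updates the dynamic result table: the old count for the
    key (if one was emitted) is retracted, then the new count is inserted.
    """
    counts: Dict[str, int] = {}
    changelog: List[Change] = []
    for user_id in stream:
        old = counts.get(user_id)
        if old is not None:
            changelog.append(("-", (user_id, old)))
        counts[user_id] = counts.get(user_id, 0) + 1
        changelog.append(("+", (user_id, counts[user_id])))
    return changelog


def materialize(changelog: Iterable[Change]) -> Dict[str, int]:
    """Apply a changelog to maintain a materialized view of the result table."""
    view: Dict[str, int] = {}
    for op, (user_id, cnt) in changelog:
        if op == "+":
            view[user_id] = cnt
        elif view.get(user_id) == cnt:
            # Retraction: remove the row that was previously inserted.
            del view[user_id]
    return view


if __name__ == "__main__":
    clicks = ["alice", "bob", "alice", "carol", "alice"]
    # The streaming result matches the batch result on every prefix of the input.
    for i in range(len(clicks) + 1):
        assert materialize(continuous_count(clicks[:i])) == batch_count(clicks[:i])
    print(continuous_count(clicks))
```

The retract-then-insert design keeps downstream consumers correct when an earlier result row becomes stale. A real engine would also have to handle event time, state size, and out-of-order data, which this sketch ignores.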

Fabian Hueske and Shuyi Chen explore SQL’s role in the world of streaming data and its implementation in Apache Flink, covering fundamental concepts such as streaming semantics, event time, and incremental results. They also share their experience using Flink SQL in production at Uber, explaining how Uber leverages Flink SQL to solve its unique business challenges and how the unified stream and batch processing platform enables both technical and nontechnical users to process real-time and batch data reliably using the same SQL at Uber scale.

Fabian Hueske

data Artisans

Fabian Hueske is a committer and PMC member of the Apache Flink project. He was one of the three original authors of the Stratosphere research system, from which Apache Flink was forked in 2014. Fabian is a cofounder of data Artisans, a Berlin-based startup devoted to fostering Flink, where he works as a software engineer and contributes to Apache Flink. He holds a PhD in computer science from TU Berlin and is currently spending a lot of his time writing a book, Stream Processing with Apache Flink.

Shuyi Chen

Uber

Shuyi Chen is a senior software engineer at Uber working on building scalable real-time data solutions. He built Uber’s real-time complex event processing platform for the marketplace, which powers 100+ production real-time use cases. Currently, he is the tech lead of Uber’s stream processing platform team. Shuyi has years of experience in storage infrastructure, data infrastructure, and Android and iOS development at both Google and Uber.