Presented By O’Reilly and Cloudera
Make Data Work
March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

Foundations of streaming SQL; or, How I learned to love stream and table theory

Tyler Akidau (Google)
11:00am–11:40am Thursday, March 8, 2018

Who is this presentation for?

  • Anyone interested in data processing

Prerequisite knowledge

  • Familiarity with the Beam model and stream and table theory

What you'll learn

  • Understand the key concepts underpinning data processing
  • Learn what robust stream processing in SQL looks like

Description

What does it mean to execute streaming queries in SQL? What is the relationship of streaming queries to classic relational queries? Are streams and tables the same thing? And how does all of this relate to the programmatic frameworks we’re all familiar with? Tyler Akidau answers these questions and more as he walks you through key concepts underpinning data processing in general.

Tyler begins by exploring the relationship between the Beam model (as described in his paper “The Dataflow Model” and the “Streaming 101” and “Streaming 102” blog posts) and stream and table theory (as popularized by Martin Kleppmann and Jay Kreps, among others). It turns out that stream and table theory does an illuminating job of describing the low-level concepts that underlie the Beam model.

Tyler then explains what is required to provide robust stream processing support in SQL, discussing the concrete efforts that have been made in this area by the Apache Beam, Calcite, and Flink communities, as well as new ideas yet to come. You’ll leave with a much better understanding of the key concepts underpinning data processing—regardless of whether that data processing is batch or streaming or SQL or programmatic—as well as a concrete notion of what robust stream processing in SQL looks like.

Tyler Akidau

Google

Tyler Akidau is a senior staff software engineer at Google Seattle, where he leads technical infrastructure internal data processing teams for MillWheel and Flume. Tyler is a founding member of the Apache Beam PMC and has spent the last seven years working on massive-scale data processing systems. Though deeply passionate and vocal about the capabilities and importance of stream processing, he is also a firm believer that batch and streaming are two sides of the same coin and that the real endgame for data processing systems is the seamless merging between the two. He is the author of the 2015 “Dataflow Model” paper and “Streaming 101” and “Streaming 102” blog posts. His preferred mode of transportation is by cargo bike, with his two young daughters in tow.

Comments

Foruhar Shiva | 03/08/2018 4:04am PST

Hey Tyler,

Thanks for the great talk. Just wanted to bring to your attention work that has been done in the area of temporal relational databases (check out the book by C. J. Date). In a nutshell, from a temporal relational database perspective, when you group (or otherwise combine) instantaneous events, you end up with a data item that is no longer instantaneous but is instead associated with an interval reflecting the data items that contributed to it. In the interest of closure, we can also think of instantaneous events as interval-based with length 0. C. J. Date outlines a new temporal relational algebra in the book. Thought you might find it interesting and potentially relevant.
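
The interval idea in the comment above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Interval`, `instant`, and `span` are hypothetical, timestamps are plain integers, and "combining" is taken to mean the smallest interval covering all inputs. It is not the algebra from Date's book or anything shown in the talk.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Interval:
    """Hypothetical closed time interval [start, end] over integer timestamps."""
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start


def instant(ts: int) -> Interval:
    """Model an instantaneous event as an interval of length 0."""
    return Interval(ts, ts)


def span(intervals: Iterable[Interval]) -> Interval:
    """Combine intervals into the smallest interval that covers all of them."""
    items = list(intervals)
    if not items:
        raise ValueError("cannot combine an empty group")
    return Interval(min(i.start for i in items), max(i.end for i in items))


# Grouping instantaneous events yields a result tied to an interval.
events = [instant(3), instant(7), instant(5)]
grouped = span(events)

# Closure: the grouped result can be combined again with another event,
# because events and groups share the same Interval type.
regrouped = span([grouped, instant(10)])
```

Because an instantaneous event and a grouped result have the same type here, `span` can be applied to its own output, which is the closure property the comment mentions.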