Presented By O'Reilly and Cloudera
Make Data Work
22–23 May 2017: Training
23–25 May 2017: Tutorials & Conference
London, UK

Stream analytics with SQL on Apache Flink

Fabian Hueske (data Artisans)
14:05–14:45 Wednesday, 24 May 2017
Stream processing and analytics
Location: Capital Suite 8/9
Level: Intermediate
Average rating: 3.50 (2 ratings)

Who is this presentation for?

  • Software engineers, data engineers, and IT architects

Prerequisite knowledge

  • Familiarity with the basic concepts of stream processing and SQL

What you'll learn

  • Understand Apache Flink’s relational APIs for streaming analytics and their conceptual model
  • Learn how to use the APIs to solve common stream analytics use cases

Description

SQL is undoubtedly the most widely used language for data analytics, and for good reason. It is declarative, and many SQL database systems and query processors feature advanced query optimizers and highly efficient execution engines. SQL has become the standard that almost everybody working with data knows and uses.

With stream processing technology becoming mainstream, why isn’t SQL widely supported by open source stream processors? SQL’s semantics and syntax were not designed with the characteristics of streaming data in mind. Consequently, systems that want to support SQL on data streams have to overcome a conceptual gap. One approach is to support standard SQL, which is well known but requires cumbersome workarounds for many common streaming computations. Other approaches are to design custom SQL-inspired stream analytics languages or to extend SQL with streaming-specific keywords. While such solutions tend to offer more intuitive syntax, they are not established standards and therefore exclude many users and tools.

Apache Flink is a distributed stream processing system with strong support for streaming analytics. Flink features two relational APIs, the Table API and SQL. The Table API is a language-integrated relational API with stream-specific features. Flink’s SQL interface follows standard SQL. Both APIs are semantically compatible and share the same optimization and execution path, which is based on Apache Calcite.

Fabian Hueske explores Apache Flink’s relational APIs for stream analytics, discussing their conceptual model and showcasing their usage. The central concept of these APIs is dynamic tables. Fabian explains how streams are converted into dynamic tables and, thanks to the stream-table duality, how dynamic tables are converted back into streams without losing information. Relational queries on dynamic tables behave similarly to materialized view definitions and produce new dynamic tables. Fabian demonstrates how dynamic tables are converted back into changelog streams or are written as materialized views to external systems, such as Apache Kafka or Apache Cassandra, where they are updated in place with low latency. Fabian then highlights the power and expressiveness of Flink’s relational APIs by outlining common stream analytics use cases.
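To make the dynamic-table idea above a little more concrete, here is a minimal plain-Python sketch. It is not Flink's API and not how Flink implements it; it only models the concept under stated assumptions. An append-only stream of (user, url) clicks is treated as a growing table, the continuous query `SELECT user, COUNT(url) FROM clicks GROUP BY user` is evaluated incrementally, and every change to the result table is emitted as a changelog record. The names `continuous_count` and `materialize`, and the `"+"`/`"-"` flag encoding, are hypothetical choices for this illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical changelog encoding: ("+", row) inserts a row,
# ("-", row) retracts a previously emitted row.
ChangeRecord = Tuple[str, Tuple[str, int]]


def continuous_count(clicks: Iterable[Tuple[str, str]]) -> Iterator[ChangeRecord]:
    """Incrementally evaluate SELECT user, COUNT(url) FROM clicks GROUP BY user.

    Each incoming click updates the dynamic result table; the update is
    emitted as a changelog record instead of recomputing the whole result.
    """
    counts: Dict[str, int] = defaultdict(int)
    for user, _url in clicks:
        old = counts[user]
        if old > 0:
            # The result row for this user changes: retract the old version first.
            yield ("-", (user, old))
        counts[user] = old + 1
        yield ("+", (user, old + 1))


def materialize(changelog: Iterable[ChangeRecord]) -> Dict[str, int]:
    """Apply a changelog to a key-value view, like a sink updated in place."""
    view: Dict[str, int] = {}
    for flag, (user, cnt) in changelog:
        if flag == "+":
            view[user] = cnt
        elif view.get(user) == cnt:
            # Retraction: remove the row only if it matches the current version.
            del view[user]
    return view


if __name__ == "__main__":
    # Hypothetical sample stream of (user, url) click events.
    clicks = [("Mary", "./home"), ("Bob", "./cart"),
              ("Mary", "./prod?id=1"), ("Liz", "./home")]
    log = list(continuous_count(clicks))
    # Mary's second click retracts ("Mary", 1) and inserts ("Mary", 2).
    for record in log:
        print(record)
    print(materialize(log))  # {'Bob': 1, 'Mary': 2, 'Liz': 1}
```

In this sketch an update to a result row is expressed as a retraction of the old row followed by an insertion of the new one, so the changelog alone is enough to rebuild the result table. A sink keyed on `user` could instead simply overwrite the row in place, which is the materialized-view style of output the abstract describes.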


Fabian Hueske

data Artisans

Fabian Hueske is a committer and PMC member of the Apache Flink project. He was one of the three original authors of the Stratosphere research system, from which Apache Flink was forked in 2014. Fabian is a cofounder of data Artisans, a Berlin-based startup devoted to fostering Flink, where he works as a software engineer and contributes to Apache Flink. He holds a PhD in computer science from TU Berlin and is currently spending a lot of his time writing a book, Stream Processing with Apache Flink.