Presented By O’Reilly and Cloudera
Make Data Work
21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

Unlocking the world of stream processing with KSQL, the streaming SQL engine for Apache Kafka

Michael Noll (Confluent)
14:05–14:45 Wednesday, 23 May 2018

Who is this presentation for?

Architects, VPs of engineering, CTOs, data engineers, data scientists, and application developers

Prerequisite knowledge

Attendees should have a bit of background knowledge about data/big data technologies such as Apache Kafka, databases, or Hadoop. Still, I think any Strata attendee should understand this talk because the main point of KSQL is to make complicated stuff (distributed stream processing!) simple, easy, and fun.

What you'll learn

Attendees will learn how KSQL lets them implement many of their current and future real-time/streaming use cases faster and more easily. Notably, attendees will learn that KSQL significantly lowers the barrier to entry to stream processing, which means a much larger and more diverse group of people in an organisation can benefit from it. These people have many skills, but hardcore coding in Java or Scala is probably not one of them. In other words, stream processing is no longer confined to a smaller, specialized group of full-time developers. We will also show how many common, practical use cases can be easily addressed with KSQL, and give a brief hands-on demo to make all this more tangible.

Description

Imagine you must make data-driven decisions in real time, whether that’s detecting anomalies and fraudulent activities in data feeds, monitoring application behavior and infrastructure, session-based analysis of user activities, or doing real-time ETL. But instead of having to write a lot of code in a programming language like Java or Scala for your favorite stream processing technology, all you’d need would be a simple SQL statement such as SELECT * FROM payments-kafka-stream WHERE fraudProbability > 0.8.
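To make the idea of a continuous filter query concrete, here is a minimal sketch in plain Python of what such a statement computes: every incoming event is checked against the predicate as it arrives, and matching events are passed on. This is not KSQL and uses no Kafka API; the function name, the sample events, and the `paymentId` field are assumptions made for illustration, and only the `fraudProbability > 0.8` condition comes from the statement above.

```python
from typing import Dict, Iterable, Iterator


def fraud_filter(payments: Iterable[Dict], threshold: float = 0.8) -> Iterator[Dict]:
    """Yield each payment whose fraudProbability exceeds the threshold.

    The generator handles one event at a time, which mirrors how a
    continuous query keeps running as new events arrive.
    """
    for payment in payments:
        if payment["fraudProbability"] > threshold:
            yield payment


# Hypothetical sample events standing in for a stream of payments.
sample_payments = [
    {"paymentId": "p1", "fraudProbability": 0.10},
    {"paymentId": "p2", "fraudProbability": 0.93},
    {"paymentId": "p3", "fraudProbability": 0.80},  # not strictly greater than 0.8
    {"paymentId": "p4", "fraudProbability": 0.85},
]

flagged = [p["paymentId"] for p in fraud_filter(sample_payments)]
print(flagged)
```

Unlike this finite list, a real stream never ends, so the query would keep emitting matches for as long as it runs.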

Modern businesses have data at their core, and this data is changing continuously. Stream processing is what allows you to harness this torrent of information in real time, and thousands of companies use Apache Kafka as the core platform for streaming data to transform and reshape their industries. However, the world of stream processing still has a very high barrier to entry. Today’s most popular stream processing technologies require the user to write code in programming languages such as Java or Scala. This hard requirement on coding skills is preventing many companies from unlocking the full benefits of stream processing.

In this talk, I introduce the audience to KSQL, the open source streaming SQL engine for Apache Kafka. KSQL provides an easy and completely interactive SQL interface for data processing on Kafka — no need to write any code in a programming language. KSQL brings together the worlds of streams and databases by allowing you to work with your data in a stream and in a table format. Built on top of Kafka’s Streams API, KSQL supports many powerful operations including filtering, transformations, aggregations, joins, windowing, sessionization, and much more. It is open source (Apache 2.0 licensed), distributed, scalable, fault-tolerant, and real-time. You will learn how KSQL makes it easy to get started with a wide range of stream processing use cases such as those described at the beginning. We cover how to get up and running with KSQL and explore the under-the-hood details of how it all works.
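As a rough illustration of what a windowed aggregation means, the following plain-Python sketch counts events per key in fixed-size, non-overlapping (tumbling) time windows. It is a simplified model under stated assumptions, not KSQL or the Kafka Streams API: the function name, the `(timestamp_ms, key)` event shape, and the sample data are all hypothetical, and it processes a finite list in one pass, whereas a streaming engine updates results incrementally and has to deal with late-arriving events.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def tumbling_window_counts(
    events: Iterable[Tuple[int, str]], window_ms: int
) -> Dict[Tuple[int, str], int]:
    """Count events per (window start, key) using tumbling windows.

    Each event is a (timestamp_ms, key) pair. An event belongs to the
    window whose start is its timestamp rounded down to a multiple of
    window_ms, so windows never overlap.
    """
    counts: Counter = Counter()
    for timestamp_ms, key in events:
        window_start = (timestamp_ms // window_ms) * window_ms
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical user-activity events: (timestamp in milliseconds, user).
sample_events = [
    (1_000, "alice"),
    (20_000, "alice"),
    (59_999, "bob"),
    (60_000, "alice"),  # falls into the second one-minute window
    (61_500, "bob"),
    (119_000, "bob"),
]

counts = tumbling_window_counts(sample_events, window_ms=60_000)
for (window_start, user), n in sorted(counts.items()):
    print(window_start, user, n)
```

The appeal of a streaming SQL engine is that this kind of bookkeeping, along with state management and fault tolerance, is expressed declaratively instead of being hand-written.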


Michael Noll

Confluent

Michael Noll is a product manager at Confluent, the company founded by the creators of Apache Kafka. Previously, Michael was the technical lead of DNS operator Verisign’s big data platform, where he grew the Hadoop, Kafka, and Storm-based infrastructure from zero to petabyte-sized production clusters spanning multiple data centers—one of the largest big data infrastructures in Europe at the time. He is a well-known tech blogger in the big data community. In his spare time, Michael serves as a technical reviewer for publishers such as Manning and is a frequent speaker at international conferences, including Strata, ApacheCon, and ACM SIGIR. Michael holds a PhD in computer science.
