Imagine you must make data-driven decisions in real time, whether that’s detecting anomalies and fraudulent activities in data feeds, monitoring application behavior and infrastructure, session-based analysis of user activities, or doing real-time ETL. But instead of having to write a lot of code in a programming language like Java or Scala for your favorite stream processing technology, all you’d need would be a simple SQL statement such as SELECT * FROM payments-kafka-stream WHERE fraudProbability > 0.8.
Modern businesses have data at their core, and this data is changing continuously. Stream processing is what allows you harness this torrent of information in real time, and thousands of companies use Apache Kafka as the core platform for streaming data to transform and reshape their industries. However, the world of stream processing still has a very high barrier to entry. Today’s most popular stream processing technologies require the user to write code in programming languages such as Java or Scala. This hard requirement on coding skills is preventing many companies to unlock the benefits of stream processing to their full effect.
In this talk, I introduce the audience to KSQL, the open source streaming SQL engine for Apache Kafka. KSQL provides an easy and completely interactive SQL interface for data processing on Kafka — no need to write any code in a programming language. KSQL brings together the worlds of streams and databases by allowing you to work with your data in a stream and in a table format. Built on top of Kafka’s Streams API, KSQL supports many powerful operations including filtering, transformations, aggregations, joins, windowing, sessionization, and much more. It is open source (Apache 2.0 licensed), distributed, scalable, fault-tolerant, and real-time. You will learn how KSQL makes it easy to get started with a wide range of stream processing use cases such as those described at the beginning. We cover how to get up and running with KSQL and explore the under-the-hood details of how it all works.
Michael Noll is a product manager at Confluent, the company founded by the creators of Apache Kafka. Previously, Michael was the technical lead of DNS operator Verisign’s big data platform, where he grew the Hadoop, Kafka, and Storm-based infrastructure from zero to petabyte-sized production clusters spanning multiple data centers—one of the largest big data infrastructures in Europe at the time. He is a well-known tech blogger in the big data community. In his spare time, Michael serves as a technical reviewer for publishers such as Manning and is a frequent speaker at international conferences, including Strata, ApacheCon, and ACM SIGIR. Michael holds a PhD in computer science.
Help us make this conference the best it can be for you. Have questions you'd like this speaker to address? Suggestions for issues that deserve extra attention? Feedback that you'd like to share with the speaker and other attendees?
Join the conversation here (requires login)
©2018, O’Reilly UK Ltd • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • email@example.com