Modern businesses have data at their core, and this data is changing continuously. Stream processing is what allows you harness this torrent of information in real time, and thousands of companies use Apache Kafka as the core platform for streaming data to transform and reshape their industries. However, the world of stream processing still has a very high barrier to entry. Today’s most popular stream processing technologies require the user to write code in programming languages such as Java or Scala. This hard requirement on coding skills is preventing many companies to unlock the benefits of stream processing to their full effect.
However, imagine that instead of having to write a lot of code in a programming language like Java or Scala for your favorite stream processing technology, all you’d need to get started with stream processing is a simple SQL statement, such as: SELECT * FROM payments-kafka-stream WHERE fraudProbability > 0.8.
Michael Noll offers an overview of KSQL, the open source streaming SQL engine for Apache Kafka, which makes it easy to get started with a wide range of real-time use cases, such as monitoring application behavior and infrastructure, detecting anomalies and fraudulent activities in data feeds, and real-time ETL. With KSQL, there’s no need to write any code in a programming language. KSQL brings together the worlds of streams and databases by allowing you to work with your data in a stream and in a table format. Built on top of Kafka’s Streams API, KSQL supports many powerful operations, including filtering, transformations, aggregations, joins, windowing, sessionization, and much more. It is open source (Apache 2.0 licensed), distributed, scalable, fault tolerant, and real time. You’ll learn how KSQL makes it easy to get started with a wide range of stream processing use cases and how to get up and running as you explore how it all works under the hood.
Michael Noll is the technologist of the office of the CTO at Confluent, the company founded by the creators of Apache Kafka. Previously, Michael was the technical lead of DNS operator Verisign’s big data platform, where he grew the Hadoop, Kafka, and Storm-based infrastructure from zero to petabyte-sized production clusters spanning multiple data centers—one of the largest big data infrastructures in Europe at the time. He’s a well-known tech blogger in the big data community. In his spare time, Michael serves as a technical reviewer for publishers such as Manning and is a frequent speaker at international conferences, including Strata, ApacheCon, and ACM SIGIR. Michael holds a PhD in computer science.
Comments on this page are now closed.
©2018, O’Reilly UK Ltd • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • firstname.lastname@example.org