Presented By
O’Reilly + Cloudera
Make Data Work
29 April–2 May 2019
London, UK

Real-time SQL stream processing at scale with Apache Kafka and KSQL

Robin Moffatt (Confluent)
9:00–12:30 Tuesday, 30 April 2019
Data Engineering and Architecture
Location: Capital Suite 11
Average rating: 5.00 (5 ratings)

Who is this presentation for?

  • Data engineers and developers

Level

Intermediate

Prerequisite knowledge

  • Familiarity with databases and SQL

Materials or downloads needed in advance

What you'll learn

  • Explore best practices around building pipelines with Apache Kafka
  • Learn how to build complete ETL pipelines using just configuration and SQL, along with patterns for integrating with databases and anti-patterns to be aware of

Description

Have you ever thought that you needed to be a programmer to do stream processing and build streaming data pipelines? Think again. Apache Kafka is a distributed, scalable, and fault-tolerant streaming platform that provides low-latency pub-sub messaging coupled with native storage and stream processing capabilities. Integrating Kafka with RDBMS, NoSQL, and object stores is simple with Kafka Connect, which is part of Apache Kafka. KSQL, the open source SQL streaming engine for Apache Kafka, makes it possible to build stream processing applications at scale using a familiar SQL interface.

Robin Moffatt walks you through the architectural reasoning for Apache Kafka and the benefits of real-time integration. You’ll then build a streaming data pipeline using nothing but your bare hands, Kafka Connect, and KSQL.

Gasp as you filter events in real time! Be amazed at how you can enrich streams of data with data from an RDBMS! Be astonished at the power of streaming aggregates for anomaly detection!

Topics include:

  • Introduction to Apache Kafka (including Kafka Connect for streaming data from databases into Apache Kafka)
  • Streaming concepts (all data is events; stream/table duality)
  • Introduction to KSQL
  • How to run KSQL
  • Exploring Kafka topics in KSQL
  • Defining KSQL streams and tables over source data
  • Filtering data in KSQL
  • Joining data in KSQL
  • Aggregating data in KSQL
  • Persisting stream queries
  • Examining derived Apache Kafka topics
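The "stream/table duality" idea in the topics above can be illustrated without Kafka at all. What follows is a minimal plain-Python sketch under stated assumptions. It is not KSQL or any Kafka API, and the names `count_by_key` and `stream_to_table` and the sample user IDs are hypothetical. Counting events per key, roughly what a KSQL aggregate with `GROUP BY` does, turns a stream into a table. Every update to that table is itself an event in a changelog stream, and replaying the changelog rebuilds the same table.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical model: a stream is an ordered sequence of keyed events, and a
# table holds the latest value per key. None in a changelog marks a deletion.
Change = Tuple[str, Optional[int]]


def count_by_key(events: Iterable[str]) -> Tuple[Dict[str, int], List[Change]]:
    """Aggregate a stream of keys into a table of counts.

    Returns the final table plus its changelog: one (key, new_count) event per
    update, which is the stream view of the same table.
    """
    counts: Dict[str, int] = {}
    changelog: List[Change] = []
    for key in events:
        counts[key] = counts.get(key, 0) + 1
        changelog.append((key, counts[key]))
    return counts, changelog


def stream_to_table(changes: Iterable[Change]) -> Dict[str, int]:
    """Materialize a changelog stream into a table: the last value per key wins."""
    table: Dict[str, int] = {}
    for key, value in changes:
        if value is None:
            table.pop(key, None)  # a tombstone removes the key
        else:
            table[key] = value  # a newer event overwrites the older value
    return table


if __name__ == "__main__":
    pageviews = ["alice", "bob", "alice", "carol", "alice"]
    counts, changelog = count_by_key(pageviews)
    print(counts)     # {'alice': 3, 'bob': 1, 'carol': 1}
    print(changelog)  # [('alice', 1), ('bob', 1), ('alice', 2), ('carol', 1), ('alice', 3)]
    # Replaying the changelog rebuilds the same table: stream/table duality.
    assert stream_to_table(changelog) == counts
```

Filtering the resulting table, for example `[k for k, n in counts.items() if n > 2]`, is a toy version of the threshold-based anomaly detection that streaming aggregates make possible. In Kafka, the state would be held fault-tolerantly and updated continuously rather than computed over an in-memory list.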

Robin Moffatt

Confluent

Robin is a Developer Advocate at Confluent, the company founded by the creators of Apache Kafka, as well as an Oracle Developer Champion and ACE Director Alumnus. His career has always involved data, from the old worlds of COBOL and DB2, through the worlds of Oracle and Hadoop, and into the current world with Kafka. His particular interests are analytics, systems architecture, performance testing and optimization. He blogs at http://cnfl.io/rmoff and http://rmoff.net/ (and previously http://ritt.md/rmoff) and can be found tweeting grumpy geek thoughts as @rmoff. Outside of work he enjoys drinking good beer and eating fried breakfasts, although generally not at the same time.