
Presented By
O’Reilly + Cloudera

Make Data Work

29 April–2 May 2019
London, UK

Real-time SQL stream processing at scale with Apache Kafka and KSQL

Robin Moffatt (Confluent)

9:00–12:30 Tuesday, 30 April 2019

Data Engineering and Architecture
Location: Capital Suite 11

Secondary topics: Streaming and realtime analytics

Average rating: 5.00 (5 ratings)

Who is this presentation for?

Data engineers and developers

Level

Intermediate

Prerequisite knowledge

Familiarity with databases and SQL

Materials or downloads needed in advance

A laptop (macOS or Linux with at least 8 GB of memory)
Complete the course setup instructions before the tutorial

What you'll learn

Explore best practices around building pipelines with Apache Kafka
Learn how to build complete ETL pipelines using only configuration and SQL, along with patterns for integrating with databases and anti-patterns to avoid

Description

Have you ever thought that you needed to be a programmer to do stream processing and build streaming data pipelines? Think again. Apache Kafka is a distributed, scalable, and fault-tolerant streaming platform that provides low-latency pub-sub messaging coupled with native storage and stream processing capabilities. Kafka Connect, part of Apache Kafka, makes it simple to integrate Kafka with RDBMSs, NoSQL stores, and object stores. KSQL, the open source SQL streaming engine for Apache Kafka, makes it possible to build stream processing applications at scale using a familiar SQL interface.

Robin Moffatt walks you through the architectural reasoning for Apache Kafka and the benefits of real-time integration. You’ll then build a streaming data pipeline using nothing but your bare hands, Kafka Connect, and KSQL.

Gasp as you filter events in real time! Be amazed at how we can enrich streams of data with data from an RDBMS! Be astonished at the power of streaming aggregates for anomaly detection!

Topics include:

Introduction to Apache Kafka (including Kafka Connect for streaming data from databases into Apache Kafka)
Streaming concepts (all data is events; stream/table duality)
Introduction to KSQL
How to run KSQL
Exploring Kafka topics in KSQL
Defining KSQL streams and tables over source data
Filtering data in KSQL
Joining data in KSQL
Aggregating data in KSQL
Persisting stream queries
Examining derived Apache Kafka topics
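The stream/table duality and the filter, join, and aggregate steps listed above can be sketched in plain Python to show how they fit together. This is not KSQL and not how Kafka implements them; the record types, field names, and the two-star threshold are hypothetical and exist only for illustration. In KSQL the same shape would be expressed with a WHERE clause, a stream-table JOIN, and a GROUP BY.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Rating(NamedTuple):
    # One event in a hypothetical "ratings" stream.
    user_id: int
    stars: int


class UserUpdate(NamedTuple):
    # One change event for a hypothetical "users" table, keyed by user_id.
    user_id: int
    name: str


def materialize(changelog: Iterable[UserUpdate]) -> dict[int, str]:
    """Stream -> table: replaying a changelog keeps only the latest value per key."""
    table: dict[int, str] = {}
    for update in changelog:
        table[update.user_id] = update.name
    return table


def poor_ratings_per_user(ratings: Iterable[Rating],
                          users: dict[int, str],
                          max_stars: int = 2) -> Counter:
    """Filter the stream, enrich it from the table, then aggregate per user."""
    counts: Counter = Counter()
    for r in ratings:
        if r.stars > max_stars:                  # filter: keep only poor ratings
            continue
        name = users.get(r.user_id, "unknown")   # enrich: look up the current user name
        counts[name] += 1                        # aggregate: running count per user
    return counts
```

For example, replaying `UserUpdate(1, "alice")` and then `UserUpdate(1, "alice_v2")` leaves only `alice_v2` in the table, so every later poor rating from user 1 is counted under that name. A user whose count crosses some threshold is the kind of signal a streaming aggregate can surface for anomaly detection.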

Robin Moffatt

Confluent

Robin is a Developer Advocate at Confluent, the company founded by the creators of Apache Kafka, as well as an Oracle Developer Champion and ACE Director Alumnus. His career has always involved data, from the old worlds of COBOL and DB2, through the worlds of Oracle and Hadoop, and into the current world with Kafka. His particular interests are analytics, systems architecture, performance testing and optimization. He blogs at http://cnfl.io/rmoff and http://rmoff.net/ (and previously http://ritt.md/rmoff) and can be found tweeting grumpy geek thoughts as @rmoff. Outside of work he enjoys drinking good beer and eating fried breakfasts, although generally not at the same time.

©2019, O’Reilly UK Ltd • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com