Now You See Me, Now You Compute: Building Event-driven Architectures with Apache Kafka
Who is this presentation for?
Architects, data engineers, application developers, data analysts, data scientists
Would you cross the street with traffic information that is a minute old? Certainly not! Modern businesses have the same need for up-to-date information, whether due to competitive pressure or because their customers have much higher expectations of how they interact with a product or service. At the heart of this movement are events: in today’s digital age, events are everywhere. Every digital action, from online purchases to ride-sharing requests to bank deposits, creates a set of events around transaction amount, transaction time, user location, account balance, and much more. The technology that allows businesses to read, write, store, and process these events in real time is the event streaming platform, and tens of thousands of companies such as Netflix, Audi, PayPal, Airbnb, Uber, and Pinterest have picked Apache Kafka as the de facto choice to implement event-driven architectures and reshape their industries.
In this talk we cover why and how you can use Apache Kafka and its growing ecosystem to build event-driven architectures that are elastic, scalable, robust, and fault-tolerant, whether that is on-prem, in the cloud, on bare metal machines, or in Kubernetes with Docker containers. Specifically, we look at (1) Kafka as the storage and publish/subscribe layer, (2) Kafka’s Connect framework for integrating external data systems such as MySQL, Elastic, or S3 with Kafka, and (3) Kafka’s Streams API and KSQL as the compute layer to implement event-driven applications and microservices in Java/Scala and streaming SQL, respectively, that process the events flowing through Kafka in real-time. I give an overview of the most relevant functionality, both current as well as upcoming, and share best practices and typical use cases so you can tie it all together for your own needs.
Prerequisite knowledge
The audience should understand the importance of data for modern businesses (this should be a given at Strata). A basic understanding of Kafka would be helpful but is not required. Note that I will not cover Kafka basics such as "What is a partition?" or "What is a topic?" because I will focus on the role and interplay of Kafka's most important components for storage, compute, etc. (for example: when to use Kafka Streams vs. KSQL, and when to use both).
What you'll learn
Michael Noll is the product manager for stream processing at Confluent, the company founded by the creators of Apache Kafka. His work is focused on Kafka’s Streams API and KSQL, the streaming SQL engine for Kafka. Previously, Michael was the technical lead of the Big Data platform of .COM/.NET DNS operator Verisign, where he grew the Hadoop- and Kafka-based infrastructure from zero to petabyte-sized production clusters spanning multiple data centers, one of the largest Big Data infrastructures operated from Europe at the time. He is a contributor and committer to open source projects such as Apache Kafka and Apache Storm, and writes a well-known blog about Big Data and distributed systems at www.michael-noll.com. Michael has a Ph.D. in computer science and has been a frequent speaker at international conferences such as Kafka Summit, Strata, and ApacheCon.