Presented By O'Reilly and Cloudera
Make Data Work
31 May–1 June 2016: Training
1 June–3 June 2016: Conference
London, UK

Introducing Kafka Streams, Apache Kafka's new stream processing library

Neha Narkhede (Confluent)
14:05–14:45 Friday, 3/06/2016
Data innovations
Location: Capital Suite 12
Level: Intermediate
Tags: real-time, iot
Average rating: 4.12 (8 ratings)

Prerequisite knowledge

Attendees should have a basic knowledge of programming as well as an understanding of Apache Kafka and the problems that it solves. Kafka beginners should be fine: understanding Kafka technically ("the how") is less important than understanding the kinds of problems it solves ("the what"). Familiarity with other stream processing tools such as Spark Streaming, Storm, or Flink will be beneficial but is not required.


In the past few years, Apache Kafka has established itself as the world’s most popular real-time, large-scale messaging system. Kafka has quickly become a mission-critical infrastructure component for modern data platforms and is used across a wide range of industries by thousands of companies, including Netflix, Cisco, PayPal, and Twitter.

The latest addition to the Apache Kafka project is Kafka Streams, a new stream processing library natively integrated with Kafka. It has a very low barrier to entry, easy operationalization, and a natural DSL for writing stream processing applications. As such, it is the most convenient yet scalable option to analyze, transform, or otherwise process data that is backed by Kafka. Neha Narkhede offers an overview of Kafka Streams, covering its design and API, typical use cases, code examples, and its upcoming roadmap. Neha also compares Kafka Streams’s lightweight library approach with heavier, framework-based tools such as Spark Streaming or Storm, which require you to understand and operate a whole different infrastructure for processing real-time data in Kafka.

Neha Narkhede


Neha Narkhede is the cofounder and CTO at Confluent, a company backing the popular Apache Kafka messaging system. Previously, Neha led streams infrastructure at LinkedIn, where she was responsible for LinkedIn’s petabyte-scale streaming infrastructure built on top of Apache Kafka and Apache Samza. Neha specializes in building and scaling large distributed systems and is one of the initial authors of Apache Kafka. A distributed systems engineer by training, Neha works with data scientists, analysts, and business professionals to move the needle on results.