Presented By O'Reilly and Cloudera
Make Data Work
September 26–27, 2016: Training
September 27–29, 2016: Tutorials & Conference
New York, NY

Apache Kafka: The rise of real-time data and stream processing

Neha Narkhede (Confluent)
1:15pm–1:55pm Wednesday, 09/28/2016
IoT & real-time
Location: 1 E 12/1 E 13
Level: Beginner
Tags: real-time
Average rating: 4.92 (12 ratings)

What you'll learn

  • Understand how application development and data will change due to the rise of stream processing
Description

    For a long time, a substantial portion of the data processing that companies did ran as big batch jobs—CSV files dumped out of databases, log files collected at the end of the day, etc. But businesses operate in real time, and the software they run is catching up. Rather than processing data only at the end of the day, why not react to it continuously as the data arrives? This is the emerging world of stream processing.

    But stream processing only becomes possible when the fundamental data capture is done in a streaming fashion; after all, you can’t process a daily batch of CSV dumps as a stream. This shift toward stream processing has driven the popularity of Apache Kafka. Making all of an organization’s data available centrally as free-flowing streams enables business logic to be represented as stream processing operations. In this new world, applications are essentially stream processors.
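
    As a rough illustration of the contrast between batch and stream processing, the sketch below computes the same page-view counts both ways. It uses plain Python with no Kafka client; the event shape and the function names (batch_page_counts, streaming_page_counts) are hypothetical and chosen only for this example, and a generator stands in for a real event stream.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical page-view event: (user, page). Illustrative only.
Event = Tuple[str, str]


def batch_page_counts(events: List[Event]) -> Dict[str, int]:
    """Batch style: wait until all events are collected, then compute once."""
    return dict(Counter(page for _, page in events))


def streaming_page_counts(events: Iterable[Event]) -> Iterator[Dict[str, int]]:
    """Stream style: update the counts and emit a snapshot after every event."""
    counts: Counter = Counter()
    for _, page in events:
        counts[page] += 1
        yield dict(counts)  # an up-to-date result is available immediately


if __name__ == "__main__":
    day = [("alice", "/home"), ("bob", "/pricing"), ("alice", "/pricing")]
    print("batch:", batch_page_counts(day))
    for snapshot in streaming_page_counts(iter(day)):
        print("stream:", snapshot)
```

    Both versions arrive at the same final counts; the streaming version simply makes an intermediate result available after each event instead of only at the end of the day.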

    Neha Narkhede explains how Apache Kafka serves as a foundation for streaming data applications that consume and process real-time data streams. She introduces Kafka Connect, a system for capturing continuous data streams, and Kafka Streams, a lightweight stream processing library. Neha also describes the lessons that companies like LinkedIn learned while building massive streaming data architectures.

    Neha Narkhede

    Confluent

    Neha Narkhede is the cofounder and CTO at Confluent, a company backing the popular Apache Kafka messaging system. Previously, Neha led streams infrastructure at LinkedIn, where she was responsible for LinkedIn’s petabyte-scale streaming infrastructure built on top of Apache Kafka and Apache Samza. Neha specializes in building and scaling large distributed systems and is one of the initial authors of Apache Kafka. A distributed systems engineer by training, Neha works with data scientists, analysts, and business professionals to move the needle on results.