Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

A deep dive into Apache Kafka core internals

Jun Rao (Confluent)
2:55pm–3:35pm Wednesday, September 27, 2017
Big data and the Cloud, Data Engineering & Architecture
Location: 1E 07/08
Level: Intermediate
Average rating: 5.00 (3 ratings)

Who is this presentation for?

  • Developers

Prerequisite knowledge

  • Basic knowledge of Kafka

What you'll learn

  • Understand how Apache Kafka provides high-throughput and high-reliability guarantees


Over the last few years, the streaming platform Apache Kafka has been used extensively for real-time data collection, delivery, and processing, particularly in the enterprise. Companies like LinkedIn now send more than a trillion messages per day to Kafka, and many companies (e.g., financial institutions) now store mission-critical data in it.

Jun Rao leads a deep dive into some of the key internals that help make Kafka popular and provide strong reliability guarantees. You’ll learn about the underlying design in Kafka that leads to such high throughput and how Kafka supports high reliability through its built-in replication mechanism. One common use case of Kafka is propagating updatable database records. Jun explains how a unique Kafka feature called compaction is designed to solve just this kind of problem more naturally.
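
As a rough illustration of the compaction idea mentioned above (not Kafka's actual implementation), the sketch below models a log as a list of (offset, key, value) records and keeps only the most recent record for each key. The names `Record` and `compact` are hypothetical, and the handling of deletes is simplified: a record whose value is `None` is treated as a delete marker and dropped immediately, whereas a real broker retains such markers for a configurable period.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical record shape for this sketch: (offset, key, value).
# A value of None stands for a delete marker (tombstone).
Record = Tuple[int, str, Optional[str]]


def compact(log: List[Record]) -> List[Record]:
    """Keep only the latest record per key, preserving offset order."""
    latest: Dict[str, Record] = {}
    for offset, key, value in log:
        # Later records for the same key overwrite earlier ones.
        latest[key] = (offset, key, value)
    # Simplification: drop keys whose latest record is a tombstone.
    survivors = [r for r in latest.values() if r[2] is not None]
    return sorted(survivors, key=lambda r: r[0])


# A stream of updates to database rows, keyed by primary key.
log = [
    (0, "user-1", "alice@old.example"),
    (1, "user-2", "bob@example"),
    (2, "user-1", "alice@new.example"),  # update supersedes offset 0
    (3, "user-2", None),                 # delete supersedes offset 1
    (4, "user-3", "carol@example"),
]

compacted = compact(log)
for record in compacted:
    print(record)
```

A consumer that reads the compacted log from the beginning sees the latest state of every surviving key without replaying every intermediate update, which is why this model suits propagating updatable database records.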


Jun Rao


Jun Rao is a cofounder of Confluent, a company that provides a streaming data platform built on Apache Kafka. Previously, Jun was a senior staff engineer at LinkedIn, where he led the development of Kafka, and a researcher at IBM’s Almaden Research Center, where he conducted research on database and distributed systems. Jun is the PMC chair of Apache Kafka and a committer on Apache Cassandra.