
A deep dive into Kafka controller

Jun Rao (Confluent)
1:10pm–1:50pm Thursday, 09/13/2018
Streaming systems & real-time applications
Location: 1E 07/08 Level: Intermediate

Who is this presentation for?

  • Engineers and those in operations

Prerequisite knowledge

  • Basic knowledge of Apache Kafka

What you'll learn

  • Understand how the controller works in Apache Kafka

Description

The controller is the brain of Apache Kafka. A big part of its job is to maintain the consistency of the replicas and to determine which replica can serve clients, especially during individual broker failures.

Jun Rao outlines the main data flow in the controller: when a broker fails, how the controller automatically promotes another replica to leader so that clients can keep being served, and when a broker restarts, how the controller resumes the replication pipeline on that broker.
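
To make the failover step concrete, here is a minimal sketch of the rule the controller applies when a partition's leader fails: it scans the partition's assigned replica list and promotes the first replica that is both alive and still in the in-sync replica (ISR) set. This is a simplified illustration, not Kafka's actual code (the real controller is written in Scala); the names PartitionState and electLeader are invented.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of the controller's failover rule; not Kafka's real code.
final class PartitionState {
    final List<Integer> assignedReplicas; // replica broker ids, in preferred order
    final Set<Integer> isr;               // current in-sync replica set

    PartitionState(List<Integer> assignedReplicas, Set<Integer> isr) {
        this.assignedReplicas = assignedReplicas;
        this.isr = isr;
    }

    // On leader failure, promote the first assigned replica that is alive
    // and still in the ISR. An empty result means the partition stays
    // offline (unless unclean leader election is allowed).
    Optional<Integer> electLeader(Set<Integer> liveBrokers) {
        return assignedReplicas.stream()
                .filter(liveBrokers::contains)
                .filter(isr::contains)
                .findFirst();
    }
}

// Example: replicas [1, 2, 3], ISR {2, 3}, and broker 1 (the old leader) is down.
// electLeader(Set.of(2, 3)) returns Optional[2]: broker 2 becomes the new leader.
```

Restricting the choice to the ISR is what preserves consistency: only in-sync replicas are guaranteed to hold every committed message, so promoting one of them means clients never lose committed data.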

Jun then describes recent improvements to the controller that let it handle certain edge cases correctly and improve its performance, enabling more partitions in a Kafka cluster.
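
One way a controller can gain performance of this kind is batching: rather than sending one state-change request per partition, it can group the leader/ISR updates for all affected partitions by destination broker and send a single request per broker. The sketch below shows only that grouping step; the types and names are hypothetical and do not mirror Kafka's actual request classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: batch per-partition leader/ISR updates by broker,
// so a failover touching many partitions costs one request per broker
// rather than one request per partition. (Record syntax needs Java 16+.)
final class StateChangeBatcher {
    record LeaderAndIsrUpdate(String topic, int partition, int leader, List<Integer> isr) {}

    // Group each update under every broker that hosts a replica of the partition.
    static Map<Integer, List<LeaderAndIsrUpdate>> batchByBroker(
            Map<LeaderAndIsrUpdate, List<Integer>> updateToReplicas) {
        Map<Integer, List<LeaderAndIsrUpdate>> perBroker = new HashMap<>();
        updateToReplicas.forEach((update, replicas) ->
                replicas.forEach(brokerId ->
                        perBroker.computeIfAbsent(brokerId, id -> new ArrayList<>())
                                 .add(update)));
        return perBroker; // send one request per map entry instead of per update
    }
}
```

With thousands of partitions per broker, collapsing per-partition messages into per-broker batches can keep failover cost closer to the number of brokers than to the number of partitions.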


Jun Rao

Confluent

Jun Rao is the cofounder of Confluent, a company that provides a streaming data platform on top of Apache Kafka. Previously, Jun was a senior staff engineer at LinkedIn, where he led the development of Kafka, and a researcher at IBM’s Almaden Research Center, where he conducted research on databases and distributed systems. Jun is the PMC chair of Apache Kafka and a committer on Apache Cassandra.