Presented By O'Reilly and Cloudera
Make Data Work
March 13–14, 2017: Training
March 14–16, 2017: Tutorials & Conference
San Jose, CA

One cluster does not fit all: Architecture patterns for multicluster Apache Kafka deployments

Gwen Shapira (Confluent)
2:40pm–3:20pm Thursday, March 16, 2017
Secondary topics: Architecture, Streaming

Who is this presentation for?

  • Data architects, DevOps engineers, or anyone deploying Kafka and wondering how many clusters they really need

Prerequisite knowledge

  • Basic knowledge of Apache Kafka

What you'll learn

  • Explore Apache Kafka features for multitenant clusters
  • Learn how to run a single Kafka cluster in multiple data centers (and when this is a good idea)
  • Understand how to synchronize multiple clusters effectively for active-active, failover, and analytics use cases


In the last year, multicluster and cross-data center deployments of Apache Kafka have become the norm rather than the exception. The reasons are many and include:

  • Different groups in the same company using Kafka in different ways
  • Collecting information from many geographical regions and branches to a centralized analytics cluster
  • Planning for cases where an entire cluster or data center is not available
  • Using Kafka to assist in cloud migration

Gwen Shapira offers an overview of several use cases, including real-time analytics and payment processing, that may require multicluster solutions and discusses real-world examples with their specific requirements. Gwen outlines the pros and cons of several common architecture patterns, including:

  • Multitenant Kafka clusters
  • Active-active multiclusters
  • Failover clusters
  • Stretching a single cluster between multiple data centers
  • Using Kafka to bridge between clouds or between on-premises and the cloud

Along the way, Gwen explores the features of Apache Kafka and demonstrates how to use this understanding of Kafka to choose the right architecture for use cases from the financial, retail, and media industries.

Gwen Shapira


Gwen Shapira is a system architect at Confluent, where she helps customers achieve success with their Apache Kafka implementations. She has 15 years of experience working with code and customers to build scalable data architectures, integrating relational and big data technologies. Gwen currently specializes in building reliable real-time data processing pipelines using Apache Kafka. Gwen is an Oracle ACE Director, the coauthor of Hadoop Application Architectures, and a frequent presenter at industry conferences. She is also a committer on Apache Kafka and Apache Sqoop. When Gwen isn’t coding or building data pipelines, you can find her pedaling her bike, exploring the roads and trails of California and beyond.


Gwen Shapira
04/04/2017 5:44am PDT

Hi Jian,

This is discussed in some detail in the presentation. If you need more depth, chapter 8 of “Kafka: The Definitive Guide” includes more information.

03/16/2017 9:25am PDT

Hi Gwen,

My company uses Kafka to move data. We have multiple Kafka clusters, and sometimes we need to fail over from one cluster to another. We know that this is difficult because offsets aren’t preserved. I was told Confluent has published how to do this properly. Can you please share the link?

Thanks in advance