Presented By O'Reilly and Cloudera
Make Data Work
September 26–27, 2016: Training
September 27–29, 2016: Tutorials & Conference
New York, NY

When one data center is not enough: Building large-scale stream infrastructures across multiple data centers with Apache Kafka

Ewen Cheslack-Postava (Confluent)
1:15pm–1:55pm Thursday, 09/29/2016
IoT & real-time
Location: 1 E 12/1 E 13
Level: Intermediate
Tags: real-time
Average rating: 3.33 (3 ratings)

Prerequisite knowledge

  • A basic understanding of Kafka and related terminology

What you'll learn

  • Understand the challenges involved in managing multiple Kafka clusters
  • Learn proven architecture patterns to address common requirements
  • Discover what's currently possible, what's challenging, and what will have to wait until Kafka evolves a bit more

Description

    To manage the ever-increasing volume and velocity of data within your company, you may have successfully made the transition from single machines and one-off solutions to large, distributed stream infrastructures in your data center powered by Apache Kafka. But what’s to be done if one data center is not enough?

    Ewen Cheslack-Postava explores resilient multi-data-center architecture with Apache Kafka, sharing best practices for data replication and mirroring as well as disaster scenarios and failure handling. Ewen covers four scenarios: replication and failover for disaster recovery, data produced in one location but consumed in another, an aggregate cluster for data analysis, and bidirectional replication. For each, he discusses the requirements, provides a proven architecture, and explains the benefits and limitations of the solution.

    Ewen Cheslack-Postava


    Ewen Cheslack-Postava is an engineer at Confluent building a stream data platform based on Apache Kafka to help organizations reliably and robustly capture and leverage all their real-time data. Ewen received his PhD from Stanford University, where he developed Sirikata, an open source system for massive virtual environments. His dissertation defined a novel type of spatial query giving significantly improved visual fidelity and described a system for efficiently processing these queries at scale.