Presented By O'Reilly and Cloudera
Make Data Work
March 28–29, 2016: Training
March 29–31, 2016: Conference
San Jose, CA

When one data center is not enough: Building large-scale stream infrastructure across multiple data centers with Apache Kafka

Guozhang Wang (Confluent)
11:50am–12:30pm Thursday, 03/31/2016
Data Innovations

Location: 210 C/G
Tags: real-time
Average rating: 3.80 (5 ratings)

Prerequisite knowledge

Attendees should have a high-level understanding of data systems.


To manage the ever-increasing volume and velocity of data within your company, you have successfully made the transition from single machines and one-off solutions to a large distributed stream infrastructure in your data center, powered by Apache Kafka. But what if one data center is not enough? Guozhang Wang describes building resilient data pipelines with Apache Kafka that span multiple data centers and points of presence. Guozhang provides an overview of best practices and common patterns, covering key areas such as architecture guidelines, data replication, and mirroring, as well as disaster scenarios and failure handling.
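As a concrete taste of the mirroring patterns the talk covers, cross-datacenter replication in the Kafka of this era is typically done with the MirrorMaker tool that ships with Kafka, which consumes from a source cluster and re-produces into a target (often aggregate) cluster. The sketch below is a minimal, hypothetical setup; the broker addresses, group id, and topic whitelist are illustrative, not from the talk:

```shell
# consumer.properties -- points at the source data center's cluster
#   bootstrap.servers=kafka-dc1.example.com:9092   (hypothetical address)
#   group.id=mirror-maker-dc1

# producer.properties -- points at the aggregate cluster in another DC
#   bootstrap.servers=kafka-agg.example.com:9092   (hypothetical address)

# Run MirrorMaker (in Kafka's bin/ directory) to continuously copy
# every topic matching the whitelist from DC1 into the aggregate cluster:
bin/kafka-mirror-maker.sh \
  --consumer.config consumer.properties \
  --producer.config producer.properties \
  --whitelist ".*"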

Guozhang Wang


Guozhang is an engineer at Confluent, building a stream data platform on top of Apache Kafka. Prior to Confluent, Guozhang was a senior software engineer at LinkedIn, developing and maintaining its backbone streaming infrastructure on Apache Kafka and Apache Samza. He holds a PhD from Cornell University's database group, where he worked on scaling iterative data-driven applications.