Presented By O'Reilly and Cloudera
Make Data Work
22–23 May 2017: Training
23–25 May 2017: Tutorials & Conference
London, UK

Elastic streams: Dynamic data redistribution in Apache Kafka

Ben Stopford (Confluent), Ismael Juma (Confluent)
16:35–17:15 Wednesday, 24 May 2017
Stream processing and analytics
Location: Capital Suite 8/9
Level: Intermediate

Who is this presentation for?

  • Data engineers working on streaming data platforms and Apache Kafka

Prerequisite knowledge

  • A basic understanding of Apache Kafka and streaming

What you'll learn

  • Understand how rebalancing can improve the elasticity of heavily stateful distributed systems and why data rebalancing and bandwidth quotas are difficult problems for distributed data systems
  • Explore the options available for tackling these problems

Description

When you’re storing petabytes of data in a large distributed system, moving data from machine to machine can be an arduous and expensive operation. The problem has two parts: working out when and where data should move, and limiting the bandwidth used by data transfers. In a multitenant system, where each machine has a different load profile, this can be tricky. If the bandwidth limit is too restrictive, data movement can be starved of progress; if it is too permissive, the transfers compete with users’ own traffic and users will encounter problems.
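The trade-off above can be made concrete with a little arithmetic. The following is a minimal sketch under a deliberately simplified model, not the algorithm used by Kafka: it assumes a moving replica must copy the existing data while also keeping up with new writes, all within a single bandwidth limit. The function name and parameters are hypothetical and chosen for illustration only.

```python
from typing import Optional


def estimate_rebalance_seconds(
    bytes_to_move: float,
    throttle_bytes_per_sec: float,
    inbound_write_bytes_per_sec: float,
) -> Optional[float]:
    """Estimate how long a throttled data move takes (simplified model).

    Assumption: the throttle covers both catching up on existing data and
    replicating new writes, so only the difference between the two rates
    is available for catching up.
    Returns None when the throttle leaves no headroom, meaning the move
    would never complete (progress is starved).
    """
    headroom = throttle_bytes_per_sec - inbound_write_bytes_per_sec
    if headroom <= 0:
        return None  # too restrictive: the moving replica never catches up
    return bytes_to_move / headroom


if __name__ == "__main__":
    one_tb = 1_000_000_000_000  # bytes to relocate (illustrative figure)
    writes = 10_000_000         # 10 MB/s of new writes (illustrative figure)
    for throttle in (5_000_000, 20_000_000, 50_000_000):
        seconds = estimate_rebalance_seconds(one_tb, throttle, writes)
        if seconds is None:
            print(f"throttle={throttle} B/s: never completes")
        else:
            print(f"throttle={throttle} B/s: about {seconds / 3600:.1f} hours")
```

A throttle at or below the incoming write rate never finishes, while a higher throttle finishes sooner at the cost of leaving less bandwidth for tenants; picking a value between those extremes is the tuning problem the description refers to.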

Dynamic data rebalancing is a complex process. Ben Stopford and Ismael Juma explain how to rebalance data and use replication quotas in the latest version of Apache Kafka, discussing the algorithms added in that release for handling dynamic data distribution and throttling data transfer between machines. The result: a multitenant streaming platform that can scale elastically in response to your very own usage profile.

Ben Stopford

Confluent

Ben Stopford is an engineer and architect on the Apache Kafka core team at Confluent (the company behind Apache Kafka). A specialist in data, from both a technology and an organizational perspective, Ben previously spent five years leading data integration at a large investment bank, using a central streaming database. His earlier career spanned a variety of projects at Thoughtworks and UK-based enterprise companies. He writes at Benstopford.com.

Ismael Juma

Confluent

Ismael Juma is a Kafka committer and engineer at Confluent, where he is building a stream data platform based on Apache Kafka. Earlier, he worked on automated data balancing. Previously, Ismael was the lead architect at Time Out, where he was responsible for the data platform at the core of Time Out’s international expansion and print-to-digital transition. Ismael has contributed to several open source projects, including Voldemort and Scala.