When you’re storing petabytes of data in a large distributed system, moving data from machine to machine can be an arduous and expensive operation. The problem has two parts: working out when and where data should move, and limiting the bandwidth that those transfers consume. In a multitenant system, where each machine has a different load profile, this is tricky to get right. Throttle too aggressively and the rebalance is starved of bandwidth and never completes; throttle too loosely and replication traffic crowds out users’ own reads and writes.
Dynamic data rebalancing is a complex process. Ben Stopford and Ismael Juma explain how to rebalance data and apply replication quotas in the latest version of Apache Kafka, walking through the algorithms added to that release for redistributing data dynamically and throttling the replication traffic between machines. The result: a multitenant streaming platform that can scale elastically in response to your own usage profile.
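As a concrete illustration of the workflow described above, here is a minimal sketch of a throttled partition reassignment using Kafka’s stock CLI tools (the `--throttle` option shipped with KIP-73 in Kafka 0.10.1). The topic name, broker IDs, file names, and the 10 MB/s limit are illustrative assumptions, not values from the talk.

```shell
# 1. List the topics to move in a JSON file, e.g. topics.json:
#    {"version": 1, "topics": [{"topic": "my-topic"}]}

# 2. Generate a candidate reassignment plan targeting brokers 0, 1, and 2.
bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
  --topics-to-move-json-file topics.json \
  --broker-list "0,1,2" --generate

# 3. Execute the plan (saved as reassign.json), capping replication
#    traffic at roughly 10 MB/s per broker.
bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
  --reassignment-json-file reassign.json \
  --execute --throttle 10485760

# 4. Check progress; once the reassignment completes, --verify also
#    removes the throttle so normal replication is not constrained.
bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
  --reassignment-json-file reassign.json --verify
```

Running `--verify` at the end matters: the throttle is a broker/topic config that persists until explicitly cleared, so forgetting this step leaves replication permanently rate-limited.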
Ben Stopford is an engineer and architect on the Apache Kafka core team at Confluent (the company behind Apache Kafka). A specialist in data from both a technology and an organizational perspective, Ben previously spent five years leading data integration at a large investment bank, using a central streaming database. His earlier career spanned a variety of projects at Thoughtworks and UK-based enterprise companies. He writes at Benstopford.com.
Ismael Juma is a Kafka committer and engineer at Confluent, where he is building a stream data platform based on Apache Kafka; earlier at Confluent, he worked on automated data balancing. Previously, Ismael was the lead architect at Time Out, where he was responsible for the data platform at the core of Time Out’s international expansion and print-to-digital transition. Ismael has contributed to several open source projects, including Voldemort and Scala.
©2017, O’Reilly UK Ltd