Every day, the amount of data we store increases dramatically, and users demand high availability and low latency from the machines that store it. Single machines are prone to failure and in many cases can’t cope with high traffic demands. Distributed systems have become an increasingly common solution to these problems. They offer fault tolerance and resilience while allowing users to interact with what appears to be a single machine.
Laura Hampton discusses the difficulties of replicating data across multiple machines, shares proposed solutions to the consensus problem and explains why they work, and shows how the Raft algorithm, used in Kubernetes and Docker Swarm, provides reasonable guarantees. Laura begins by introducing the problem of consensus and exploring naive solutions and the ways they can fail. She then leads a deep dive into the Raft algorithm, covering leader election, log replication, log compression, writing to stable storage, and procedures for changing cluster membership. Along the way, Laura details how Raft is used in practice, including where to locate cluster members, and describes some interesting adaptations of the algorithm used in MongoDB replica sets.
Laura Hampton is a New York-based Python developer. She is working on Warehouse, the next-generation Python package repository.
©2018, O'Reilly Media, Inc.