In large distributed systems, knowing the state of the whole system is difficult, and it gets harder as the number of nodes increases. There are too many nodes to communicate with directly, and the cost of many algorithms that solve the problem grows linearly with the number of nodes. The underlying network is a constraint too: you can't rely on hardware solutions such as multicast, because they are generally not available in the cloud. In addition, maintaining an up-to-date graph of the nodes, or even storing the graph itself, is a complex undertaking in large systems.
To avoid these problems, many distributed systems now rely on gossip protocols (a way of multicasting messages inspired by epidemics, human gossip, and social networks) to share the state of the system among the nodes. Félix López Luis offers an introduction to gossip protocols, using a simulator to demonstrate how they behave in the face of challenges such as network partitions and faulty nodes.
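To give a feel for the idea, here is a minimal sketch of a push-style gossip round loop. It is not the simulator used in the talk. The function name `simulate_push_gossip`, the `fanout` parameter, and the rule that failed nodes silently drop messages are all assumptions made for illustration. Network partitions are not modelled.

```python
import random


def simulate_push_gossip(num_nodes, fanout=3, failed=frozenset(), seed=0, max_rounds=1000):
    """Simulate push gossip and return the informed-node count after each round.

    Hypothetical model: node 0 starts with the message and is assumed to be
    alive. In every round, each informed node sends the message to `fanout`
    distinct nodes chosen uniformly at random. Nodes in `failed` drop
    whatever they receive and never forward anything.
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    informed = {0}
    history = [len(informed)]
    alive_total = num_nodes - len(failed)

    for _ in range(max_rounds):
        if len(informed) == alive_total:
            break  # every live node has the message
        newly_informed = set()
        for _node in sorted(informed):
            # Each informed node picks its gossip targets for this round.
            for peer in rng.sample(range(num_nodes), fanout):
                if peer not in failed:
                    newly_informed.add(peer)
        informed |= newly_informed
        history.append(len(informed))
    return history


if __name__ == "__main__":
    healthy = simulate_push_gossip(1000)
    faulty = simulate_push_gossip(1000, failed=frozenset(range(1, 1000, 10)))
    print("rounds with no failures:", len(healthy) - 1)
    print("rounds with 10% failed nodes:", len(faulty) - 1)
```

Each node sends only `fanout` messages per round regardless of cluster size, and no node needs a full map of the system. That is the property that lets gossip sidestep the linear per-node cost described above.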
Félix López Luis is an engineering manager at Google interested in distributed systems and machine learning. Over his career, he has worked on web development, video games, distributed systems, and applications for the currency exchange market. He holds a master’s degree in intelligent systems, including neural networks, speech processing, and data mining.
©2018, O’Reilly UK Ltd