Apache Pulsar is a distributed pub/sub messaging system that originated at Yahoo, where it has been powering critical systems and user-facing products for several years. The traditional API for Pulsar is derived from basic pub/sub concepts, such as subscriptions and consumers that receive messages and acknowledge their processing. This model is simple yet powerful: it allows you to build applications without needing to understand the underlying intricacies of the messaging system. The only drawback is that earlier pub/sub systems have offered only at-least-once semantics, leaving the task of eliminating duplicate messages to the application.
With the emergence of stream processing and its more demanding requirements, messaging systems need to offer the right primitives for implementing effectively once semantics end to end, in both the messaging layer and the processing layer. In this context, “effectively once” means that messages may be replayed multiple times when failures occur, but the effects of processing them are equivalent to processing each message exactly once.
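To make the distinction concrete, here is a minimal Python sketch of one common way to get this behavior: the receiving side remembers the highest sequence ID it has stored for each producer and silently drops anything it has already seen. This is an illustration only, not Pulsar's API or implementation; the names `DedupLog`, `append`, and `send_with_retries` are hypothetical, and the sketch assumes each producer numbers its messages with increasing sequence IDs.

```python
from dataclasses import dataclass, field


@dataclass
class DedupLog:
    """Hypothetical append-only log that ignores replayed messages."""

    entries: list = field(default_factory=list)
    # Highest sequence ID stored so far, per producer name (assumed scheme).
    last_seq: dict = field(default_factory=dict)

    def append(self, producer: str, seq: int, payload: str) -> bool:
        """Store the payload once; return False if it was a duplicate."""
        if seq <= self.last_seq.get(producer, -1):
            return False  # replay of a message that was already stored
        self.entries.append(payload)
        self.last_seq[producer] = seq
        return True


def send_with_retries(log: DedupLog, producer: str, seq: int,
                      payload: str, attempts: int) -> list:
    """Simulate a producer resending the same message after lost acks."""
    return [log.append(producer, seq, payload) for _ in range(attempts)]


log = DedupLog()
results_a = send_with_retries(log, "producer-1", 0, "a", attempts=3)
results_b = send_with_retries(log, "producer-1", 1, "b", attempts=2)
```

Each message is delivered more than once, yet `log.entries` ends up holding every payload a single time: the replays happen, but their effect is the same as one delivery.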
Matteo Merli explores the new APIs introduced in Pulsar to offer effectively once semantics, discusses the implementation details and performance testing results, and shares use cases that can greatly benefit from this new feature.
Matteo Merli is a software engineer at Streamlio working on messaging and storage technologies. Previously, he spent several years at Yahoo building database replication systems and multitenant messaging platforms. Matteo was the architect and lead developer for Yahoo Pulsar and a member of the PMC of Apache BookKeeper.