Presented By O’Reilly and Cloudera
Make Data Work
March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

Effectively once in Apache Pulsar, the next-generation messaging system

Matteo Merli (Streamlio)
4:20pm–5:00pm Thursday, March 8, 2018

Who is this presentation for?

  • Software engineers and engineering managers

Prerequisite knowledge

  • Familiarity with messaging systems, stream processing, and real-time processing

What you'll learn

  • Explore the new techniques in Apache Pulsar that make it possible to achieve exactly once processing semantics

Description

Apache Pulsar is a distributed pub/sub messaging system that originated at Yahoo, where it has been powering critical systems and user-facing products for several years. The traditional API for Pulsar is derived from basic pub/sub concepts, such as subscriptions and consumers that receive messages and acknowledge their processing. This model is simple yet powerful: it lets you build applications without needing to understand the underlying intricacies of the messaging system. The only drawback is that previous pub/sub systems offer only at-least-once semantics, leaving the task of eliminating duplicated messages to the application.

With the emergence of stream processing and its more demanding requirements, messaging systems need to offer the right primitives for implementing effectively once semantics end to end, in both the messaging layer and the processing layer. In this context, “effectively once” means that messages can actually be replayed multiple times in the presence of failures, though the effects of their processing will be equivalent to exactly once.
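To make the idea concrete, here is a minimal sketch of one common way to get this behavior: the receiving side remembers the highest sequence ID it has accepted from each producer and ignores anything at or below it. The class DedupLog, its publish method, and the producer name are hypothetical and exist only for illustration; this is not Pulsar's API or its actual implementation.

```python
class DedupLog:
    """Toy append-only log that ignores replayed messages (illustrative only)."""

    def __init__(self):
        self.entries = []    # payloads accepted so far, in order
        self.last_seq = {}   # producer name -> highest sequence ID accepted

    def publish(self, producer, seq_id, payload):
        """Append payload unless this producer already published seq_id."""
        if seq_id <= self.last_seq.get(producer, -1):
            return False     # replay of an earlier message: no new effect
        self.entries.append(payload)
        self.last_seq[producer] = seq_id
        return True


log = DedupLog()
log.publish("producer-a", 0, "debit $10")
log.publish("producer-a", 1, "credit $5")
# The producer never saw the acknowledgment for message 1, so it sends it again.
retry_accepted = log.publish("producer-a", 1, "credit $5")
```

The retried message is delivered twice but recorded once, so the log ends up in the same state as if every message had arrived exactly once. The sketch assumes each producer assigns increasing sequence IDs and resends a message with its original ID.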

Matteo Merli explores the new APIs introduced in Pulsar to offer effectively once semantics, discusses the implementation details and performance testing results, and shares use cases that can greatly benefit from this new feature.

Matteo Merli

Streamlio

Matteo Merli is a software engineer at Streamlio working on messaging and storage technologies. Previously, he spent several years at Yahoo building database replication systems and multitenant messaging platforms. Matteo was the architect and lead developer for Yahoo Pulsar and a member of the PMC of Apache BookKeeper.