Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

Messaging, storage, or both: The real-time story of Pulsar and Apache DistributedLog

Matteo Merli (Streamlio), Sijie Guo (StreamNative)
1:15pm–1:55pm Thursday, September 28, 2017
Data Engineering & Architecture, Real-time applications
Location: 1E 09 Level: Beginner
Secondary topics: Architecture, Streaming
Average rating: 5.00 (2 ratings)

Who is this presentation for?

  • Software engineers, engineering managers, CIOs, and technology leaders

Prerequisite knowledge

  • Familiarity with distributed systems and pub/sub messaging concepts

What you'll learn

  • Explore Apache DistributedLog and Pulsar, real-time storage systems built using Apache BookKeeper


Modern enterprises produce data at increasingly high volume and velocity. New types of storage systems have been designed, implemented, and deployed to process this data in real time.

Apache DistributedLog is a replicated log store originally developed at Twitter. It has been used in production at Twitter for more than four years, supporting several critical services, including pub/sub messaging, log replication for distributed databases, and real-time stream computing, and delivering more than 1.5 trillion events (or about 17 PB) per day.

Pulsar is a distributed pub/sub messaging platform that provides a flexible messaging model. It was developed at Yahoo and has been used in Yahoo Cloud Messaging Service to deliver several billion messages per day.

Apache DistributedLog and Pulsar are both built on Apache BookKeeper and are similar in design and implementation, but they have different goals. Matteo Merli and Sijie Guo offer an overview of both systems and share advice on how to better use them.

Matteo Merli


Matteo Merli is a software engineer at Streamlio, where he works on messaging and storage technologies. Previously, he spent several years building database replication systems and multitenant messaging platforms at Yahoo. Matteo was the architect and lead developer for Pulsar and is a PMC member of Apache BookKeeper.

Sijie Guo


Sijie Guo is the founder and CEO of StreamNative, a data infrastructure startup offering a cloud native event streaming platform based on Apache Pulsar for enterprises. He’s also the vice president of Apache BookKeeper and a PMC member of Apache Pulsar. Previously, he was the tech lead for the Messaging Group at Twitter and worked on push notification infrastructure at Yahoo.