Modern enterprises produce data at increasingly high volume and velocity. To process data in real time, new types of storage systems have been designed, implemented, and deployed. Apache DistributedLog is a replicated log store originally developed at Twitter. It has been used in production at Twitter for more than four years, supporting several critical services such as pub/sub messaging, log replication for distributed databases, and real-time stream computing, and delivering more than 1.5 trillion events (about 17 PB) per day. Pulsar is a distributed pub/sub messaging platform that provides a flexible messaging model. Pulsar was developed at Yahoo and has been used in the Yahoo Cloud Messaging Service to deliver several billion messages per day.
Apache DistributedLog and Pulsar are both built on Apache BookKeeper and are similar in design and implementation, but they have different goals. Matteo Merli and Sijie Guo offer an overview of both systems and share advice on how to use them effectively.
Matteo Merli is a software engineer at Streamlio, where he works on messaging and storage technologies. Previously, he spent several years building database replication systems and multitenant messaging platforms at Yahoo. Matteo was the architect and lead developer for Pulsar and is a PMC member of Apache BookKeeper.
Sijie Guo is the cofounder of Streamlio, a company focused on building a next-generation real-time data stack. Previously, he was the tech lead for the Messaging Group at Twitter, where he cocreated Apache DistributedLog. He also worked on push notification infrastructure at Yahoo. He is the PMC chair of Apache BookKeeper.
©2017, O'Reilly Media, Inc.