Apache Pulsar is a distributed pub/sub messaging system that originated at Yahoo, where it has powered many critical systems and user-facing products for several years. Because Pulsar is designed to serve as a critical component for building reliable systems, it must offer strong guarantees for data durability, throughput, and latency. Pulsar relies on a stream-storage layer built on top of Apache BookKeeper, a combination that can store and deliver data durably with very low latency. To achieve this, Pulsar and BookKeeper introduced several optimizations, such as object pooling and reduced thread contention, so that you can take full advantage of hardware resources under diverse workloads.
Karthik Ramasamy and Matteo Merli dive into the details of these optimizations and explain how they enable Pulsar to perform well in the OpenMessaging Benchmark, a Linux Foundation project that provides an extensible framework for benchmarking different messaging systems under a set of realistic workload conditions.
Karthik Ramasamy is the cofounder of Streamlio, a company building next-generation real-time processing engines. Karthik has more than two decades of experience working in parallel databases, big data infrastructure, and networking. Previously, he was engineering manager and technical lead for real-time analytics at Twitter, where he was the cocreator of Heron; cofounded Locomatix, a company that specialized in real-time stream processing on Hadoop and Cassandra using SQL (acquired by Twitter); worked briefly on parallel query scheduling at Greenplum (acquired by EMC for more than $300M); and designed and delivered platforms, protocols, databases, and high-availability solutions for network routers at Juniper. He’s the author of several patents, publications, and one best-selling book, Network Routing: Algorithms, Protocols, and Architectures. Karthik holds a PhD in computer science from the University of Wisconsin–Madison with a focus on databases, where he worked extensively in parallel database systems, query processing, scale-out technologies, storage engines, and online analytical systems. Several of these research projects were spun out as a company later acquired by Teradata.
Matteo Merli is a software engineer at Streamlio working on messaging and storage technologies. Previously, he spent several years at Yahoo building database replication systems and multitenant messaging platforms. Matteo was the architect and lead developer for Yahoo Pulsar and a member of the PMC of Apache BookKeeper.
©2018, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com