Presented By O’Reilly and Cloudera
Make Data Work
September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

High-performance messaging with Apache Pulsar

Karthik Ramasamy (Streamlio), Matteo Merli (Streamlio)
2:00pm–2:40pm Thursday, 09/13/2018
Emerging technologies & case studies
Location: 1E 07/08
Level: Beginner
Average rating: 4.50 (2 ratings)

Who is this presentation for?

  • Engineers and engineering managers

Prerequisite knowledge

  • A basic understanding of streaming systems and storage systems (useful but not required)

What you'll learn

  • Understand messaging systems, Apache Pulsar, and the OpenMessaging Benchmark
  • Learn what to look for in a messaging solution for your use case

Description

Apache Pulsar is a distributed pub/sub messaging system that originated at Yahoo, where it has powered many critical systems and user-facing products for several years. Because Pulsar is designed to be a critical component for building reliable systems, it needs to offer strong guarantees for data durability, throughput, and latency. Pulsar relies on a stream-storage layer built on Apache BookKeeper, a combination that can store and deliver data durably with very low latency. To achieve this, Pulsar and BookKeeper introduced several optimizations, such as object pooling and reduced thread contention, so that you can take full advantage of hardware resources under diverse workloads.

Karthik Ramasamy and Matteo Merli dive into the details of these optimizations and explain how they enable Pulsar to perform well in the OpenMessaging Benchmark, a Linux Foundation project that provides an extensible framework for benchmarking different messaging systems under a set of realistic workload conditions.

Karthik Ramasamy

Streamlio

Karthik Ramasamy is the cofounder of Streamlio, a company building next-generation real-time processing engines. Karthik has more than two decades of experience working in parallel databases, big data infrastructure, and networking. Previously, he was engineering manager and technical lead for real-time analytics at Twitter, where he was the cocreator of Heron; cofounded Locomatix, a company that specialized in real-time stream processing on Hadoop and Cassandra using SQL (acquired by Twitter); worked briefly on parallel query scheduling at Greenplum (acquired by EMC for more than $300M); and designed and delivered platforms, protocols, databases, and high-availability solutions for network routers at Juniper. He’s the author of several patents, publications, and one best-selling book, Network Routing: Algorithms, Protocols, and Architectures. Karthik holds a PhD in computer science from the University of Wisconsin–Madison with a focus on databases, where he worked extensively in parallel database systems, query processing, scale-out technologies, storage engines, and online analytical systems. Several of these research projects were spun out as a company later acquired by Teradata.

Matteo Merli

Streamlio

Matteo Merli is a software engineer at Streamlio working on messaging and storage technologies. Previously, he spent several years at Yahoo building database replication systems and multitenant messaging platforms. Matteo was the architect and lead developer for Yahoo Pulsar and a member of the PMC of Apache BookKeeper.