Building a scalable cloud native stream processing system often requires taking on two systems: a complex distributed log system like Apache Kafka, AWS Kinesis, or Apache Pulsar and a complex event processing system like Apache Spark or Apache Flink. For small teams hoping to quickly build and operate a streaming pipeline, these systems may be too expensive and complicated to successfully deploy and maintain.
After two years of running streaming pipelines through Kinesis and Spark at One Click Retail, Jowanza Joseph and Karthik Ramasamy decided to explore a new platform that would take advantage of Kubernetes and support a simpler data processing DSL. Join in to discover why they chose Apache Pulsar (hint: its native support for Kubernetes and Pulsar Functions—a serverless functions model on top of Pulsar) and learn tips and tricks for using Pulsar Functions.
Apache Pulsar’s pure Java API allowed them to increase productivity while still handling most of the workloads they were previously handling in Spark. In addition, Pulsar Functions allowed them to tune the message delivery semantics per application workload, take advantage of the Java ecosystem, and plug into the wider Kubernetes ecosystem.
Jowanza Joseph is principal software engineer at One Click Retail. Jowanza’s work is focused on distributed stream processing and distributed data storage.
Karthik Ramasamy is the cofounder of Streamlio, a company building next-generation real-time processing engines. Karthik has more than two decades of experience working in parallel databases, big data infrastructure, and networking. Previously, he was engineering manager and technical lead for real-time analytics at Twitter, where he was the cocreator of Heron; cofounded Locomatix, a company that specialized in real-time stream processing on Hadoop and Cassandra using SQL (acquired by Twitter); worked briefly on parallel query scheduling at Greenplum (acquired by EMC for more than $300M); and designed and delivered platforms, protocols, databases, and high-availability solutions for network routers at Juniper. He’s the author of several patents, publications, and one best-selling book, Network Routing: Algorithms, Protocols, and Architectures. Karthik holds a PhD in computer science from the University of Wisconsin–Madison with a focus on databases, where he worked extensively in parallel database systems, query processing, scale-out technologies, storage engines, and online analytical systems. Several of these research projects were spun out as a company later acquired by Teradata.
©2019, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • email@example.com