Presented By
O’Reilly + Cloudera
Make Data Work
March 25-28, 2019
San Francisco, CA

Reducing stream processing complexity using Apache Pulsar Functions

Jowanza Joseph (Pluralsight), Karthik Ramasamy (Streamlio)
4:20pm-5:00pm Wednesday, March 27, 2019
Secondary topics: AI and Data technologies in the cloud, Data Integration and Data Pipelines, Retail and e-commerce
Average rating: 4.00 out of 5 (1 rating)

Who is this presentation for?

  • Data engineers, data architects, software engineers, and software architects

Level

Intermediate

Prerequisite knowledge

  • Familiarity with Apache Pulsar, Apache Kafka, or AWS Kinesis (useful but not required)

What you'll learn

  • Understand the advantages of using Apache Pulsar for streaming workloads, pick up tips and tricks for using Pulsar Functions, and hear about future development plans for Pulsar and Pulsar Functions

Description

Building a scalable cloud native stream processing system often requires taking on two systems: a complex distributed log system like Apache Kafka, AWS Kinesis, or Apache Pulsar and a complex event processing system like Apache Spark or Apache Flink. For small teams hoping to quickly build and operate a streaming pipeline, these systems may be too expensive and complicated to successfully deploy and maintain.

After two years of running streaming pipelines through Kinesis and Spark at One Click Retail, Jowanza Joseph and Karthik Ramasamy decided to explore a new platform that would take advantage of Kubernetes and support a simpler data processing DSL. Join in to discover why they chose Apache Pulsar (hint: its native support for Kubernetes and Pulsar Functions—a serverless functions model on top of Pulsar) and learn tips and tricks for using Pulsar Functions.

Apache Pulsar’s pure Java API allowed them to increase productivity while still handling most of the workloads they were previously handling in Spark. In addition, Pulsar Functions allowed them to tune the message delivery semantics per application workload, take advantage of the Java ecosystem, and plug into the wider Kubernetes ecosystem.
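To give a sense of how small a Pulsar Function can be, here is a minimal sketch, assuming the "language-native" style in which a function is simply a class implementing `java.util.function.Function`. The class name `ExclamationFunction` and the transformation it applies are hypothetical and are not taken from the speakers' pipeline. Delivery semantics do not appear in the code; as far as we understand, they are selected when the function is deployed.

```java
import java.util.function.Function;

// Hypothetical example class; the name and the transformation are illustrative only.
class ExclamationFunction implements Function<String, String> {

    // When deployed as a Pulsar Function, apply() would be invoked once per
    // incoming message, and its return value published to the output topic.
    @Override
    public String apply(String input) {
        return input + "!";
    }

    // Local smoke test only; no Pulsar cluster is involved here.
    public static void main(String[] args) {
        Function<String, String> fn = new ExclamationFunction();
        System.out.println(fn.apply("hello")); // prints "hello!"
    }
}
```

Because the class depends only on the Java standard library, it can be unit tested like any other Java code before it is packaged and submitted to a cluster.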

Jowanza Joseph

Pluralsight

Jowanza Joseph is principal software engineer at One Click Retail. Jowanza’s work is focused on distributed stream processing and distributed data storage.

Karthik Ramasamy

Streamlio

Karthik Ramasamy is the cofounder of Streamlio, a company building next-generation real-time processing engines. Karthik has more than two decades of experience working in parallel databases, big data infrastructure, and networking. Previously, he was engineering manager and technical lead for real-time analytics at Twitter, where he was the cocreator of Heron; cofounded Locomatix, a company that specialized in real-time stream processing on Hadoop and Cassandra using SQL (acquired by Twitter); briefly worked on parallel query scheduling at Greenplum (acquired by EMC for more than $300M); and designed and delivered platforms, protocols, databases, and high-availability solutions for network routers at Juniper Networks. He is the author of several patents, publications, and one best-selling book, Network Routing: Algorithms, Protocols, and Architectures. Karthik holds a PhD in computer science from the University of Wisconsin–Madison with a focus on databases, where he worked extensively in parallel database systems, query processing, scale-out technologies, storage engines, and online analytical systems. Several of these research projects were spun out as a company later acquired by Teradata.