Presented By O’Reilly and Cloudera
Make Data Work
March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

Effectively once, exactly once, and more in Heron

Karthik Ramasamy (Streamlio), Sanjeev Kulkarni (Streamlio)
11:50am–12:30pm Thursday, March 8, 2018
Average rating: 4.00 (1 rating)

Who is this presentation for?

  • Software engineers and engineering managers

Prerequisite knowledge

  • A basic understanding of stream processing and data processing semantics (useful but not required)

What you'll learn

  • Explore Heron and learn how it guarantees that data is processed exactly once

Description

Stream processing systems must support a number of different types of processing semantics due to the diverse nature of streaming applications. To process real-time data at scale, Twitter has developed and deployed its next-generation streaming engine, Heron, which provides unparalleled performance at large scale and has been successfully meeting price and performance goals for diverse streaming applications. It employs both at-least-once and at-most-once processing of data. Heron is now an open source project and has contributors from various institutions.

Drawing on their experience with Heron, Karthik Ramasamy and Sanjeev Kulkarni explore effectively once, exactly once, and other types of stateful processing techniques, explain how they are implemented in Heron, and demonstrate how your applications will benefit from using them. They also share their experience running exactly once at scale: which types of applications benefited the most, where it’s overkill, and the cost of running exactly-once streaming applications.
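
The processing semantics named in this description can be illustrated with a toy simulation. The sketch below is not Heron code and makes no claim about how Heron implements any of these guarantees; every function name and the message-ID deduplication approach are assumptions chosen purely for illustration. It shows how a running sum behaves when a message is lost (at-most-once), when a message is redelivered (at-least-once), and when redelivered messages are deduplicated so that each one affects state a single time.

```python
from typing import List, Set, Tuple

# Hypothetical message shape for this sketch: (message_id, value).
Message = Tuple[int, int]


def deliver_at_most_once(messages: List[Message], dropped_ids: Set[int]) -> List[Message]:
    """Each message is sent once; a lost message is never retried."""
    return [m for m in messages if m[0] not in dropped_ids]


def deliver_at_least_once(messages: List[Message], redelivered_ids: Set[int]) -> List[Message]:
    """Every message arrives; messages whose ack was lost arrive twice."""
    delivered: List[Message] = []
    for m in messages:
        delivered.append(m)
        if m[0] in redelivered_ids:
            delivered.append(m)  # replay after a missing acknowledgement
    return delivered


def naive_sum(delivered: List[Message]) -> int:
    """Non-idempotent state update: duplicates are counted again."""
    return sum(value for _, value in delivered)


def deduplicated_sum(delivered: List[Message]) -> int:
    """Apply each message ID to the state once, ignoring replays."""
    seen: Set[int] = set()
    total = 0
    for message_id, value in delivered:
        if message_id in seen:
            continue
        seen.add(message_id)
        total += value
    return total


messages = [(1, 10), (2, 20), (3, 30)]

# Message 2 is lost, so the result undercounts.
at_most = naive_sum(deliver_at_most_once(messages, dropped_ids={2}))

# Message 2 is replayed, so the naive result overcounts.
at_least = naive_sum(deliver_at_least_once(messages, redelivered_ids={2}))

# Message 2 is replayed but applied to state only once.
deduped = deduplicated_sum(deliver_at_least_once(messages, redelivered_ids={2}))

print(at_most, at_least, deduped)
```

In this toy setup only the deduplicated variant matches the true sum of the three values. The bookkeeping needed to remember which IDs have been seen hints at why such guarantees carry a cost at scale.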


Karthik Ramasamy

Streamlio

Karthik Ramasamy is the cofounder of Streamlio, a company building next-generation real-time processing engines. Karthik has more than two decades of experience working in parallel databases, big data infrastructure, and networking. Previously, he was engineering manager and technical lead for real-time analytics at Twitter, where he was the cocreator of Heron; cofounded Locomatix, a company that specialized in real-time stream processing on Hadoop and Cassandra using SQL (acquired by Twitter); worked briefly on parallel query scheduling at Greenplum (acquired by EMC for more than $300M); and designed and delivered platforms, protocols, databases, and high-availability solutions for network routers at Juniper. He’s the author of several patents, publications, and one best-selling book, Network Routing: Algorithms, Protocols, and Architectures. Karthik holds a PhD in computer science from the University of Wisconsin–Madison with a focus on databases, where he worked extensively in parallel database systems, query processing, scale-out technologies, storage engines, and online analytical systems. Several of these research projects were spun out as a company later acquired by Teradata.


Sanjeev Kulkarni

Streamlio

Sanjeev Kulkarni is the cofounder of Streamlio, a company focused on building a next-generation real-time stack. Previously, he was the technical lead for real-time analytics at Twitter, where he cocreated Twitter Heron; worked at Locomatix handling the company’s engineering stack; and led several initiatives for the AdSense team at Google. Sanjeev holds an MS in computer science from the University of Wisconsin–Madison.