Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

Twitter Heron Goes Exactly Once

Karthik Ramasamy (Streamlio)

Who is this presentation for?

Data engineers, data scientists, and technology leaders

Prerequisite knowledge

A basic understanding of stream processing and data processing semantics is helpful but not required.

What you'll learn

Attendees will come away with an overview of Heron and how it guarantees that data is processed exactly once.

Description

Twitter is all about real-time at scale. Twitter’s data centers continuously process billions of events per day, the instant the data is generated. To achieve real-time performance, Twitter has developed and deployed Heron, its next-generation streaming engine. Heron provides high performance at large scale and has been successfully meeting price/performance goals for diverse streaming applications. It supports both at-least-once and at-most-once processing of data. Heron is now an open-source project with contributors from various institutions.

In this talk, Karthik will describe how Twitter and Streamlio collaborated to add exactly-once processing to Heron. He will talk in detail about the algorithms and techniques employed to implement exactly-once processing. He will also share experiences from running exactly-once at scale: which types of applications benefited the most, where it is overkill, and what it costs to run streaming applications with exactly-once semantics.

Karthik Ramasamy

Streamlio

Karthik is the engineering manager and technical lead for Real Time Analytics at Twitter. He has two decades of experience working in parallel databases, big data infrastructure, and networking. He cofounded Locomatix, a company specializing in real-time stream processing on Hadoop and Cassandra using SQL, which was acquired by Twitter. Before Locomatix, he had a brief stint with Greenplum, where he worked on parallel query scheduling. Greenplum was eventually acquired by EMC for more than $300M. Prior to Greenplum, Karthik was at Juniper Networks, where he designed and delivered platforms, protocols, databases, and high-availability solutions for network routers that are widely deployed on the Internet. Before joining Juniper, he worked extensively at the University of Wisconsin on parallel database systems, query processing, scale-out technologies, storage engines, and online analytical systems. Several of these research projects were spun off into a company later acquired by Teradata.

He is the author of several publications and patents, as well as the best-selling book “Network Routing: Algorithms, Protocols and Architectures.” He has a Ph.D. in Computer Science from UW Madison with a focus on databases.