Presented By O'Reilly and Cloudera
Make Data Work
September 26–27, 2016: Training
September 27–29, 2016: Tutorials & Conference
New York, NY

Twitter's real-time stack: Processing billions of events with Heron and DistributedLog

Karthik Ramasamy (Twitter)
1:15pm–1:55pm Thursday, 09/29/2016
IoT & real-time
Location: 3D 12
Average rating: 3.20 out of 5 (5 ratings)

What you'll learn

  • Explore the end-to-end real-time stack Twitter designed in order to analyze events in real time

Description

    Twitter generates billions of events per day. Analyzing these events in real time presents a massive challenge. Karthik Ramasamy offers an overview of the end-to-end real-time stack Twitter designed to meet this challenge, consisting of DistributedLog (a distributed, replicated messaging system) and Heron (a streaming system for real-time computation).

    DistributedLog is a replicated log service built on top of Apache BookKeeper. It provides infinite, ordered, append-only streams that can be used for building robust real-time systems, and it is the foundation of Twitter’s publish-subscribe system. Heron is Twitter’s next-generation streaming system, built from the ground up to address its scalability and reliability needs. Both systems have been in production for nearly two years and are widely used at Twitter in a diverse range of applications, such as the search ingestion pipeline, ad analytics, and image classification.

    Karthik describes Heron and DistributedLog in detail, covering use cases and sharing the experiences and challenges of operating real-time systems at scale.

    Karthik Ramasamy

    Twitter

    Karthik Ramasamy is the engineering manager and technical lead for real-time analytics at Twitter. Karthik is the cocreator of Heron and has more than two decades of experience working in parallel databases, big data infrastructure, and networking. He cofounded Locomatix, a company that specialized in real-time stream processing on Hadoop and Cassandra using SQL, which was acquired by Twitter. Before Locomatix, he had a brief stint with Greenplum, where he worked on parallel query scheduling. Greenplum was eventually acquired by EMC for more than $300M. Prior to Greenplum, Karthik was at Juniper Networks, where he designed and delivered platforms, protocols, databases, and high-availability solutions for network routers that are widely deployed on the Internet. He is the author of several patents and publications and of one best-selling book, Network Routing: Algorithms, Protocols, and Architectures. Karthik has a PhD in computer science from UW Madison with a focus on databases, where he worked extensively in parallel database systems, query processing, scale-out technologies, storage engines, and online analytical systems. Several of these research projects were spun out as a company later acquired by Teradata.