Presented By O'Reilly and Cloudera
Make Data Work
March 13–14, 2017: Training
March 14–16, 2017: Tutorials & Conference
San Jose, CA

From rivulets to rivers: Elastic stream processing in Heron

Bill Graham (Twitter), Avrilia Floratau (Microsoft), Ashvin Agrawal (Microsoft)
11:00am–11:40am Thursday, March 16, 2017
Stream processing and analytics
Location: LL20 D
Level: Intermediate
Secondary topics: Data Platform, Media, Streaming

Who is this presentation for?

  • Data engineers, data scientists, and technology leaders

Prerequisite knowledge

  • A basic understanding of streaming analytics (useful but not required)

What you'll learn

  • Explore Heron and learn how its resource scheduling and management elastically adapts to changes in load.

Description

Twitter is all about real time at scale. Twitter’s data centers continuously process billions of events per day the instant the data is generated. To achieve real-time performance, Twitter has developed and deployed Heron, a next-generation cloud streaming engine that provides unparalleled performance at large scale. Heron has been successfully meeting Twitter’s strict performance requirements for various streaming applications and is now an open source project with contributors from various institutions.

Bill Graham, Avrilia Floratau, and Ashvin Agrawal explore how Twitter and Microsoft have collaborated to transform Heron into a truly elastic system. The amount of data that needs to be processed in Twitter’s data centers changes significantly due to expected and unexpected global events. For example, during the Super Bowl, there are spikes of tweets that all need to be processed in real time. Similarly, unexpected events such as natural disasters can generate very large volumes of data. Twitter’s infrastructure must be robust enough to deliver real-time performance when such events take place. Heron solves this problem by supporting dynamic scaling of Heron applications, achieved by increasing or decreasing the number of Heron instances that form a topology.
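
To make the idea of scaling by instance count concrete, here is a minimal sketch, not Heron's actual scaling logic. It assumes a hypothetical sizing rule: each instance of a component can absorb a fixed number of events per second, and the operator wants a percentage of spare headroom. The function names, parameters, and numbers are illustrative assumptions only.

```python
def target_parallelism(events_per_sec, capacity_per_instance,
                       headroom_pct=20, min_instances=1):
    """Hypothetical rule: instances needed to absorb a load with headroom.

    events_per_sec        -- observed input rate for one component
    capacity_per_instance -- events/sec a single instance can handle (assumed known)
    headroom_pct          -- spare capacity to keep, as an integer percentage
    """
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    # Integer ceiling division avoids floating-point rounding surprises.
    numerator = events_per_sec * (100 + headroom_pct)
    denominator = capacity_per_instance * 100
    needed = -(-numerator // denominator)
    return max(min_instances, needed)


def scaling_delta(current_instances, events_per_sec, capacity_per_instance,
                  headroom_pct=20):
    """Positive result: add instances. Negative result: remove instances."""
    target = target_parallelism(events_per_sec, capacity_per_instance, headroom_pct)
    return target - current_instances


if __name__ == "__main__":
    # A spike to 10,000 events/sec with 1,000 events/sec per instance.
    print(target_parallelism(10_000, 1_000))
    # After the spike subsides, the same rule suggests scaling back down.
    print(scaling_delta(12, 2_500, 1_000, headroom_pct=0))
```

In this sketch the same rule covers both directions: a spike raises the target count, and a quiet period lowers it, which mirrors the increase-or-decrease behavior described above.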

Providing truly elastic scaling in the context of Heron poses several challenges. First, scaling should not disrupt event processing: Heron must be able to resize running topologies with minimal disruption. Second, Heron has a modular and extensible architecture that allows it to integrate with various schedulers and incorporate various resource management policies. As a result, Heron’s scaling framework must be generic enough to operate efficiently on top of diverse components. Finally, Heron must allocate resources efficiently during scaling so that it makes optimal use of available cluster resources. Bill, Avrilia, and Ashvin explain how Heron addresses these challenges to gracefully handle highly varying loads without sacrificing real-time performance and conclude by sharing future plans for scaling that current and future Heron contributors could tackle.

Bill Graham

Twitter

Bill Graham is a staff engineer on the Real Time Compute team at Twitter. Bill’s primary areas of focus are data processing applications and analytics infrastructure. Previously, he was a principal engineer at CBS Interactive and CNET Networks, where he worked on ad targeting and content publishing infrastructure, and a senior engineer at Logitech focusing on webcam streaming and messaging applications. Bill contributes to a number of open source projects, including HBase, Hive, and Presto, and he’s a Heron and Pig committer.

Avrilia Floratau

Microsoft

Avrilia Floratau is a senior scientist at Microsoft’s Cloud and Information Services Lab, where her research is focused on scalable real-time stream processing systems. She is also an active contributor to Heron, collaborating with Twitter. Previously, Avrilia was a research scientist at IBM Research working on SQL-on-Hadoop systems. She holds a PhD in data management from the University of Wisconsin-Madison.

Ashvin Agrawal

Microsoft

Ashvin Agrawal is a senior research engineer at Microsoft, where he works on streaming systems and contributes to the Twitter Heron project. Ashvin is a software engineer with more than 10 years of experience. He specializes in developing large-scale distributed systems. Previously, he worked at VMware, Yahoo, and Mojo Networks. Ashvin holds an MTech in computer science from IIT Kanpur, India.