Today’s enterprises produce data in high volume and at high velocity. With velocity comes the need to process the data in real time. To meet these needs, Twitter developed and deployed Heron, a next-generation streaming engine that provides unparalleled performance at large scale. Over the three years it has been in production, Heron has successfully met Twitter’s strict performance requirements for various streaming applications; it processes billions of events per day at Twitter.
Heron is an open source project with several major contributors from various institutions. Twitter Heron was ported to high-performance computing (HPC) clusters with advanced processors, memory, I/O systems, and high-performance interconnects. Compared to Ethernet, high-performance interconnects such as InfiniBand, Omni-Path, and Cray XC networks offer submillisecond latencies, large bandwidths, and advanced messaging capabilities at a comparable price. Large-scale distributed streaming applications, especially in the financial and IoT industries, can benefit from the low latencies and high bandwidths these networks offer.
Karthik Ramasamy and Supun Kamburugamuve explain how they integrated the InfiniBand high-performance interconnect with Twitter Heron and optimized it for low-latency, high-throughput stream processing. Experiments show that this system can achieve latencies as low as 7 ms and throughputs around 170M tuples/sec with minimal resources.
Karthik Ramasamy is the cofounder of Streamlio, a company building next-generation real-time processing engines. Karthik has more than two decades of experience working in parallel databases, big data infrastructure, and networking. Previously, he was engineering manager and technical lead for real-time analytics at Twitter, where he was the cocreator of Heron; cofounded Locomatix, a company that specialized in real-time stream processing on Hadoop and Cassandra using SQL (acquired by Twitter); worked briefly on parallel query scheduling at Greenplum (acquired by EMC for more than $300M); and designed and delivered platforms, protocols, databases, and high-availability solutions for network routers at Juniper. He’s the author of several patents, publications, and one best-selling book, Network Routing: Algorithms, Protocols, and Architectures. Karthik holds a PhD in computer science from the University of Wisconsin–Madison with a focus on databases, where he worked extensively in parallel database systems, query processing, scale-out technologies, storage engines, and online analytical systems. Several of these research projects were spun out as a company later acquired by Teradata.
Supun Kamburugamuve is a graduate student at Indiana University and a senior software architect at the Digital Science Center of Indiana University, where he researches big data applications and frameworks. He’s working on high-performance enhancements to big data systems with HPC interconnects such as InfiniBand and Omni-Path. Supun is an elected member of the Apache Software Foundation and has contributed to many open source projects, including Apache Web Services projects. Previously, Supun worked on middleware systems and was a key member of the WSO2 Enterprise Service Bus (ESB) team; the ESB is an open source enterprise integration product widely used by enterprises. He has a PhD in computer science, specializing in high-performance data analytics at Indiana University.
©2017, O'Reilly Media, Inc.