Stream processing systems must support several different processing semantics because streaming applications are so diverse. To process real-time data at scale, Twitter developed and deployed Heron, its next-generation streaming engine, which provides unparalleled performance at large scale and has been successfully meeting price and performance goals for diverse streaming applications. Heron supports both at-least-once and at-most-once processing of data. It is now an open source project with contributors from various institutions.
Drawing on their experience with Heron, Karthik Ramasamy and Sanjeev Kulkarni explore effectively-once, exactly-once, and other types of stateful processing techniques, explain how they are implemented in Heron, and demonstrate how your applications can benefit from using them. They also share their experience running exactly-once processing at scale: which types of applications benefited the most, where it’s overkill, and the cost of running exactly-once streaming applications.
Karthik Ramasamy is the cofounder of Streamlio, a company building next-generation real-time processing engines. Karthik has more than two decades of experience working in parallel databases, big data infrastructure, and networking. Previously, he was engineering manager and technical lead for real-time analytics at Twitter, where he was the cocreator of Heron; cofounded Locomatix, a company that specialized in real-time stream processing on Hadoop and Cassandra using SQL (acquired by Twitter); briefly worked on parallel query scheduling at Greenplum (acquired by EMC for more than $300M); and designed and delivered platforms, protocols, databases, and high-availability solutions for network routers at Juniper Networks. He is the author of several patents, publications, and one best-selling book, Network Routing: Algorithms, Protocols, and Architectures. Karthik holds a PhD in computer science from the University of Wisconsin–Madison with a focus on databases, where he worked extensively in parallel database systems, query processing, scale-out technologies, storage engines, and online analytical systems. Several of these research projects were spun out as a company later acquired by Teradata.
Sanjeev Kulkarni is the cofounder of Streamlio, a company focused on building a next-generation real-time stack. Previously, he was the technical lead for real-time analytics at Twitter, where he cocreated Twitter Heron; worked at Locomatix handling the company’s engineering stack; and led several initiatives for the AdSense team at Google. Sanjeev holds an MS in computer science from the University of Wisconsin–Madison.
©2018, O'Reilly Media, Inc.