Across diverse segments in industry, there has been a shift in focus from big data to fast data, stemming, in part, from the deluge of high-velocity data streams as well as the need for instant data-driven insights, and there has been a proliferation of messaging and streaming frameworks that enterprises utilize to satisfy the needs of various applications.
Drawing on their experience operating streaming systems at Twitter scale, Karthik Ramasamy, Arun Kejariwal, and Ivan Kelly walk you through state-of-the-art streaming architectures, frameworks, and algorithms, covering the typical challenges in modern real-time big data platforms and offering insights on how to address them. They also discuss how advances in technology might impact the streaming architectures and applications of the future. Along the way, they explore the interplay between storage and stream processing and speculate about future developments.
Until recently, Arun Kejariwal was a statistical learning principal at Machine Zone (MZ), where he led a team of top-tier researchers and worked on research and development of novel techniques for install and click fraud detection and assessing the efficacy of TV campaigns and optimization of marketing campaigns. In addition, his team built novel methods for bot detection, intrusion detection, and real-time anomaly detection. Previously, Arun worked at Twitter, where he developed and open sourced techniques for anomaly detection and breakout detection. His research includes the development of practical and statistically rigorous techniques and methodologies to deliver high performance, availability, and scalability in large-scale distributed clusters. Some of the techniques he helped develop have been presented at international conferences and published in peer-reviewed journals.
Karthik Ramasamy is the cofounder of Streamlio, a company building next-generation real-time processing engines. Karthik has more than two decades of experience working in parallel databases, big data infrastructure, and networking. Previously, he was engineering manager and technical lead for real-time analytics at Twitter, where he was the cocreator of Heron; cofounded Locomatix, a company that specialized in real-time stream processing on Hadoop and Cassandra using SQL (acquired by Twitter); briefly worked on parallel query scheduling at Greenplum (acquired by EMC for more than $300M); and designed and delivered platforms, protocols, databases, and high-availability solutions for network routers at Juniper Networks. He is the author of several patents, publications, and one best-selling book, Network Routing: Algorithms, Protocols, and Architectures. Karthik holds a PhD in computer science from the University of Wisconsin–Madison with a focus on databases, where he worked extensively in parallel database systems, query processing, scale-out technologies, storage engines, and online analytical systems. Several of these research projects were spun out as a company later acquired by Teradata.
Ivan Kelly is a software engineer at Streamlio, a startup dedicated to providing a next-generation integrated real-time stream processing solution, based on Heron, Apache Pulsar (incubating), and Apache BookKeeper. Ivan has been active in Apache BookKeeper since its very early days as a project in Yahoo! Research Barcelona. Specializing in replicated logging and transaction processing, he is currently focused on Streamlio’s storage layer.
Comments on this page are now closed.
©2018, O’Reilly UK Ltd • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • email@example.com