Presented By O’Reilly and Cloudera
Make Data Work
March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

Modern real-time streaming architectures

Karthik Ramasamy (Streamlio), Sanjeev Kulkarni (Streamlio), Sijie Guo (Streamlio), Arun Kejariwal (MZ)
9:00am–12:30pm Tuesday, March 6, 2018
Secondary topics: Graphs and Time-series
Average rating: ***** (5.00, 2 ratings)

Who is this presentation for?

  • Software engineers and engineering managers

Prerequisite knowledge

  • A basic understanding of streaming systems, messaging systems, and storage systems (useful but not required)

Materials or downloads needed in advance

What you'll learn

  • Understand the fundamental concepts of stream processing
  • Explore the different types of streaming architectures along with their pros and cons

Description

Across diverse industry segments, the focus has shifted from big data to fast data, driven in part by the deluge of high-velocity data streams and the need for instant data-driven insights. Alongside this shift, the messaging and streaming frameworks that enterprises use to meet the needs of various applications have proliferated.

Drawing on their experience operating streaming systems at Twitter scale, Karthik Ramasamy, Sanjeev Kulkarni, Arun Kejariwal, and Sijie Guo walk you through state-of-the-art streaming architectures, streaming frameworks, and streaming algorithms, covering the typical challenges in modern real-time big data platforms and offering insights on how to address them. Along the way, they explore the interplay between storage and stream processing and discuss how advances in technology might impact the streaming architectures and applications of the future.

Topics include:

  • Basic requirements of stream processing
  • Streaming and one-pass algorithms
  • Different types of streaming architectures
  • An in-depth review of streaming frameworks
  • Deploying and operating stream processing applications
  • Lessons learned from building a real-time stack using Apache DistributedLog and Heron at Twitter’s scale

Karthik Ramasamy

Streamlio

Karthik Ramasamy is the cofounder of Streamlio, a company building next-generation real-time processing engines. Karthik has more than two decades of experience working in parallel databases, big data infrastructure, and networking. Previously, he was engineering manager and technical lead for real-time analytics at Twitter, where he was the cocreator of Heron; cofounded Locomatix, a company that specialized in real-time stream processing on Hadoop and Cassandra using SQL (acquired by Twitter); briefly worked on parallel query scheduling at Greenplum (acquired by EMC for more than $300M); and designed and delivered platforms, protocols, databases, and high-availability solutions for network routers at Juniper Networks. He is the author of several patents, publications, and one best-selling book, Network Routing: Algorithms, Protocols, and Architectures. Karthik holds a PhD in computer science from the University of Wisconsin-Madison with a focus on databases, where he worked extensively in parallel database systems, query processing, scale-out technologies, storage engines, and online analytical systems. Several of these research projects were spun out as a company later acquired by Teradata.

Sanjeev Kulkarni

Streamlio

Sanjeev Kulkarni is the cofounder of Streamlio, a company focused on building a next-generation real-time stack. Previously, he was the technical lead for real-time analytics at Twitter, where he cocreated Twitter Heron; worked at Locomatix handling the company’s engineering stack; and led several initiatives for the AdSense team at Google. Sanjeev holds an MS in computer science from the University of Wisconsin-Madison.

Sijie Guo

Streamlio

Sijie Guo is the cofounder of Streamlio, a company focused on building a next-generation real-time data stack. Previously, he was the tech lead for the Messaging Group at Twitter, where he cocreated Apache DistributedLog, and worked on push notification infrastructure at Yahoo. He is the PMC chair of Apache BookKeeper.

Arun Kejariwal

MZ

Arun Kejariwal is a statistical learning principal at Machine Zone (MZ), where he leads a team of top-tier researchers and works on research and development of novel techniques for install and click fraud detection, for assessing the efficacy of TV campaigns, and for optimizing marketing campaigns. In addition, his team is building novel methods for bot detection, intrusion detection, and real-time anomaly detection. Previously, Arun worked at Twitter, where he developed and open-sourced techniques for anomaly detection and breakout detection. His research includes the development of practical and statistically rigorous techniques and methodologies to deliver high performance, availability, and scalability in large-scale distributed clusters. Some of the techniques he helped develop have been presented at international conferences and published in peer-reviewed journals.

Comments

Karthik Ramasamy | COFOUNDER
03/06/2018 2:49pm PST

@wilson @esteban – the slides for the tutorial are available online at

https://www.slideshare.net/KarthikRamasamy3/tutorial-modern-real-time-streaming-architectures

Wilson Vivek Irudayam | PRINCIPAL SOFTWARE ENGINEER
03/06/2018 6:01am PST

The tutorial was great. Will the presentations be available for download?

Esteban Hernandez | SOFTWARE ARCHITECT
03/06/2018 1:22am PST

Will the slides be available for download?

Karthik Ramasamy | COFOUNDER
03/05/2018 2:58am PST

Esteban – Our tutorial is tomorrow, March 6th, at 9:00 am.

Esteban Hernandez | SOFTWARE ARCHITECT
03/05/2018 12:56am PST

Does the conference start at 9:00? I’m here and don’t see anything.

Karthik Ramasamy | COFOUNDER
03/05/2018 12:50am PST

@lina – sure will do.

Karthik Ramasamy | COFOUNDER
03/05/2018 12:15am PST

@jay – there is no need to install the software. This tutorial is more conceptual.

Jay Purkayastha | ARCHITECT
03/04/2018 12:59am PST

Hi,
I’m trying to install the software mentioned as a prerequisite. Is there an installation guide available? I’m using a Windows laptop and have installed GitHub Desktop and downloaded the Heron and Pulsar jars, but I’m not able to find a Windows binary for BookKeeper.

Any help will be appreciated.

Best Regards,
Jay Purkayastha

Lina Li | INFORMATION ARCHITECT
03/02/2018 3:00am PST

Insight into the pros and cons of the tools mentioned here vs. Kafka would be very helpful. Thanks, Lina