Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

Modern real-time streaming architectures

Karthik Ramasamy (Streamlio), Sanjeev Kulkarni (Streamlio), Arun Kejariwal (Independent), Neng Lu (Twitter), Sijie Guo (StreamNative)
1:30pm–5:00pm Tuesday, September 26, 2017
Secondary topics: Architecture, Streaming
Average rating: 3.00 (3 ratings)

Who is this presentation for?

  • Data engineers, data scientists, and technology leaders

Materials or downloads needed in advance

  • A laptop

What you'll learn

  • Understand the fundamental concepts of stream processing

Description

Across diverse segments in industry, there has been a shift in focus from big data to fast data, stemming, in part, from the deluge of high-velocity data streams as well as the need for instant data-driven insights.

Karthik Ramasamy, Sanjeev Kulkarni, Avrilia Floratou, Ashvin Agrawal, Arun Kejariwal, and Sijie Guo walk you through state-of-the-art streaming systems, algorithms, and deployment architectures, covering the typical challenges in modern real-time big data platforms and offering insights on how to address them. They also discuss how advances in technology might impact the streaming architectures and applications of the future. Along the way, they explore the interplay between storage and stream processing and speculate about future developments.

Topics include:

  • Introduction to streaming
  • Basic requirements of stream processing
  • Streaming and one-pass algorithms
  • Different types of streaming architectures
  • An in-depth review of streaming frameworks
  • Deploying and operating stream processing applications
  • Lessons learned from building a real-time stack using Apache DistributedLog and Heron at Twitter’s scale

Karthik Ramasamy

Streamlio

Karthik Ramasamy is the cofounder of Streamlio, a company building next-generation real-time processing engines. Karthik has more than two decades of experience working in parallel databases, big data infrastructure, and networking. Previously, he was engineering manager and technical lead for real-time analytics at Twitter, where he was the cocreator of Heron; cofounded Locomatix, a company that specialized in real-time stream processing on Hadoop and Cassandra using SQL (acquired by Twitter); worked briefly on parallel query scheduling at Greenplum (acquired by EMC for more than $300M); and designed and delivered platforms, protocols, databases, and high-availability solutions for network routers at Juniper. He’s the author of several patents, publications, and one best-selling book, Network Routing: Algorithms, Protocols, and Architectures. Karthik holds a PhD in computer science from the University of Wisconsin–Madison with a focus on databases, where he worked extensively in parallel database systems, query processing, scale-out technologies, storage engines, and online analytical systems. Several of these research projects were spun out as a company later acquired by Teradata.

Sanjeev Kulkarni

Streamlio

Sanjeev Kulkarni is the cofounder of Streamlio, a company focused on building a next-generation real-time stack. Previously, he was the technical lead for real-time analytics at Twitter, where he cocreated Twitter Heron; worked at Locomatix handling the company’s engineering stack; and led several initiatives for the AdSense team at Google. Sanjeev holds an MS in computer science from the University of Wisconsin–Madison.

Arun Kejariwal

Independent

Arun Kejariwal is an independent lead engineer. Previously, he was a statistical learning principal at Machine Zone (MZ), where he led a team of top-tier researchers and worked on research and development of novel techniques for install-and-click fraud detection, for assessing the efficacy of TV campaigns, and for optimizing marketing campaigns; his team also built novel methods for bot detection, intrusion detection, and real-time anomaly detection. He also developed and open-sourced techniques for anomaly detection and breakout detection at Twitter. His research includes the development of practical and statistically rigorous techniques and methodologies to deliver high performance, availability, and scalability in large-scale distributed clusters. Some of the techniques he helped develop have been presented at international conferences and published in peer-reviewed journals.

Neng Lu

Twitter

Neng Lu is a software engineer at Twitter. He is currently a core committer to the Heron project and the lead engineer for Heron development at Twitter. He has also worked on Twitter’s monitoring and key-value storage systems. Before joining Twitter, he earned his master’s degree from UCLA and his bachelor’s degree from Zhejiang University.

Sijie Guo

StreamNative

Sijie Guo is the founder and CEO of StreamNative, a data infrastructure startup offering enterprises a cloud-native event streaming platform based on Apache Pulsar. Previously, he was the tech lead for the Messaging Group at Twitter and worked on push notification infrastructure at Yahoo. He’s also the VP of Apache BookKeeper and a PMC member of Apache Pulsar.

Comments on this page are now closed.

Comments

Karthik Ramasamy | COFOUNDER AND CEO
09/27/2017 11:30pm EDT

Slides of the session are available at

https://www.slideshare.net/arunkejariwal/modern-realtime-streaming-architectures?qid=f9d8532f-94be-47ee-bdf1-045826ba283e&v=&b=&from_search=1

Jignesh Rawal
09/27/2017 7:54pm EDT

Can anyone please share the link to the tutorial slides?