Sep 23–26, 2019

Serverless streaming architectures and algorithms for the enterprise

Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio), Anurag Khandelwal (Yale University)
9:00am–12:30pm Tuesday, September 24, 2019
Location: 1E 09
Average rating: 4.33 (3 ratings)

Level

Intermediate

In recent years, serverless has gained momentum in the realm of cloud computing. Broadly speaking, it comprises function as a service (FaaS) and backend as a service (BaaS). The distinction between the two is that under FaaS, one writes and maintains the code (e.g., the functions) for serverless compute; in contrast, under BaaS, the platform provides the functionality and manages the operational complexity behind it. Serverless provides a great means to boost development velocity. With greatly reduced infrastructure costs, more agile and focused teams, and faster time to market, enterprises are increasingly adopting serverless approaches to gain a key advantage over their competitors.

Early use cases of serverless include data transformation in batch and ETL scenarios and data processing using MapReduce patterns. As a natural extension, serverless is being used in streaming contexts such as, but not limited to, real-time bidding, fraud detection, and intrusion detection. Serverless is, arguably, naturally suited to extracting insights from fast data, that is, high-volume, high-velocity data. Example tasks in this regard include filtering and reducing noise in the data and leveraging machine learning and deep learning models to provide continuous insights about business operations.

Arun Kejariwal, Karthik Ramasamy, and Anurag Khandelwal walk you through the landscape of streaming systems for each stage of an end-to-end data processing pipeline—messaging, compute, and storage. You’ll get an overview of the inception and growth of the serverless paradigm. Arun, Karthik, and Anurag take a deep dive into Apache Pulsar, which provides native serverless support in the form of Pulsar functions, and paint a bird’s-eye view of the application domains where Pulsar functions can be leveraged.
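As a rough illustration of the kind of per-event logic such a function might hold, the sketch below filters noisy events from a stream. It is not taken from the tutorial: the event shape (a JSON object with `sensor` and `value` fields) and the `MAX_PLAUSIBLE_VALUE` threshold are hypothetical. It assumes a runtime that, like Pulsar's language-native Python interface, can invoke a plain function once per message.

```python
import json
from typing import Optional

# Hypothetical threshold: readings outside [0, MAX_PLAUSIBLE_VALUE] count as noise.
MAX_PLAUSIBLE_VALUE = 1000.0


def process(input: str) -> Optional[str]:
    """Drop malformed or implausible events; pass the rest through, normalized."""
    try:
        event = json.loads(input)
        value = float(event["value"])
    except (ValueError, KeyError, TypeError):
        # Not JSON, not an object with a numeric "value": treat as noise.
        return None
    if not (0.0 <= value <= MAX_PLAUSIBLE_VALUE):
        # Out-of-range (or NaN) reading: treat as noise.
        return None
    # Re-serialize with only the fields downstream consumers are assumed to need.
    return json.dumps({"sensor": event.get("sensor", "unknown"), "value": value})
```

In this sketch, returning `None` stands for "emit nothing"; how a given runtime treats that return value, and how the function is deployed and bound to input and output topics, should be checked against that runtime's documentation.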

Baking in intelligence in a serverless flow is paramount from a business perspective. To this end, they detail different serverless patterns—event processing, machine learning, and analytics—for different use cases and highlight the trade-offs. They offer perspectives on how advances in hardware technology and the emergence of new applications will impact the evolution of serverless streaming architectures and algorithms. The topics covered include an introduction to streaming, an introduction to serverless, serverless and streaming requirements, Apache Pulsar, application domains, serverless event processing patterns, serverless machine learning patterns, and serverless analytics patterns.

Materials or downloads needed in advance

NA

What you'll learn

  • Gain an in-depth overview of serverless for streaming and how to leverage different technologies and algorithms for a wide variety of use cases

Arun Kejariwal

Independent

Arun Kejariwal is an independent lead engineer. Previously, he was a statistical learning principal at Machine Zone (MZ), where he led a team of top-tier researchers and worked on research and development of novel techniques for install-and-click fraud detection, assessing the efficacy of TV campaigns, and optimizing marketing campaigns. His team built novel methods for bot detection, intrusion detection, and real-time anomaly detection. At Twitter, he developed and open-sourced techniques for anomaly detection and breakout detection. His research includes the development of practical and statistically rigorous techniques and methodologies to deliver high performance, availability, and scalability in large-scale distributed clusters. Some of the techniques he helped develop have been presented at international conferences and published in peer-reviewed journals.


Karthik Ramasamy

Streamlio

Karthik Ramasamy is the cofounder of Streamlio, a company building next-generation real-time processing engines. Karthik has more than two decades of experience working in parallel databases, big data infrastructure, and networking. Previously, he was engineering manager and technical lead for real-time analytics at Twitter, where he was the cocreator of Heron; cofounded Locomatix, a company that specialized in real-time stream processing on Hadoop and Cassandra using SQL (acquired by Twitter); briefly worked on parallel query scheduling at Greenplum (acquired by EMC for more than $300M); and designed and delivered platforms, protocols, databases, and high-availability solutions for network routers at Juniper Networks. He is the author of several patents, publications, and one best-selling book, Network Routing: Algorithms, Protocols, and Architectures. Karthik holds a PhD in computer science from the University of Wisconsin-Madison with a focus on databases, where he worked extensively in parallel database systems, query processing, scale-out technologies, storage engines, and online analytical systems. Several of these research projects were spun out as a company later acquired by Teradata.


Anurag Khandelwal

Yale University

Anurag Khandelwal is an assistant professor in the Department of Computer Science at Yale University. Previously, Anurag did a short postdoc at Cornell Tech, where he worked with Tom Ristenpart and Rachit Agarwal. He received his PhD from the University of California, Berkeley, at the RISELab, where he was advised by Ion Stoica. Anurag earned his BTech in computer science and engineering from the Indian Institute of Technology, Kharagpur. His research interests span distributed systems, networking, and algorithms. In particular, his research focuses on addressing core challenges in distributed systems through novel algorithm and data structure design. During his PhD, Anurag built large-scale data-intensive systems such as Succinct and Confluo, which led to deployments in several production clusters.

Comments on this page are now closed.

Comments

Anurag Khandelwal | Assistant Professor
09/25/2019 4:09pm EDT

The slide deck for the tutorial can be found here: https://www.slideshare.net/arunkejariwal/serverless-streaming-architectures-and-algorithms-for-the-enterprise-175954094

srija meka | Developer-2
09/24/2019 12:31pm EDT

Where can I find the presentations from this morning's training?
