Sep 23–26, 2019

Building a multitenant data processing and model inferencing platform with Kafka Streams

Navinder Pal Singh Brar (Walmart Labs)
11:20am–12:00pm Wednesday, September 25, 2019
Location: 1A 15/16
Average rating: 4.60 (5 ratings)

Who is this presentation for?

  • Data engineers and data scientists

Level

Intermediate

Description

Navinder Pal Singh Brar provides an overview of an architecture for data processing and model triggering that is built for scalability and reliability. The platform is multitenant: each client’s models (such as bid models, fraud detection, and omnichannel reorder) may be interested in certain events, such as search, add to cart, and transactions. Whenever such an event is processed, the models interested in that event are triggered.
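The event-to-model routing described above can be illustrated with a minimal sketch. It is not the platform's implementation and does not use the Kafka Streams API; it is plain Python, and every name in it (`Event`, `Model`, `ModelRegistry`, the tenant names, the placeholder `infer` functions) is hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Event:
    """A customer event, e.g. a search or a transaction (hypothetical shape)."""
    event_type: str
    customer_id: str
    payload: dict


@dataclass(frozen=True)
class Model:
    """A tenant-owned model and the event types it wants to be triggered on."""
    tenant: str
    name: str
    interested_in: FrozenSet[str]
    infer: Callable[[Event], object]


class ModelRegistry:
    """Indexes models by event type so dispatch does not scan every model."""

    def __init__(self) -> None:
        self._by_event_type: Dict[str, List[Model]] = defaultdict(list)

    def register(self, model: Model) -> None:
        for event_type in model.interested_in:
            self._by_event_type[event_type].append(model)

    def dispatch(self, event: Event) -> Dict[Tuple[str, str], object]:
        """Run only the models interested in this event's type."""
        results = {}
        for model in self._by_event_type.get(event.event_type, []):
            results[(model.tenant, model.name)] = model.infer(event)
        return results


# Assumed example tenants and models; the inference functions are placeholders.
registry = ModelRegistry()
registry.register(Model("ads", "bid_model", frozenset({"search"}),
                        lambda e: "bid:" + e.payload["query"]))
registry.register(Model("risk", "fraud_detection",
                        frozenset({"transaction", "add_to_cart"}),
                        lambda e: "checked:" + e.customer_id))

search_result = registry.dispatch(Event("search", "c1", {"query": "shoes"}))
ignored_result = registry.dispatch(Event("page_view", "c1", {}))
```

In this sketch a search event reaches only the bid model, and an event type that no model registered for triggers nothing. Keying results by tenant and model name is one simple way to keep outputs from different tenants apart.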

Navinder details how an event lands in the system from Kafka, how it is processed and saved internally, and how the interested models are triggered on it. Models use the internal persistent state (on RocksDB) for feature extraction and store their own outputs in the platform, where other teams can use them as features. You’ll explore the architecture of the models, specifically: ensuring fairness among the models; providing isolation while reusing features and inferences across models; dynamically updating global data (such as the product catalog) needed to run models on each node; customizing models to trigger either on each event or as a batch at regular time intervals; and implementing data archival and TTL policies and other features developed to save money. You’ll also learn the advantages and limitations of the platform.
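Two of the behaviors listed above, per-event versus batched triggering and TTL-based expiry of stored state, can be sketched in a few lines. This is an illustrative assumption about how such policies might look, not the talk's design: it uses plain Python rather than Kafka Streams or RocksDB, the caller passes time in explicitly so the example is deterministic, and `ModelTrigger` and `TtlStore` are hypothetical names.

```python
from typing import Callable, Dict, List, Optional, Tuple


class ModelTrigger:
    """Runs a model on each event, or on a buffered batch once per interval."""

    def __init__(self, run_model: Callable[[List[dict]], None],
                 batch_interval_s: Optional[float] = None) -> None:
        self._run_model = run_model
        self._batch_interval_s = batch_interval_s  # None means per-event mode
        self._buffer: List[dict] = []
        self._last_flush: Optional[float] = None

    def on_event(self, event: dict, now: float) -> None:
        if self._batch_interval_s is None:
            self._run_model([event])  # per-event mode: trigger immediately
            return
        if self._last_flush is None:
            self._last_flush = now  # the first interval starts at the first event
        self._buffer.append(event)

    def on_tick(self, now: float) -> None:
        """Periodic callback; flushes the buffer once the interval has elapsed."""
        if self._batch_interval_s is None or self._last_flush is None:
            return
        if now - self._last_flush < self._batch_interval_s:
            return
        self._last_flush = now
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._run_model(batch)


class TtlStore:
    """In-memory key-value store whose entries expire ttl_s seconds after write."""

    def __init__(self, ttl_s: float) -> None:
        self._ttl_s = ttl_s
        self._data: Dict[str, Tuple[object, float]] = {}

    def put(self, key: str, value: object, now: float) -> None:
        self._data[key] = (value, now)

    def get(self, key: str, now: float) -> Optional[object]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, written_at = entry
        if now - written_at >= self._ttl_s:
            del self._data[key]  # lazily drop the expired entry on read
            return None
        return value


# Per-event mode: the model runs once for every event.
per_event_calls: List[List[dict]] = []
per_event = ModelTrigger(per_event_calls.append)
per_event.on_event({"type": "search"}, now=0.0)

# Batch mode: events are buffered and the model runs once per 60-second interval.
batched_calls: List[List[dict]] = []
batched = ModelTrigger(batched_calls.append, batch_interval_s=60.0)
batched.on_event({"type": "add_to_cart"}, now=0.0)
batched.on_event({"type": "transaction"}, now=10.0)
batched.on_tick(now=30.0)  # interval not yet elapsed, nothing runs
batched.on_tick(now=60.0)  # interval elapsed, both events go out as one batch

store = TtlStore(ttl_s=3600.0)
store.put("customer-1", {"last_search": "shoes"}, now=0.0)
```

Expiring entries lazily on read keeps the sketch short; a production store would more likely also purge expired entries in the background so that keys that are never read again do not accumulate.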

Prerequisite knowledge

  • A basic understanding of Kafka and Kafka Streams
  • General knowledge of how distributed systems work (useful but not required)

What you'll learn

  • Understand how to use Kafka Streams to run models as events are processed
  • Discover the lessons learned from productionizing a Kafka Streams cluster at scale and making it cost efficient

Navinder Pal Singh Brar

Walmart Labs

Navinder Pal Singh Brar is a senior data engineer at Walmart Labs, where he’s been working with the Kafka ecosystem, especially Kafka Streams, for the last couple of years. He created a Customer Data Platform on top of it to meet the company’s need to process billions of customer events per day in real time and trigger machine learning models on each event. He’s been active in contributing back to Kafka Streams and filed three patents last year. Navinder is a regular speaker at local and international events on real-time stream processing, data platforms, and Kafka.

