Building a multitenant data processing and model inferencing platform with Kafka Streams

Navinder Pal Singh Brar (Walmart Labs)

11:20am–12:00pm Wednesday, September 25, 2019

Location: 1A 15/16

Data Engineering and Architecture

Secondary topics: Data Integration and Data Processing, Data, Analytics, and AI Architecture, Retail and e-commerce, Streaming and IoT

Who is this presentation for?

Data engineers and data scientists

Level

Intermediate

Description

Navinder Pal Singh Brar provides an overview of the architecture for processing data and triggering models, which is built for scalability and reliability. Because the platform is multitenant, each client's models (such as bid models, fraud detection, and omnichannel reorder) may be interested in certain events, such as searches, add-to-cart actions, and transactions. Whenever such an event is processed, every model interested in that particular event is triggered.
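
The subscription idea above can be illustrated with a minimal, framework-free Java sketch. It is not the platform's actual code: the `Event` record, the `ModelRouter` class, and its `subscribe` and `dispatch` methods are hypothetical names, and each model is reduced to a simple callback. In a Kafka Streams deployment, one plausible place for this lookup would be a processor that runs after the event has been consumed and stored, but that wiring is an assumption and is omitted here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical processed customer event; type is e.g. "search", "add_to_cart", "transaction". */
record Event(String type, String customerId) {}

/** Hypothetical registry mapping each event type to the models subscribed to it. */
final class ModelRouter {
    private final Map<String, List<Consumer<Event>>> subscriptions = new HashMap<>();

    /** Registers a model (here just a callback) as interested in one event type. */
    void subscribe(String eventType, Consumer<Event> model) {
        subscriptions.computeIfAbsent(eventType, t -> new ArrayList<>()).add(model);
    }

    /** Triggers every model interested in this event's type, in subscription order; returns how many ran. */
    int dispatch(Event event) {
        List<Consumer<Event>> models = subscriptions.getOrDefault(event.type(), List.of());
        for (Consumer<Event> model : models) {
            model.accept(event);
        }
        return models.size();
    }
}
```

Several models can subscribe to the same event type (for example, a fraud model and a reorder model both listening to transactions), and events that no model cares about are simply skipped.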

Navinder details how an event lands in the system from Kafka, how it is processed and saved internally, and how the interested models are triggered by it. Models use the internal persistent state (on RocksDB) for feature extraction and store their own outputs in the platform, where other teams can reuse them as features. You'll explore the architecture of the models, specifically: ensuring fairness among the models while providing isolation and, at the same time, reusing features and inferences across models; dynamically updating global data needed to run models on each node (such as the product catalog); letting models trigger either on each event or as a batch at regular time intervals; implementing data archival and TTL policies and other features developed to save money; and the advantages and limitations of the platform.
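
The choice between triggering a model on each event and triggering it as a batch at a fixed interval can be sketched the same way. This is a hedged illustration, not the platform's implementation: `ModelTrigger` and its `onEvent` and `onTick` methods are hypothetical, and time is passed in explicitly so the example is deterministic. In Kafka Streams, the periodic `onTick` call would most naturally come from a punctuator registered via `ProcessorContext#schedule` with `PunctuationType.WALL_CLOCK_TIME`, but that wiring is an assumption and is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical wrapper deciding when a model runs: on every event, or once per batch interval. */
final class ModelTrigger<E> {
    private final Consumer<List<E>> model;   // the model receives one or more events per call
    private final long batchIntervalMs;      // 0 means "trigger on each event"
    private final List<E> buffer = new ArrayList<>();
    private long lastFlushMs;

    ModelTrigger(Consumer<List<E>> model, long batchIntervalMs, long startMs) {
        this.model = model;
        this.batchIntervalMs = batchIntervalMs;
        this.lastFlushMs = startMs;
    }

    /** Called for each incoming event. */
    void onEvent(E event) {
        if (batchIntervalMs == 0) {
            model.accept(List.of(event));   // per-event mode: run the model immediately
        } else {
            buffer.add(event);              // batch mode: hold the event until the interval elapses
        }
    }

    /** Called periodically; runs the model on the buffered events once the interval has elapsed. */
    void onTick(long nowMs) {
        if (batchIntervalMs > 0 && nowMs - lastFlushMs >= batchIntervalMs) {
            if (!buffer.isEmpty()) {
                model.accept(List.copyOf(buffer));
                buffer.clear();
            }
            lastFlushMs = nowMs;
        }
    }
}
```

Buffering in memory like this loses pending events if an instance crashes before the next tick; a production version would more likely keep the pending batch in a fault-tolerant state store.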

Prerequisite knowledge

A basic understanding of Kafka and Kafka Streams
General knowledge of how distributed systems work (useful but not required)

What you'll learn

Understand how to use Kafka Streams to run models as events are processed
Discover the lessons learned from productionizing a Kafka Streams cluster at scale and making it cost efficient

Navinder Pal Singh Brar

Walmart Labs

Navinder Pal Singh Brar is a senior data engineer at Walmart Labs, where he's been working with the Kafka ecosystem, especially Kafka Streams, for the last couple of years. On top of it, he created a Customer Data Platform that meets the company's need to process billions of customer events per day in real time and trigger certain machine learning models on each event. He actively contributes back to Kafka Streams and filed three patents last year. Navinder is a regular speaker at local and international events on real-time stream processing, data platforms, and Kafka.