Building a multitenant data processing and model inferencing platform with Kafka Streams
Who is this presentation for?
- Data engineers and data scientists
Description
Navinder Pal Singh Brar provides an overview of an architecture for processing data and triggering models that is built for scalability and reliability. The platform is multitenant: each client’s models (such as bid models, fraud detection, and omnichannel reorder) may be interested in certain events, such as search, add to cart, and transactions, and whenever such an event is processed, the models interested in that particular event are triggered.
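The event-to-model triggering described above can be pictured as a subscription registry keyed by tenant and event type. The following is only a minimal in-memory sketch under stated assumptions, not the platform's implementation: the names `ModelRegistry`, `subscribe`, and `dispatch`, the dict-shaped events, and the toy models are all hypothetical, and no Kafka Streams API is involved.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical types: an event is a plain dict, a model is any callable on an event.
Event = Dict[str, object]
Model = Callable[[Event], object]


class ModelRegistry:
    """Maps (tenant, event type) to the models subscribed to that event."""

    def __init__(self) -> None:
        self._subscriptions: Dict[Tuple[str, str], List[Tuple[str, Model]]] = defaultdict(list)

    def subscribe(self, tenant: str, event_type: str, name: str, model: Model) -> None:
        self._subscriptions[(tenant, event_type)].append((name, model))

    def dispatch(self, event: Event) -> Dict[str, object]:
        """Run every model registered for this event's tenant and type."""
        key = (event["tenant"], event["type"])
        # .get() avoids creating empty entries for unknown keys.
        return {name: model(event) for name, model in self._subscriptions.get(key, [])}


registry = ModelRegistry()
# Toy stand-ins for real models (assumptions for illustration only).
registry.subscribe("client_a", "transaction", "fraud_detection",
                   lambda e: e["amount"] > 1000)
registry.subscribe("client_a", "add_to_cart", "omnichannel_reorder",
                   lambda e: f"reorder:{e['sku']}")

# Only client_a's fraud model is interested in this event.
fraud = registry.dispatch({"tenant": "client_a", "type": "transaction", "amount": 2500})
# client_b has no subscriptions, so nothing is triggered.
ignored = registry.dispatch({"tenant": "client_b", "type": "transaction", "amount": 2500})
```

Keying subscriptions by tenant as well as event type is one simple way to keep one client's models from being triggered by another client's events.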
Navinder details how an event lands in the system from Kafka, how it is processed and saved internally, and how the interested models are triggered on such events. Models use the internal persistent state (on RocksDB) for feature extraction and store their own outputs in the platform, where other teams can use them as features. You’ll explore the architecture of the models, specifically ensuring fairness among the models; providing isolation while reusing features and inferences across models; dynamically updating global data (such as the product catalog) needed to run models on each node; customizing models to trigger either on each event or as a batch at regular time intervals; and implementing data archival, TTL policies, and other features developed to save money. You’ll also learn about the advantages and limitations of the platform.
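Two of the ideas above, TTL policies on stored state and batch triggering at intervals, can be illustrated with a small sketch. Everything here is an assumption made for illustration: `TtlStore` is an in-memory stand-in for the persistent state, `BatchTrigger` is a hypothetical helper, and timestamps are passed in explicitly to keep the example deterministic. In this simplified version a batch fires only when an event arrives after the interval has elapsed; a real Kafka Streams application would more likely drive this from a scheduled punctuator.

```python
from typing import Callable, Dict, List, Optional, Tuple


class TtlStore:
    """In-memory stand-in for a key-value state store with per-entry expiry."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._data: Dict[str, Tuple[float, object]] = {}

    def put(self, key: str, value: object, now: float) -> None:
        self._data[key] = (now, value)

    def get(self, key: str, now: float) -> Optional[object]:
        entry = self._data.get(key)
        if entry is None:
            return None
        written_at, value = entry
        if now - written_at >= self.ttl:
            del self._data[key]  # lazily evict expired entries on read
            return None
        return value


class BatchTrigger:
    """Buffers events and runs the model once per interval instead of per event."""

    def __init__(self, model: Callable[[List[dict]], object], interval_seconds: float) -> None:
        self.model = model
        self.interval = interval_seconds
        self._buffer: List[dict] = []
        self._window_start: Optional[float] = None

    def on_event(self, event: dict, now: float) -> Optional[object]:
        if self._window_start is None:
            self._window_start = now  # first event opens a new window
        self._buffer.append(event)
        if now - self._window_start < self.interval:
            return None  # keep buffering
        batch, self._buffer = self._buffer, []
        self._window_start = None
        return self.model(batch)


store = TtlStore(ttl_seconds=60)
store.put("user42:last_search", "shoes", now=0)
fresh = store.get("user42:last_search", now=30)
expired = store.get("user42:last_search", now=61)

# The "model" here simply counts the events in each batch.
trigger = BatchTrigger(model=len, interval_seconds=10)
results = [trigger.on_event({"type": "search"}, now=t) for t in (0, 4, 12, 13)]
```

A per-event model corresponds to calling the model directly on each event, while the batch variant trades latency for fewer model invocations.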
Prerequisite knowledge
- A basic understanding of Kafka and Kafka Streams
- General knowledge of how distributed systems work (useful but not required)
What you'll learn
- Understand how to use Kafka Streams to trigger models as events are processed
- Discover the lessons learned from productionizing a Kafka Streams cluster at scale and making it cost efficient
Navinder Pal Singh Brar
Walmart Labs
Navinder Pal Singh Brar is a senior data engineer at Walmart Labs, where he’s been working with the Kafka ecosystem, especially Kafka Streams, for the last couple of years. He created a Customer Data Platform on top of it to meet the company’s need to process billions of customer events per day in real time and trigger machine learning models on each event. He’s been active in contributing back to Kafka Streams and filed three patents last year. Navinder is a regular speaker at local and international events on real-time stream processing, data platforms, and Kafka.