Building a multitenant data processing and model inferencing platform with Kafka Streams
Who is this presentation for?
- Data engineers and data scientists
Navinder Pal Singh Brar provides an overview of an architecture for data processing and model triggering that is built for scalability and reliability. The platform is multitenant: each client’s models (such as bid models, fraud detection, and omnichannel reorder) may be interested in certain events, such as search, add to cart, and transactions, and whenever such an event is processed, the models interested in that event are triggered.
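The event-to-model triggering described above can be pictured as a per-tenant subscription table. The following is a minimal sketch in plain Python, not the platform's actual code and not the Kafka Streams API; `ModelRegistry`, the tenant name `client_a`, the event field names, and the toy model functions are all hypothetical names chosen for illustration.

```python
from collections import defaultdict


class ModelRegistry:
    """Maps (tenant, event_type) to the models subscribed to that event type."""

    def __init__(self):
        # Hypothetical in-memory table; a real platform would persist this.
        self._subscriptions = defaultdict(list)

    def subscribe(self, tenant, event_type, model_name, model_fn):
        """Register a model as interested in one event type for one tenant."""
        self._subscriptions[(tenant, event_type)].append((model_name, model_fn))

    def dispatch(self, event):
        """Trigger every model subscribed to this tenant's event type."""
        key = (event["tenant"], event["type"])
        subscribers = self._subscriptions.get(key, [])
        return {name: model_fn(event) for name, model_fn in subscribers}


registry = ModelRegistry()
# Toy stand-ins for real models: a threshold rule and a string formatter.
registry.subscribe("client_a", "transaction", "fraud_detection",
                   lambda e: e["amount"] > 1000)
registry.subscribe("client_a", "add_to_cart", "omnichannel_reorder",
                   lambda e: "reorder:" + e["item"])

result = registry.dispatch(
    {"tenant": "client_a", "type": "transaction", "amount": 2500}
)
print(result)
```

Keying the table on both tenant and event type means one client's subscriptions never cause another client's models to run, which is one simple way to picture the isolation a multitenant platform needs.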
Navinder details how an event lands in the system from Kafka, how it is processed and saved internally, and how the interested models are triggered on such events. Models use the internal persistent state (on RocksDB) for feature extraction and store their own outputs in the platform, where other teams can use them as features. You’ll explore the architecture of the models, specifically: ensuring fairness among the models; providing isolation while still reusing features and inferences across models; dynamically updating the global data (such as the product catalog) needed to run models on each node; customizing models to trigger either on each event or as a batch at regular time intervals; and implementing data archival, TTL policies, and other features developed to save money. You’ll also learn the advantages and limitations of the platform.
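The choice between triggering a model on each event and triggering it on a batch at time intervals can be sketched as follows. This is a hypothetical illustration in plain Python under simplifying assumptions, not the platform's implementation: `ModelTrigger`, `batch_interval_s`, and the counting model are invented names, timestamps are passed in explicitly, and a batch fires only when an event arrives after the interval has elapsed, whereas a production system would more likely use a scheduler.

```python
class ModelTrigger:
    """Runs a model on every event, or on a batch once an interval has elapsed."""

    def __init__(self, model_fn, batch_interval_s=None):
        self.model_fn = model_fn
        # None means "trigger on each event"; a number enables batch mode.
        self.batch_interval_s = batch_interval_s
        self._buffer = []
        self._window_start = None

    def on_event(self, event, timestamp_s):
        """Return the model output if the model fired, otherwise None."""
        if self.batch_interval_s is None:
            # Per-event mode: the model sees a batch of exactly one event.
            return self.model_fn([event])
        if self._window_start is None:
            self._window_start = timestamp_s
        self._buffer.append(event)
        if timestamp_s - self._window_start >= self.batch_interval_s:
            # Interval elapsed: hand over the buffered events and reset.
            batch, self._buffer = self._buffer, []
            self._window_start = None
            return self.model_fn(batch)
        return None


def count_events(events):
    """Toy model: reports how many events it was given."""
    return len(events)


per_event = ModelTrigger(count_events)
batched = ModelTrigger(count_events, batch_interval_s=60)

timestamps = (0, 10, 70)
per_event_out = [per_event.on_event({"type": "search"}, t) for t in timestamps]
batched_out = [batched.on_event({"type": "search"}, t) for t in timestamps]
print(per_event_out, batched_out)
```

In this sketch the per-event trigger fires three times, while the batched trigger stays silent until the event at 70 seconds and then runs once over all three buffered events.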
Prerequisite knowledge
- A basic understanding of Kafka and Kafka Streams
- General knowledge of how distributed systems work (useful but not required)
What you'll learn
- Understand how to use Kafka Streams to run models on the processing of events
- Discover the lessons learned from productionizing a Kafka Streams cluster at scale and making it cost efficient
Navinder Pal Singh Brar
Navinder is a software engineer 3 at Walmart Labs, where he’s been working with the Kafka ecosystem, especially Kafka Streams, for the last couple of years. He created a new platform on top of Kafka Streams to meet the company’s need to process billions of events per day in real time and trigger models on each event. He’s been an active contributor to Kafka Streams and has patented a few features. He’s interested in distributed systems and in solving complex problems. In his spare time, Navinder likes to spend time in the gym and the boxing ring.
View a complete list of Strata Data Conference contacts