Sep 23–26, 2019

Building a multi-tenant data processing and model inferencing platform with Kafka Streams

Navinder Pal Singh Brar (Walmart Labs)
11:20am–12:00pm Wednesday, September 25, 2019
Location: 1A 15/16
Secondary topics:  Data Integration and Data Processing, Data, Analytics, and AI Architecture, Retail and e-commerce, Streaming and IoT

Who is this presentation for?

Data engineers and data scientists

Level

Intermediate

Description

In this talk, I will give an overview of our architecture for processing data and triggering models, which is built for scalability and reliability. Ours is a multi-tenant platform, and each of our clients' models (such as bid models, fraud detection, and omnichannel reorder) may be interested in certain types of events, such as searches, add-to-cart actions, and transactions. Whenever such an event is processed, we trigger the models interested in that event type. I will discuss in detail how an event lands in our system from Kafka, how it is processed and saved internally, and how the interested models are triggered. Models use the internal persistent state (stored in RocksDB) for feature extraction, and they store their own outputs in the platform, where other teams can reuse them as features.
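As a rough illustration of the event-to-model triggering described above, here is a minimal sketch in plain Python. It is not the platform's code and does not use the Kafka Streams API; `ModelRegistry`, `subscribe`, `dispatch`, and the toy models and event fields are all hypothetical names chosen for the example.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Event = Dict[str, object]
Model = Callable[[Event], object]


class ModelRegistry:
    """Hypothetical registry mapping event types to subscribed tenant models."""

    def __init__(self) -> None:
        # event type -> list of (model name, model callable)
        self._subscribers: Dict[str, List[Tuple[str, Model]]] = defaultdict(list)

    def subscribe(self, event_type: str, model_name: str, model: Model) -> None:
        """Register a model as interested in one event type."""
        self._subscribers[event_type].append((model_name, model))

    def dispatch(self, event: Event) -> Dict[str, object]:
        """Run every model subscribed to this event's type.

        Returns the model outputs keyed by model name; an event type with
        no subscribers yields an empty dict.
        """
        outputs: Dict[str, object] = {}
        for model_name, model in self._subscribers.get(event["type"], []):
            outputs[model_name] = model(event)
        return outputs


# Toy stand-ins for tenant models (assumptions, for illustration only).
registry = ModelRegistry()
registry.subscribe("transaction", "fraud_detection", lambda e: e["amount"] > 1000)
registry.subscribe("transaction", "omnichannel_reorder", lambda e: e["item"])
registry.subscribe("search", "bid_model", lambda e: len(e["query"]))
```

Calling `registry.dispatch({"type": "transaction", "amount": 1500, "item": "milk"})` runs only the two models subscribed to transactions. In a real deployment the outputs would be written back to the platform's state so other models could read them as features.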

The talk will focus on the following parts of the architecture:

  • Ensuring fairness among the models.
  • Providing isolation and reusing features/inferences across models at the same time.
  • Dynamically updating global data (such as the product catalog) needed to run models on each node.
  • Customizing models to trigger either on each event or as a batch at regular time intervals.
  • Implementing data archival/TTL policies and other features developed to save money.
  • Advantages and limitations of the platform.
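To make the per-event versus batch triggering option above more concrete, here is a minimal sketch in plain Python. It is an assumption about how such a policy could look, not the platform's implementation; `TriggerPolicy`, `on_event`, and `batch_interval_s` are hypothetical names. For simplicity the batch is flushed only when an event arrives; a production system would more likely flush from a scheduled callback.

```python
from typing import Callable, List, Optional


class TriggerPolicy:
    """Hypothetical wrapper that triggers a model per event or per time window."""

    def __init__(
        self,
        model: Callable[[List[object]], object],
        batch_interval_s: Optional[float] = None,
    ) -> None:
        self.model = model
        # None means "trigger on every event".
        self.batch_interval_s = batch_interval_s
        self._buffer: List[object] = []
        self._last_flush: Optional[float] = None

    def on_event(self, event: object, now: float) -> Optional[object]:
        """Feed one event; return the model output if it fired, else None."""
        if self.batch_interval_s is None:
            # Per-event mode: run the model immediately on a one-element batch.
            return self.model([event])

        if self._last_flush is None:
            # The first event starts the first window.
            self._last_flush = now
        self._buffer.append(event)

        if now - self._last_flush >= self.batch_interval_s:
            # Window elapsed: hand the buffered events to the model and reset.
            batch, self._buffer = self._buffer, []
            self._last_flush = now
            return self.model(batch)
        return None
```

For example, `TriggerPolicy(len)` fires on every event, while `TriggerPolicy(len, batch_interval_s=10.0)` buffers events and fires once at least ten seconds have passed since the last flush.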

Prerequisite knowledge

A basic knowledge of Kafka and Kafka Streams. An understanding of how distributed systems work would be a plus.

What you'll learn

  • Using Kafka Streams to run models as events are processed.
  • Lessons learned from productionizing a Kafka Streams cluster at scale and making it cost-efficient.

Navinder Pal Singh Brar

Walmart Labs

Navinder is a data engineer at Walmart Labs, where he has been working on Kafka and Kafka Streams for over a year. He likes working on distributed systems and lives in Bangalore, India. He has prior experience building web applications and one of the largest GDS platforms used in the travel industry. He has a bachelor's degree in computer science.

