Real-time fraud detection with Kafka Streams
Who is this presentation for?
Data engineers, data architects, and developers
Walmart just launched a subscription program that offers free delivery to customers enrolled in a monthly subscription, a benefit that can be misused. Because the fraud detection model runs on every transaction under very tight SLAs, Walmart had to increase the availability of its Kafka Streams cluster and reduce its latency.
Navinder Pal Singh Brar dives into this architecture and the problems Walmart faced, such as frequent rebalancing, high latency, invariably large assignment sizes, and excessive data movement after every deployment. You’ll explore:
- Smart client: a client library that queries the exact machine holding a customer's data, reducing the number of interactive queries needed
- Reading from replicas: during bulk ingestion of events, CPU usage on the machines peaks and latency rises, so Walmart added support for reading from replicas to counter it
- Reading while rebalancing: a Kafka Streams cluster doesn't serve queries during a rebalance, but because availability is the top priority, Walmart enabled reads during rebalancing
- Reading from the store of a specific partition: Kafka Streams currently iterates over every partition of a store (via composite stores) to find a key; Walmart enabled reading directly from the partition where the key is present
- Backup and restore: a feature that backs up all data stored in RocksDB so it can be restored if the state is corrupted by a bug or for any other reason
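The smart-client and read-from-replica ideas come down to one routing decision: work out which partition owns a key, then send the lookup to the instance hosting that partition, or to one of its replicas when that instance is overloaded. The Java sketch below is hypothetical and is not Walmart's client library: the class name `SmartClientRouter`, the host names, the overload signal, and the `hashCode`-based partitioner are illustrative assumptions. In upstream Kafka Streams (2.5 and later), `KafkaStreams#queryMetadataForKey` returns a `KeyQueryMetadata` describing the active host, standby hosts, and partition for a key, which a real client would use instead of a hand-maintained table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical smart-client router: maps a key to its partition and returns the
// instances to query, in order. Hosts, partition table, and overload signal are
// illustrative assumptions, not a real Kafka Streams or Walmart API.
final class SmartClientRouter {
    private final int numPartitions;
    private final Map<Integer, String> activeHosts;        // partition -> active instance
    private final Map<Integer, List<String>> standbyHosts; // partition -> replica instances

    SmartClientRouter(int numPartitions,
                      Map<Integer, String> activeHosts,
                      Map<Integer, List<String>> standbyHosts) {
        this.numPartitions = numPartitions;
        this.activeHosts = activeHosts;
        this.standbyHosts = standbyHosts;
    }

    // Stand-in partitioner. Kafka's default producer partitioner uses murmur2,
    // so a real client must apply the same function the producer used.
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Active host first; if it is overloaded, try its replicas first and keep
    // the active host only as a last resort.
    List<String> candidatesFor(String key, Set<String> overloadedHosts) {
        int partition = partitionFor(key);
        String active = activeHosts.get(partition);
        boolean activeBusy = active != null && overloadedHosts.contains(active);
        List<String> ordered = new ArrayList<>();
        if (active != null && !activeBusy) {
            ordered.add(active);
        }
        ordered.addAll(standbyHosts.getOrDefault(partition, List.of()));
        if (activeBusy) {
            ordered.add(active);
        }
        return ordered;
    }

    public static void main(String[] args) {
        SmartClientRouter router = new SmartClientRouter(
                2,
                Map.of(0, "app-a:8080", 1, "app-b:8080"),
                Map.of(0, List.of("app-b:8080"), 1, List.of("app-a:8080")));
        String key = "customer-42";
        System.out.println("partition: " + router.partitionFor(key));
        System.out.println("normal load: " + router.candidatesFor(key, Set.of()));
        System.out.println("bulk ingestion: "
                + router.candidatesFor(key, Set.of("app-a:8080", "app-b:8080")));
    }
}
```

Putting the active host first keeps reads fresh in the common case; moving it to the end of the list only when it is overloaded trades some staleness (replicas may lag) for lower latency during bulk ingestion.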
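Reading while rebalancing amounts to relaxing the rule that only a fully caught-up, running store may answer queries. The minimal sketch below models that choice with a hypothetical `StaleReadStore`: in strict mode it rejects reads unless the store is RUNNING, and with stale reads allowed it answers from whatever data it already holds. The `maxAcceptableLag` bound on how far behind the store may be is an added assumption, not something the session description specifies. Upstream Kafka Streams (2.5 and later) offers a comparable opt-in through `StoreQueryParameters.enableStaleStores()`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical local store copy that can optionally serve reads while it is
// rebalancing or restoring. The lag bound is an illustrative assumption.
final class StaleReadStore {
    enum State { RUNNING, REBALANCING, RESTORING }

    private final Map<String, String> data = new HashMap<>();
    private State state = State.RUNNING;
    private long lagRecords = 0; // records this copy is behind its changelog

    void put(String key, String value) {
        data.put(key, value);
    }

    void setState(State newState, long lag) {
        state = newState;
        lagRecords = lag;
    }

    // Strict mode rejects every read unless the store is RUNNING. With stale
    // reads allowed, a non-RUNNING store answers from the data it already holds,
    // as long as it is no more than maxAcceptableLag records behind.
    String get(String key, boolean allowStale, long maxAcceptableLag) {
        if (state != State.RUNNING) {
            if (!allowStale) {
                throw new IllegalStateException("store is " + state + "; query rejected");
            }
            if (lagRecords > maxAcceptableLag) {
                throw new IllegalStateException("store is " + lagRecords + " records behind");
            }
        }
        return data.get(key);
    }

    public static void main(String[] args) {
        StaleReadStore store = new StaleReadStore();
        store.put("customer-42", "risk=low");
        store.setState(State.REBALANCING, 10);
        try {
            store.get("customer-42", false, 0);
        } catch (IllegalStateException e) {
            System.out.println("strict read: " + e.getMessage());
        }
        System.out.println("stale read: " + store.get("customer-42", true, 100));
    }
}
```

A lag bound like this lets an availability-first service keep answering during a rebalance while still refusing answers that are too far out of date to trust.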
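The cost of composite-store lookups is easiest to see by counting probes. In this hypothetical sketch, `compositeGet` tries each local partition's store in turn, the way the description says Kafka Streams iterates over a store's partitions, while `partitionGet` goes straight to the partition that owns the key. Plain `HashMap`s stand in for per-partition RocksDB instances, and the partitioner is again an illustrative `hashCode` stand-in. Upstream Kafka Streams (2.5 and later) exposes a similar targeted read via `StoreQueryParameters.withPartition(...)`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical partitioned store: HashMaps stand in for per-partition RocksDB
// instances, and the probe counter stands in for the cost of each store get.
final class PartitionedStore {
    private final List<Map<String, String>> partitions = new ArrayList<>();
    private int probes = 0;

    PartitionedStore(int numPartitions) {
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new HashMap<>());
        }
    }

    // Stand-in partitioner; must match the one used when the data was written.
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    void put(String key, String value) {
        partitions.get(partitionFor(key)).put(key, value);
    }

    // Composite-style read: probe each partition's store until the key is found.
    String compositeGet(String key) {
        for (Map<String, String> store : partitions) {
            probes++;
            String value = store.get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    // Direct read: probe only the partition that owns the key.
    String partitionGet(String key, int partition) {
        probes++;
        return partitions.get(partition).get(key);
    }

    int probesAndReset() {
        int count = probes;
        probes = 0;
        return count;
    }

    public static void main(String[] args) {
        PartitionedStore store = new PartitionedStore(4);
        store.put("customer-42", "risk=low");
        store.compositeGet("customer-42");
        System.out.println("composite probes: " + store.probesAndReset());
        store.partitionGet("customer-42", store.partitionFor("customer-42"));
        System.out.println("direct probes: " + store.probesAndReset());
    }
}
```

With four partitions, a key in the last partition costs four probes through the composite path and one through the direct path, and a missing key always costs one probe per partition.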
Prerequisite knowledge
- A basic understanding of Kafka and Kafka Streams
What you'll learn
- Discover lessons from using Kafka Streams to provide data as a service
- Learn how to modify Kafka Streams to fit availability-first use cases
Navinder Pal Singh Brar
Navinder Pal Singh Brar is a senior software engineer at Walmart Labs, where he has worked with the Kafka ecosystem, especially Kafka Streams, for the last couple of years. He created a new platform on top of Kafka Streams to meet the company's need to process billions of events per day in real time and trigger models on each event. He has been active in contributing back to Kafka Streams and has patented a few features. He's interested in distributed systems and solving complex problems. In his spare time, Navinder likes to spend time at the gym and in the boxing ring.