Mar 15–18, 2020

Real-time fraud detection with Kafka Streams

Navinder Pal Singh Brar (Walmart Labs)
11:00am–11:40am Tuesday, March 17, 2020
Location: 230 A

Who is this presentation for?

Data engineers, data architects, developers

Level

Advanced

Description

Walmart recently launched a subscription package that provides free delivery to users enrolled in a monthly subscription, an offer that can be misused. Because the fraud detection model runs on every transaction and comes with very tight SLAs, Walmart had to increase availability in its Kafka Streams cluster and reduce latency.

Navinder Pal Singh Brar dives into this architecture and the problems Walmart faced, such as frequent rebalancing, high latency, invariably large assignment sizes, and excessive data movement after every deployment. You’ll explore the changes Walmart made in response:

  • Smart client: a client library that queries the precise machine holding a customer’s data, reducing interactive queries.
  • Reading from replicas: during bulk ingestion of events, CPU usage on the machines peaks, which leads to high latency, so Walmart added support for reading from replicas.
  • Reading while rebalancing: during rebalancing, a Kafka Streams cluster doesn’t serve queries, but because availability is of foremost importance, Walmart enabled reads during rebalancing.
  • Reading from a store of a specific partition: Kafka Streams currently iterates over each partition of a store via composite stores to find a key, so Walmart enabled direct reads from the partition where the key is present.
  • Backup and restore: a feature that backs up and restores all the data stored in RocksDB in case the state gets corrupted by a bug or for any other reason.
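
The smart client and read-from-replica ideas can be illustrated with a minimal, self-contained Java sketch. It is not Walmart's implementation: the class `SmartClientRouter`, its method names, and the host names are all hypothetical, and the hash is a simple stand-in (Kafka's default partitioner uses murmur2 over the serialized key bytes). The sketch assumes the client already holds a map from partition to active host and to standby hosts, and it falls back to a standby when the active host is unavailable, for example during a rebalance.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical illustration of key-based routing; not a Kafka Streams API.
public class SmartClientRouter {
    private final int numPartitions;
    private final Map<Integer, String> activeHostByPartition;
    private final Map<Integer, List<String>> standbyHostsByPartition;

    public SmartClientRouter(int numPartitions,
                             Map<Integer, String> activeHostByPartition,
                             Map<Integer, List<String>> standbyHostsByPartition) {
        this.numPartitions = numPartitions;
        this.activeHostByPartition = activeHostByPartition;
        this.standbyHostsByPartition = standbyHostsByPartition;
    }

    // Stand-in hash (assumption): real Kafka partitioning uses murmur2
    // over the serialized key, so this will not match a real topic layout.
    public int partitionFor(String key) {
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        return (hash & 0x7fffffff) % numPartitions;
    }

    // Prefer the active host for the key's partition; if it is unavailable,
    // return the first available standby (replica) host, if any.
    public Optional<String> hostFor(String key, Set<String> unavailableHosts) {
        int partition = partitionFor(key);
        String active = activeHostByPartition.get(partition);
        if (active != null && !unavailableHosts.contains(active)) {
            return Optional.of(active);
        }
        return standbyHostsByPartition
                .getOrDefault(partition, List.of())
                .stream()
                .filter(host -> !unavailableHosts.contains(host))
                .findFirst();
    }
}
```

A caller would build the router from cluster metadata, call `hostFor(customerId, unavailableHosts)`, and send the query straight to the returned host instead of asking an arbitrary node to forward it. Because the partition is known up front, the receiving host can also read only that partition's store rather than scanning every partition.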

Prerequisite knowledge

  • A basic understanding of Kafka and Kafka Streams

What you'll learn

  • Discover lessons from using Kafka Streams as data as a service
  • Learn how to modify Kafka Streams to fit availability-first use cases

Navinder Pal Singh Brar

Walmart Labs

Navinder Pal Singh Brar is a senior software engineer at Walmart Labs, where he’s been working with the Kafka ecosystem for the last couple of years, especially Kafka Streams. He created a new platform on top of Kafka Streams to suit the company’s need to process billions of events per day in real time and trigger models on each event. He’s been active in contributing back to Kafka Streams and has patented a few features. He’s interested in distributed systems and solving complex problems. In his spare time, Navinder likes to spend time in the gym and the boxing ring.
