Sep 23–26, 2019

How Orange Financial combats financial fraud over 50M transactions a day using Apache Pulsar

Weisheng Xie (Orange Finance), Sijie Guo (Apache Software Foundation)
2:55pm–3:35pm Wednesday, September 25, 2019
Location: 1A 15/16

Who is this presentation for?

  • Data scientists, software engineers, and CTOs

Level

Beginner

Description

As a fintech company of China Telecom with half a billion registered users and 41 million monthly active users, Orange Financial has dozens of online financial products. The company faces threats from financial fraud every day, such as identity theft, money laundering, affiliate fraud, and merchant fraud. Risk control is vital, and the company’s risk management system runs thousands of decisions against each transaction to fight these threats.

Weisheng Xie and Sijie Guo explore how Orange Financial leverages Apache Pulsar to boost the efficiency of its risk-control decision development.

In the risk-management scenario, the core is decision making. Decisions are composed of a series of rules and models. The development of rules and models is vital, but equally important is producing the indicators and features that the decisions require. Indicators in Orange Financial’s risk-management system include, for example, the intimacy between users, the monthly average consumption frequency and amount, the login frequency over the last minute, month, and year, and the time interval between the last two transfer transactions. Some of these indicators require large volumes of historical data held in a data store (Hive, for example) and are normally computed in batch mode (with Presto in this case). Others depend on data in the current transaction and are needed by decisions on that transaction; the real-time transaction data is stored in a message queue such as Kafka, and streaming computation (e.g., Spark Streaming) is widely adopted for it. This is a typical Lambda architecture and has been running for many years at Orange Financial.
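As a rough illustration of two of the streaming indicators named above (login frequency in the last minute, and the interval between the last two transfers), here is a minimal sketch in plain Python. The class names `LoginFrequency` and `TransferInterval` are hypothetical and are not taken from Orange Financial’s system; the sketch also assumes timestamps arrive in order and are expressed in seconds.

```python
from collections import deque
from typing import Deque, Optional


class LoginFrequency:
    """Counts logins inside a sliding time window (hypothetical helper)."""

    def __init__(self, window_seconds: float = 60.0) -> None:
        self.window_seconds = window_seconds
        self._timestamps: Deque[float] = deque()

    def record(self, ts: float) -> int:
        """Register a login at time ts and return the count in the window."""
        self._timestamps.append(ts)
        # Drop logins that fell out of the window ending at ts.
        while self._timestamps and self._timestamps[0] <= ts - self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps)


class TransferInterval:
    """Tracks the gap between the two most recent transfers (hypothetical helper)."""

    def __init__(self) -> None:
        self._last: Optional[float] = None

    def record(self, ts: float) -> Optional[float]:
        """Return seconds since the previous transfer, or None for the first one."""
        gap = None if self._last is None else ts - self._last
        self._last = ts
        return gap
```

Indicators like these only need recent events, which is why they suit the streaming side; monthly or yearly aggregates would instead be computed over the historical store.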

The biggest drawback of this architecture has been the need to maintain two distinct (and possibly complex) systems for the batch and speed layers. The Kappa architecture attempts to simplify this by keeping a single code base rather than one for each of the batch and speed layers in Lambda. The complication of Kappa mostly revolves around having to process all data as a stream, which means handling duplicate events, cross-referencing events, and maintaining order, operations that are generally easier in batch processing. Still, the company has been seeking a solution that can unify the data store, computing engine, and programming language for decision development in its risk control system.
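To make the stream-side complications concrete, the sketch below shows one simple way to drop duplicate events and restore event-time order with a bounded reordering buffer. It is an illustrative assumption, not the approach described in the talk: the function name `dedupe_and_reorder`, the `(event_time, event_id)` tuple layout, and the `max_delay` parameter are all hypothetical.

```python
import heapq
from typing import Iterable, Iterator, List, Set, Tuple

Event = Tuple[int, str]  # (event_time, event_id); hypothetical layout


def dedupe_and_reorder(events: Iterable[Event], max_delay: int) -> Iterator[Event]:
    """Yield unique events in event-time order.

    Events are held in a min-heap until they are at least max_delay older
    than the newest event seen, so stragglers within that bound can still
    be slotted into place. Anything later than max_delay may come out of order.
    """
    seen: Set[str] = set()       # grows without bound; a real system would expire ids
    buffer: List[Event] = []
    latest = float("-inf")
    for event in events:
        event_time, event_id = event
        if event_id in seen:
            continue             # duplicate delivery, ignore
        seen.add(event_id)
        heapq.heappush(buffer, event)
        latest = max(latest, event_time)
        # Release everything that can no longer be overtaken by a late event.
        while buffer and buffer[0][0] <= latest - max_delay:
            yield heapq.heappop(buffer)
    # Input ended: flush what is left, still in order.
    while buffer:
        yield heapq.heappop(buffer)
```

In a batch job the same result is a sort plus a distinct over the full dataset, which is the sense in which these operations are easier in batch mode.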

Apache Pulsar is an open source distributed event streaming system originally created at Yahoo and now part of the Apache Software Foundation. Pulsar addresses these operational problems by storing data in segmented streams. Data is appended to topics (a.k.a. streams) as it arrives, then segmented and stored in a scalable log store, Apache BookKeeper. Because the data is stored as a single copy (the source of truth), this addresses the inconsistency problem of the Lambda architecture. The data can also be accessed as streams via unified pub/sub messaging and as segments for elastic parallel batch processing. This makes Apache Pulsar well suited as a unified messaging and storage solution. Together with a unified computing engine such as Spark, it can boost the efficiency of Orange Financial’s risk-control decision deployment.
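The idea of a single copy of data that can be read either as a stream or as independent segments can be sketched with a toy in-memory model. This is not Pulsar’s or BookKeeper’s API; the class `SegmentedTopic`, its methods, and the fixed `segment_size` are hypothetical and exist only to illustrate the two access paths.

```python
from typing import Iterator, List


class SegmentedTopic:
    """Toy append-only topic split into fixed-size segments (not Pulsar's API)."""

    def __init__(self, segment_size: int = 3) -> None:
        self.segment_size = segment_size
        self.segments: List[List[str]] = [[]]

    def append(self, message: str) -> int:
        """Append a message and return its offset in the topic."""
        if len(self.segments[-1]) == self.segment_size:
            self.segments.append([])  # current segment is full, roll over
        self.segments[-1].append(message)
        return (len(self.segments) - 1) * self.segment_size + len(self.segments[-1]) - 1

    def read_stream(self, start_offset: int = 0) -> Iterator[str]:
        """Stream-style access: messages in order, starting from an offset."""
        offset = 0
        for segment in self.segments:
            for message in segment:
                if offset >= start_offset:
                    yield message
                offset += 1

    def read_segment(self, index: int) -> List[str]:
        """Batch-style access: one segment, readable independently of the rest."""
        return list(self.segments[index])
```

Because both read paths go through the same stored segments, a batch job that processes each segment in parallel and a streaming consumer that reads in order see identical data, which is the consistency property the paragraph above describes.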

Prerequisite knowledge

  • A working knowledge of big data, data processing, and pub/sub messaging

What you'll learn

  • Understand Lambda architecture and Apache Pulsar
  • Discover how Orange Financial uses Lambda for risk-control decision deployment and how it boosts efficiency by leveraging Pulsar

Weisheng Xie

Orange Finance

Vincent Xie (谢巍盛) is the chief scientist and director of Orange Finance, where he’s responsible for building the company’s Artificial Intelligence Group and leading the team to carry out research related to big data and AI. Previously, he worked for Intel, leading an engineering team working on machine learning- and big data-related open source technologies.


Sijie Guo

Apache Software Foundation

Sijie Guo is the PMC chair of Apache BookKeeper and a PMC member of Apache Pulsar at the Apache Software Foundation. Previously, he led the messaging team at Twitter and worked on push notification infrastructure at Yahoo.

