Presented By O’Reilly and Cloudera
Make Data Work
September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

Case study: A Spark-based distributed simulation optimization architecture for portfolio optimization in retail banking

Kaushik Deka (Novantas), Ted Gibson (Novantas)
1:10pm–1:50pm Thursday, 09/13/2018
Data engineering and architecture
Location: 1A 23/24 Level: Intermediate
Average rating: 4.50 (2 ratings)

Who is this presentation for?

  • CTOs, VPs, product managers, directors of engineering, enterprise architects, machine learning engineers, and data scientists

Prerequisite knowledge

  • A basic understanding of Spark, simulation and optimization concepts, the retail banking sector, and distributed computing

What you'll learn

  • Explore a Spark-based distributed computing system built to performantly solve complex portfolio optimization while allowing expressive user-defined business requirements
  • Understand challenges, pitfalls to avoid, and solutions


In retail banking, product managers have to regularly optimize their consumer portfolio across products, markets, customer segments, and other dimensions for a range of objective functions. These range from maximizing total revenue over N months across the entire portfolio with the least interest expense to adjusting front-book and back-book pricing to meet narrowly defined regional and product-level targets. In all use cases, the unit of optimization is the most granular pricing cell where rate is a variable, and the optimization scope can easily involve hundreds of thousands of such pricing cells across multiple geographies, products, and channels. The problem is further complicated by real-world constraints that make those pricing cells interdependent (such as price ordering, lock-step behavior, “frozen” cells, and more).
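To make the interdependence concrete, here is a minimal Python sketch of how the three constraint types named above might be checked against a proposed set of rates. The cell names, the rates, and the `violations` helper are all hypothetical illustrations, not part of the architecture presented in the session.

```python
# Hypothetical pricing cells and current rates (in percent); illustrative only.
TOL = 1e-9

old_rates = {"ny_t1": 0.50, "ny_t2": 0.75, "nj_t1": 0.50, "ny_cd": 1.20}

# Each constraint is (kind, cells). These three kinds mirror the examples
# in the text: price ordering, lock-step behavior, and frozen cells.
constraints = [
    ("ordering", ("ny_t1", "ny_t2")),   # tier 1 rate must not exceed tier 2
    ("lockstep", ("ny_t1", "nj_t1")),   # both cells must move by the same amount
    ("frozen", ("ny_cd",)),             # rate may not change at all
]

def violations(old, new, constraints):
    """Return the kinds of constraints that the proposed rates violate."""
    found = []
    for kind, cells in constraints:
        if kind == "ordering":
            low, high = cells
            if new[low] > new[high] + TOL:
                found.append(kind)
        elif kind == "lockstep":
            a, b = cells
            if abs((new[a] - old[a]) - (new[b] - old[b])) > TOL:
                found.append(kind)
        elif kind == "frozen":
            (cell,) = cells
            if abs(new[cell] - old[cell]) > TOL:
                found.append(kind)
    return found

# A proposal that respects every constraint, and one that breaks all three.
good = dict(old_rates, ny_t1=0.60, nj_t1=0.60)
bad = dict(old_rates, ny_t1=0.80, ny_cd=1.25)
good_result = violations(old_rates, good, constraints)
bad_result = violations(old_rates, bad, constraints)
```

Because a change to one cell can invalidate another, the cells cannot be optimized one at a time, which is part of what drives the scale of the problem.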

Kaushik Deka and Ted Gibson share a large-scale optimization architecture in Spark for a consumer product portfolio optimization use case in retail banking. The architecture combines a simulator that distributes computation of complex real-world scenarios and a constraint optimizer that uses business rules as constraints to meet growth targets.

The team faced three challenges in building this solution: finding or creating a declarative language framework that could be interpreted by both the Spark-based simulator and the optimizer, allowing a feedback loop; building a model abstraction framework to enable fast optimization of real-world simulations; and designing a distributed architecture that integrates the simulator and optimizer within an application session.

To solve the first problem, they adopted a “rules” framework used to express the varied set of inputs to both the simulator and optimizer. For example, the portfolio simulator uses bank rate rules to represent direct pricing changes, competitor rules to modify the competitive landscape, macro-related rules to account for the changing rate environment, and more. Even the constraints and targets for the optimizer were designed as rules to enable a flexible expression of business goals.
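One way to picture a single rule vocabulary shared by two consumers is the Python sketch below. The `Rule` class, its fields, and the two "view" functions are assumptions made for illustration; the session description does not specify the actual rule syntax or how each component interprets it.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    kind: str     # hypothetical kinds: "bank_rate", "competitor", "macro", "constraint", "target"
    scope: dict   # attributes a pricing cell must match, e.g. {"product": "savings"}
    params: dict  # rule-specific payload

def matches(rule, cell):
    """A rule applies to a cell when every scope attribute matches."""
    return all(cell.get(key) == value for key, value in rule.scope.items())

def simulator_view(rules, cell):
    """Simulator side: apply bank rate rules to get the cell's scenario rate."""
    rate = cell["rate"]
    for rule in rules:
        if rule.kind == "bank_rate" and matches(rule, cell):
            rate += rule.params["delta"]
    return rate

def optimizer_view(rules, cell):
    """Optimizer side: read constraint rules as bounds on the cell's rate."""
    low, high = float("-inf"), float("inf")
    for rule in rules:
        if rule.kind == "constraint" and matches(rule, cell):
            low = max(low, rule.params.get("min_rate", low))
            high = min(high, rule.params.get("max_rate", high))
    return low, high

rules = [
    Rule("bank_rate", {"product": "savings"}, {"delta": 0.25}),
    Rule("bank_rate", {"product": "cd"}, {"delta": 0.5}),
    Rule("constraint", {"region": "NY"}, {"min_rate": 0.25, "max_rate": 1.0}),
]
cell = {"product": "savings", "region": "NY", "rate": 0.5}

scenario_rate = simulator_view(rules, cell)
bounds = optimizer_view(rules, cell)
```

In this sketch the same list of rules is passed to both functions, and each one picks out only the kinds it understands, which is the property that would allow optimizer output to be fed back into the simulator as new rules.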

To solve the second problem, they first divided the simulation space into partitions defined by a set of identifiers, which allows independent distributed computing using a Spark-based simulator. Second, they built partition-level “approximation models” for the metrics of interest by training machine learning models on a training dataset generated from a full simulation run. The optimization space can then be explored by the optimizer using these approximation models, obviating expensive full simulation runs and vastly improving optimization runtime.
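The surrogate idea can be sketched in plain Python as follows. This is a simplified stand-in under stated assumptions: `expensive_simulation` is a toy function in place of the real Spark simulator, the partitions "NY" and "NJ" are invented, and an ordinary least-squares line replaces whatever machine learning models the team actually trained.

```python
from collections import defaultdict

def expensive_simulation(partition, rate):
    """Toy stand-in for a full simulation run: balance as a function of rate."""
    slope = {"NY": 100.0, "NJ": 60.0}[partition]
    return 1000.0 + slope * rate

def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def build_surrogates(partitions, sample_rates):
    """Run the simulator once per (partition, rate) and fit one model per partition."""
    training = defaultdict(list)
    for partition in partitions:
        for rate in sample_rates:
            training[partition].append((rate, expensive_simulation(partition, rate)))
    return {partition: fit_line(points) for partition, points in training.items()}

def predict(surrogates, partition, rate):
    slope, intercept = surrogates[partition]
    return slope * rate + intercept

def lowest_rate_meeting_target(surrogates, partition, candidates, target):
    """Search candidate rates using only the surrogate, never the simulator."""
    for rate in sorted(candidates):
        if predict(surrogates, partition, rate) >= target:
            return rate
    return None

surrogates = build_surrogates(["NY", "NJ"], [0.0, 0.5, 1.0, 1.5])
candidates = [0.25, 0.5, 0.75, 1.0]
ny_rate = lowest_rate_meeting_target(surrogates, "NY", candidates, target=1060.0)
nj_rate = lowest_rate_meeting_target(surrogates, "NJ", candidates, target=1050.0)
```

The simulator is called only while building the training data; the search itself evaluates the cheap fitted models, which is where the runtime saving comes from. Because each partition is fitted independently, the fitting step is the kind of work that can be distributed across a cluster.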

To solve the third problem, they designed a job orchestration framework that creates a simulator-generated training dataset on the cluster, builds approximation models, optimizes, resimulates, and ultimately feeds the optimization results back to the end-user application via a real-time Kafka channel.
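The sequence of stages can be sketched as a small pipeline in Python. Everything here is hypothetical: the stage functions are toys, the message format is invented, and a plain list stands in for the Kafka channel so that the example stays self-contained and does not assume any particular Kafka client API.

```python
import json

def make_training_data(state):
    # Toy stand-in for a full simulation run on the cluster.
    return {**state, "training": [(0.0, 1000.0), (1.0, 1100.0)]}

def build_models(state):
    # Fit a line through the two training points as the approximation model.
    (x0, y0), (x1, y1) = state["training"]
    slope = (y1 - y0) / (x1 - x0)
    return {**state, "model": (slope, y0 - slope * x0)}

def optimize(state):
    # Pick the lowest candidate rate whose predicted balance meets the target.
    slope, intercept = state["model"]
    rate = next(r for r in (0.25, 0.5, 0.75) if slope * r + intercept >= 1040.0)
    return {**state, "proposed_rate": rate}

def resimulate(state):
    # Confirm the proposal with the (toy) simulator rather than the model.
    rate = state["proposed_rate"]
    return {**state, "result": {"rate": rate, "balance": 1000.0 + 100.0 * rate}}

def run_session(stages, publish, session_id):
    """Run stages in order, publishing a status message after each one."""
    state = {"session_id": session_id}
    for name, stage in stages:
        state = stage(state)
        publish(json.dumps({"session_id": session_id, "stage": name, "status": "done"}))
    publish(json.dumps({"session_id": session_id, "stage": "result",
                        "payload": state["result"]}))
    return state

channel = []  # stand-in for the real-time channel back to the application
stages = [("simulate", make_training_data), ("model", build_models),
          ("optimize", optimize), ("resimulate", resimulate)]
final_state = run_session(stages, channel.append, session_id="demo-1")
messages = [json.loads(m) for m in channel]
```

Keeping each stage a function from state to state makes the order explicit and lets the orchestrator report progress between stages, so the application can follow a long-running optimization within a single session.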

Kaushik Deka


Kaushik Deka is a partner and CTO at Novantas, where he is responsible for the technology strategy and R&D roadmap for a number of cloud-based platforms. He has more than 15 years’ experience leading large engineering teams to develop scalable, high-performance analytics platforms. Kaushik holds an MS in computer science from the University of Missouri, an MS in engineering from the University of Pennsylvania, and an MS in computational finance from Carnegie Mellon University.

Ted Gibson


Ted Gibson is a product management principal at Novantas Solutions, where he is responsible for content product management for the PriceTek suite of products, focusing on business use cases, metrics, models, and calculations for innovative new development. In his more than eight years working on PriceTek, Ted has held various roles across product management, sales, client services, and engineering and has experience in pricing for consumer deposits, home equity, mortgage, auto, and unsecured lending. He holds a BA in applied mathematics from Yale University.