Presented By O'Reilly and Cloudera
Make Data Work
March 13–14, 2017: Training
March 14–16, 2017: Tutorials & Conference
San Jose, CA

Predicting customer lifetime value for a subscription-based business

Chao Zhong (Microsoft)
1:50pm–2:30pm Thursday, March 16, 2017
Data science & advanced analytics
Location: LL21 E/F Level: Advanced
Secondary topics: Hardcore Data Science, Media
Average rating: 4.67 (3 ratings)

Who is this presentation for?

  • Data scientists and those interested in customer lifetime value

Prerequisite knowledge

  • A basic understanding of machine learning and Bayesian analysis (useful but not required)

What you'll learn

  • Learn how to predict customer value and identify high-value customers early on, based on a few key features: recency, frequency, and monetary value (RFM)
  • Appreciate the value and flexibility of CLV analysis in real-life applications
  • Learn how to choose the right model based on the business and the data, how to make accurate CLV predictions using the Fader model, and how to apply the Bayesian approach to real-life problems
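The RFM summary such a model consumes can be computed from a raw transaction log. The sketch below is a minimal illustration, not the talk's pipeline: it assumes usage is bucketed into discrete periods numbered 1 to n (for example, months), that any transaction in a period makes that period "active," and it uses hypothetical names (`RFM`, `summarize_rfm`). Frequency x is the number of active periods, recency t_x is the last active period, and monetary value is the mean spend per active period.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class RFM:
    frequency: int   # x: number of periods with at least one transaction
    recency: int     # t_x: last period with a transaction (0 if none)
    periods: int     # n: number of periods observed
    monetary: float  # mean spend per active period


def summarize_rfm(
    transactions: Iterable[Tuple[str, int, float]], n_periods: int
) -> Dict[str, RFM]:
    """Collapse (customer_id, period, amount) rows into per-customer RFM.

    Periods are assumed to be integers in 1..n_periods; several rows in the
    same period count as a single active period.
    """
    spend: Dict[str, Dict[int, float]] = defaultdict(lambda: defaultdict(float))
    for customer, period, amount in transactions:
        if not 1 <= period <= n_periods:
            raise ValueError(f"period {period} outside 1..{n_periods}")
        spend[customer][period] += amount

    summary = {}
    for customer, by_period in spend.items():
        active = sorted(by_period)
        summary[customer] = RFM(
            frequency=len(active),
            recency=active[-1],
            periods=n_periods,
            monetary=sum(by_period.values()) / len(active),
        )
    return summary


if __name__ == "__main__":
    log = [("a", 1, 10.0), ("a", 1, 5.0), ("a", 3, 20.0), ("b", 2, 7.5)]
    print(summarize_rfm(log, n_periods=4)["a"])
    # RFM(frequency=2, recency=3, periods=4, monetary=17.5)
```

Customers with no transactions in the window never appear in the log, so under this convention they would be added separately as `RFM(0, 0, n, 0.0)`.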


Conceptually straightforward but difficult to handle in practice, customer lifetime value (LTV) is a well-known form of long-term analysis with a wide range of applications, from marketing and finance to engineering and sales. Conventional approaches cannot satisfy the following design requirements for predicting the lifetime value of Azure cloud-computing customers:

  1. Must be able to make long-term predictions based on limited data
  2. Must be able to achieve high accuracy at both the individual and the group level
  3. Must be able to scale well to different projection windows
  4. Must be able to handle both contract and noncontract customers
  5. Must be able to handle multiple Azure products with very different consumption patterns
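Requirement 2 separates individual-level from group-level accuracy, and the two can diverge sharply: over- and under-predictions for individual customers can cancel out in the aggregate. The sketch below assumes a symmetric absolute percentage error defined as |actual − predicted| divided by the mean of |actual| and |predicted|; the talk does not spell out its exact definition, and the function names are hypothetical.

```python
from typing import Sequence


def sape(actual: float, predicted: float) -> float:
    """Symmetric absolute percentage error for one pair (assumed definition)."""
    denom = (abs(actual) + abs(predicted)) / 2
    if denom == 0:
        return 0.0  # both zero: treat as a perfect prediction
    return abs(actual - predicted) / denom


def group_sape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Group level: compare the summed actual value with the summed prediction."""
    return sape(sum(actual), sum(predicted))


def individual_sape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Individual level: mean of the per-customer errors."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(sape(a, p) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    actual = [100.0, 0.0, 50.0]
    predicted = [90.0, 10.0, 50.0]
    print(group_sape(actual, predicted))       # 0.0: errors cancel in the total
    print(individual_sape(actual, predicted))  # about 0.70
```

The example shows why a single aggregate error figure can hide large per-customer errors, and why the two levels are listed as separate requirements.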

Chao Zhong offers an overview of a new predictive model for customer lifetime value (LTV) in a cloud-computing business. The model is also the first known application of the Fader RFM approach to a cloud business. It builds on the 2010 Fader-Hardie-Shang paper, which describes the discrete purchase decision-making process of noncontract customers, and takes a Bayesian approach: it predicts a customer’s LTV by making and updating assumptions about the distributions of customers’ latent variables. So far the model has achieved an overall symmetric absolute percentage error (SAPE) of 3% on an out-of-time test dataset, with a minimum train:test ratio of 5:81.
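The model in the 2010 Fader-Hardie-Shang paper is commonly known as the beta-geometric/beta-binomial (BG/BB) model. While alive, a customer transacts in each discrete period with probability p ~ Beta(α, β), and an alive customer becomes inactive at the start of each period with probability θ ~ Beta(γ, δ); p and θ are the latent variables. Integrating them out gives a closed-form likelihood for a customer summarized by (x, t_x, n), and Bayes' rule then gives the probability that the customer is still alive. The sketch below implements those two quantities with the standard library only. It is not the talk's Azure model: the hyperparameter values are illustrative, and fitting them (typically by maximizing the summed log-likelihood over all customers) is not shown.

```python
from math import comb, exp, lgamma, log
from typing import List


def betaln(a: float, b: float) -> float:
    """log B(a, b) = log Γ(a) + log Γ(b) - log Γ(a + b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def logsumexp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + log(sum(exp(v - m) for v in values))


def bgbb_loglik(alpha, beta, gamma, delta, x, t_x, n):
    """Log-likelihood of one customer's history under BG/BB.

    x: number of periods with a transaction; t_x: last such period (0 if
    x == 0); n: number of periods observed.
    """
    bp, bt = betaln(alpha, beta), betaln(gamma, delta)
    # Customer still alive after all n periods.
    terms = [betaln(alpha + x, beta + n - x) - bp + betaln(gamma, delta + n) - bt]
    # Customer dropped out after t_x + i periods, i = 0 .. n - t_x - 1.
    for i in range(n - t_x):
        terms.append(betaln(alpha + x, beta + t_x - x + i) - bp
                     + betaln(gamma + 1, delta + t_x + i) - bt)
    return logsumexp(terms)


def bgbb_p_alive(alpha, beta, gamma, delta, x, t_x, n):
    """Posterior probability that the customer is alive in period n + 1."""
    log_num = (betaln(alpha + x, beta + n - x) - betaln(alpha, beta)
               + betaln(gamma, delta + n + 1) - betaln(gamma, delta))
    return exp(log_num - bgbb_loglik(alpha, beta, gamma, delta, x, t_x, n))


def total_probability(alpha, beta, gamma, delta, n):
    """Sanity check: likelihoods of all 2**n purchase histories sum to 1.

    Histories with x >= 1 purchases whose last purchase is in period t_x
    number comb(t_x - 1, x - 1); the all-zero history is counted once.
    """
    total = exp(bgbb_loglik(alpha, beta, gamma, delta, 0, 0, n))
    for x in range(1, n + 1):
        for t_x in range(x, n + 1):
            total += comb(t_x - 1, x - 1) * exp(
                bgbb_loglik(alpha, beta, gamma, delta, x, t_x, n))
    return total


if __name__ == "__main__":
    params = (1.2, 0.8, 0.6, 2.0)  # illustrative values, not fitted
    print(total_probability(*params, n=6))  # close to 1.0
    # Same frequency, different recency: the stale customer is less likely alive.
    print(bgbb_p_alive(*params, x=3, t_x=6, n=6))
    print(bgbb_p_alive(*params, x=3, t_x=3, n=6))
```

The recency effect falls directly out of the likelihood: every extra period since the last transaction adds another "dropped out" term to the denominator, so P(alive) can only fall as t_x moves away from n. Turning this into a lifetime value also needs the expected number of future transactions and a spend model, which the sketch leaves out.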

This model is only the first phase of a long-term plan. Chao ends by briefly sharing the next phases of the model, which will be dynamic (allowing the product and population to change) and interactive (proving causality and providing prescriptive analysis).


Chao Zhong


Chao Zhong is a senior data scientist at C+E Analytics and Insights within Microsoft. His current research interests include (deep) machine learning for customer journey and customer lifetime value and (deep) reinforcement learning for interactive customer behavior modeling. Previously, Chao was the lead data scientist at Scopely, a mobile gaming company in LA. Chao was an ABD (all but dissertation) PhD candidate in mathematics at Michigan Technological University. He holds an MS degree in financial engineering from Temple University and a BS degree in computer science from Beijing University of Aeronautics and Astronautics.



Wilmer Masterson | DATA SCIENTIST
03/21/2017 4:25am PDT

Do you happen to have the slides?