Presented By O’Reilly and Intel AI
Put AI to work
Sep 4-5, 2018: Training
Sep 5-7, 2018: Tutorials & Conference
San Francisco, CA

Evaluate deep Q-learning for sequential targeted marketing with 10-fold cross-validation

Jian Wu (NIO)
2:35pm-3:15pm Thursday, September 6, 2018
Implementing AI
Location: Continental 1-3
Secondary topics: Reinforcement Learning, Temporal data and time-series

Who is this presentation for?

  • AI and machine learning engineers

Prerequisite knowledge

  • A basic understanding of machine learning and model evaluation

What you'll learn

  • Learn how to train and evaluate deep Q-learning models for enterprise applications and deploy trained models in production


Deep reinforcement learning has seen huge success in various gaming environments, but it is still very hard to apply in the enterprise world. In a recent blog post, “Deep Reinforcement Learning Doesn’t Work Yet,” Google Brain’s Alex Irpan highlights some of the hard problems still to be solved. One challenge people often face when applying deep reinforcement learning to enterprise applications is how to evaluate and measure a trained deep reinforcement learning model and determine whether it is good enough for production deployment and serving.

Unlike the gaming environments used in many research papers, in an enterprise production environment, online evaluation of ML models can be very expensive or simply impractical, so people often have to do offline evaluation with simulated environments. In enterprise settings, there is often no known globally optimal policy for data scientists and ML engineers to compare against, which makes it difficult to measure how good a trained deep reinforcement learning model really is; as a result, people usually need a reasonable ML model as a baseline for comparison. In addition, reinforcement learning is designed to maximize long-term rewards, sometimes accepting short-term losses to achieve better long-term results, so evaluating deep reinforcement learning models usually requires execution over some period of time to verify the optimized long-term outcome.
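The evaluation pattern described above can be sketched in a few lines: roll each policy out in a simulated environment for a fixed horizon and compare cumulative rewards. Everything below is a toy illustration, not the talk's actual simulator; the `simulate_step` dynamics, the state meaning ("engagement"), and both policies are hypothetical, chosen only to show how a policy that accepts short-term losses can win on long-term reward.

```python
def rollout(policy, simulate_step, initial_state, horizon=12):
    """Roll a policy out in a simulated environment for a fixed number
    of steps and return the cumulative (long-term) reward."""
    state, total_reward = initial_state, 0.0
    for _ in range(horizon):
        action = policy(state)
        state, reward = simulate_step(state, action)
        total_reward += reward
    return total_reward

# Toy deterministic simulator (hypothetical): state is a customer's
# "engagement" level. Contacting the customer (action 1) costs a little
# now but raises engagement, which pays off on every later step.
def simulate_step(state, action):
    if action == 1:
        return state + 1, -0.2 + 0.1 * state
    return state, 0.05 * state

greedy_baseline = lambda state: 0                       # never contact
long_term_policy = lambda state: 1 if state < 5 else 0  # invest early

print(rollout(greedy_baseline, simulate_step, initial_state=0))   # 0.0
print(rollout(long_term_policy, simulate_step, initial_state=0))  # ~1.75
```

Here the long-term policy takes losses on its first few steps, yet finishes with a clearly higher cumulative reward than the myopic baseline, which is exactly why the comparison must run over a horizon rather than a single step.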

Jian Wu discusses an end-to-end engineering project to train and evaluate deep Q-learning models for sequential targeted marketing campaigns using the 10-fold cross-validation method. Jian covers data processing, building an unbiased simulator from collected campaign data, and creating 10-fold training and testing datasets. Jian also explains how to evaluate trained DQN models against neural network-based baseline models and shows that the trained deep Q-learning models produce better-optimized long-term rewards on the majority of the 10 testing datasets.
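The 10-fold setup described above can be sketched as follows. This is a minimal, self-contained illustration of partitioning collected campaign records into 10 folds and iterating over train/test pairs; the `ten_fold_splits` helper and the integer stand-in records are hypothetical, and the actual model training and reward comparison from the talk are only indicated in comments.

```python
def ten_fold_splits(records, k=10):
    """Partition records into k folds and yield (train, test) pairs,
    one per fold, so every record is tested exactly once."""
    folds = [records[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield train, test

records = list(range(100))  # stand-in for collected campaign records
for fold_no, (train, test) in enumerate(ten_fold_splits(records), 1):
    # In the talk's setup, a DQN and a neural-network baseline would be
    # trained on `train`, then compared by simulated long-term reward
    # on `test`; here we just show the split sizes.
    print(fold_no, len(train), len(test))  # each fold: 90 train / 10 test
```

Comparing the two models fold by fold, rather than on a single held-out set, is what supports the claim that the DQN wins on "the majority of the 10 testing datasets."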

Photo of Jian Wu

Jian Wu


Jian Wu is a full-stack engineer at NIO US. Before joining NIO, he was a data analytics developer at Samsung SDSRA’s AI and Machine Learning Lab, where he worked on machine learning projects using Kubernetes, Python, and TensorFlow and developed web UIs and dashboards in JavaScript with AngularJS, D3.js, and Bootstrap. Jian has been a software developer in the San Francisco Bay Area for 20+ years: he developed a device gateway and a REST-style web services server at Netflix, built a payment gateway at eBay/PayPal, and spent 8+ years at Oracle developing Java middle-tier servers and applications.