Deep reinforcement learning has been hugely successful in various gaming environments, but it’s still very hard to apply in the enterprise world. In a recent blog post, “Deep Reinforcement Learning Doesn’t Work Yet,” Google Brain’s Alex Irpan highlights some of the hard problems still to be solved. One challenge people often face when applying deep reinforcement learning to enterprise applications is how to evaluate a trained model and determine whether it is good enough for production deployment and serving.
Unlike the gaming environments used in many research papers, an enterprise production environment can make online evaluation of ML models very expensive or simply impractical, so people often have to evaluate offline with simulated environments. In enterprise settings there is also often no known globally optimal policy for data scientists and ML engineers to refer to, which makes it very difficult to measure how good a trained deep reinforcement learning model is. As a result, people usually have to pick a reasonable ML model as a baseline for comparison and evaluation. In addition, reinforcement learning (including deep reinforcement learning) is designed to maximize long-term rewards and will sometimes accept short-term losses to achieve them, so evaluating these models usually requires running them over some period of time to verify the long-term results.
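To make the point about short-term losses concrete, here is a minimal sketch of a discounted long-term return. The reward sequences, the discount factor of 0.9, and the function name are all hypothetical, chosen only for illustration; they are not taken from the session.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum the rewards, discounting each one by gamma per elapsed time step."""
    total = 0.0
    for t, reward in enumerate(rewards):
        total += (gamma ** t) * reward
    return total


# Hypothetical per-step rewards for two policies over five steps.
greedy_rewards = [1.0, 1.0, 1.0, 1.0, 1.0]     # steady small gains
patient_rewards = [-1.0, 0.0, 3.0, 3.0, 3.0]   # early loss, larger later gains

greedy_total = discounted_return(greedy_rewards)
patient_total = discounted_return(patient_rewards)

# The patient policy looks worse after one step but better over the full horizon.
print(patient_rewards[0] < greedy_rewards[0])
print(patient_total > greedy_total)
```

Judging the two policies after the first step alone would favor the greedy one, which is why an evaluation window that is too short can mislead.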
Jian Wu discusses an end-to-end engineering project to train and evaluate deep Q-learning (DQN) models for targeting sequential marketing campaigns, using 10-fold cross-validation. Jian covers data processing, building an unbiased simulator from collected campaign data, and creating the 10-fold training and testing datasets. Jian also explains how to evaluate trained DQN models against neural network-based baseline models and shows that the trained DQN models produce better long-term rewards on the majority of the 10 testing datasets.
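The evaluation protocol described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the helper names `k_fold_indices` and `count_wins` are hypothetical, the split is a plain shuffled 10-fold partition, and the per-fold rewards would come from running each model in the simulator, which is not shown here.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Return k (train_indices, test_indices) pairs covering all samples."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def count_wins(model_rewards, baseline_rewards):
    """Count the folds where the model's long-term reward beats the baseline's."""
    return sum(m > b for m, b in zip(model_rewards, baseline_rewards))


splits = k_fold_indices(n_samples=25, k=10)
print(len(splits))
```

Given one long-term reward per testing fold for the DQN model and for the baseline, `count_wins` reports on how many of the folds the DQN model comes out ahead, which is the kind of majority comparison the abstract refers to.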
Jian Wu is a full stack engineer at NIO US. Before joining NIO, he was a data analytics developer at Samsung SDSRA’s AI and Machine Learning Lab, where he worked on machine learning projects using Kubernetes, Python, and TensorFlow and developed web UIs and dashboards using JavaScript with AngularJS, D3.js, and Bootstrap. Jian has been a software developer in the San Francisco Bay Area for 20+ years. He developed a device gateway and a REST-style WS server at Netflix, developed a payment gateway at eBay/PayPal, and spent 8+ years at Oracle developing Java middle-tier servers and applications.
©2018, O'Reilly Media, Inc. • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.