Presented By O’Reilly and Intel AI

Put AI to work

Sep 4-5, 2018: Training

Sep 5-7, 2018: Tutorials & Conference

San Francisco, CA

Evaluate deep Q-learning for sequential targeted marketing with 10-fold cross-validation

Jian Wu (NIO)

2:35pm-3:15pm Thursday, September 6, 2018

Implementing AI
Location: Continental 1-3

Secondary topics: Reinforcement Learning, Temporal data and time-series

Who is this presentation for?

AI and machine learning engineers

Prerequisite knowledge

A basic understanding of machine learning and model evaluation

What you'll learn

Learn how to train and evaluate deep Q-learning models for enterprise applications and deploy trained models in production

Description

Deep reinforcement learning has been a huge success in various gaming environments, but it’s still very hard to apply to the enterprise world. In a recent blog post, “Deep Reinforcement Learning Doesn't Work Yet,” Google Brain’s Alex Irpan highlights some of the hard problems still to be solved. One challenge people often face when applying deep reinforcement learning to enterprise applications is how to evaluate a trained model and determine whether it is good enough for production deployment and serving.

Unlike in the gaming environments used in many research papers, online evaluation of ML models in an enterprise production environment can be very expensive or simply impractical, so people often have to do offline evaluation with simulated environments. In enterprise environments, there is often no known global optimal policy for data scientists and ML engineers to refer to, so it is very difficult to measure how good a trained deep reinforcement learning model is. As a result, people often must find a reasonable ML model to serve as a baseline for comparison and evaluation. In addition, reinforcement learning (deep or otherwise) is designed to maximize long-term rewards and will sometimes accept short-term losses to achieve higher long-term rewards, so evaluating deep reinforcement learning models usually requires running them over some period of time to verify the long-term results.
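
The trade-off between short-term losses and long-term rewards can be illustrated with a minimal sketch. The reward sequences, the discount factor of 0.9, and the function name `discounted_return` are all assumptions made for illustration; none of them come from the talk.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of gamma**t * r_t over one episode (gamma is an assumed value)."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total

# Hypothetical per-step rewards for two policies over a 5-step sequence.
greedy_rewards = [1.0, 1.0, 1.0, 1.0, 1.0]      # small gain at every step
patient_rewards = [-1.0, -1.0, 3.0, 4.0, 5.0]   # early losses, larger later gains

greedy_total = discounted_return(greedy_rewards)
patient_total = discounted_return(patient_rewards)

# Judged on the first two steps only, the patient policy looks worse.
greedy_early = discounted_return(greedy_rewards[:2])
patient_early = discounted_return(patient_rewards[:2])

print(f"after 2 steps: greedy={greedy_early:.2f}, patient={patient_early:.2f}")
print(f"full episode:  greedy={greedy_total:.2f}, patient={patient_total:.2f}")
```

With these made-up numbers, the policy that loses early ends with the higher discounted return, which is why a short evaluation window can rank two policies in the wrong order.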

Jian Wu discusses an end-to-end engineering project to train deep Q-learning models for sequential targeted marketing campaigns and evaluate them with 10-fold cross-validation. Jian covers data processing, building an unbiased simulator based on collected campaign data, and creating 10-fold training and testing datasets. Jian also explains how to evaluate trained DQN models against neural network-based baseline models and shows that trained deep Q-learning models produce better-optimized long-term rewards for the majority of the 10 testing datasets.
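
The fold-by-fold comparison described above might be organized roughly as follows. This is a sketch under stated assumptions, not the project's actual code: `k_fold_splits` and `count_wins` are hypothetical helper names, the split is a plain shuffled partition, and the reward values in the usage lines are placeholders rather than results from the talk.

```python
import random

def k_fold_splits(n_items, k=10, seed=0):
    """Yield (train_indices, test_indices) for k roughly equal folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

def count_wins(model_rewards, baseline_rewards):
    """Number of folds where the model's reward exceeds the baseline's."""
    return sum(m > b for m, b in zip(model_rewards, baseline_rewards))

# Split 100 hypothetical records into 10 train/test partitions.
splits = list(k_fold_splits(100, k=10))

# Placeholder per-fold long-term rewards (made-up values, three folds only).
wins = count_wins([3.0, 2.0, 5.0], [2.5, 2.5, 4.0])
print(f"{len(splits)} folds built; model wins {wins} of 3 placeholder folds")
```

In a real run, each fold would train both the DQN and the baseline on the training indices, run both in the simulator on the test indices, and pass the two lists of long-term rewards to `count_wins`.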

Jian Wu

NIO

Jian Wu is a full stack engineer at NIO US. Before joining NIO, he was a data analytics developer at Samsung SDSRA’s AI and Machine Learning Lab, where he worked on machine learning projects using Kubernetes, Python, and TensorFlow and developed web UIs and dashboards using JavaScript with AngularJS, D3.js, and Bootstrap. Jian has been a software developer in the San Francisco Bay Area for 20+ years. He developed a device gateway and a REST-style WS server at Netflix, developed a payment gateway at eBay/PayPal, and spent 8+ years at Oracle developing Java middle-tier servers and applications.
