Presented By O'Reilly and Cloudera
Make Data Work
December 1–3, 2015 • Singapore

Building and deploying real time big data prediction models

Deepak Agrawal (24[7] Inc.)
11:00am–11:40am Thursday, 12/03/2015
Data Science and Advanced Analytics
Location: 321-322 Level: Intermediate
Tags: commerce
Average rating: ★★★★☆ (4.00, 6 ratings)

Prerequisite Knowledge

Attendees should have a basic understanding of predictive models and basic knowledge of working with big data.


This work describes the use of big data to deliver a better online customer support experience. The essential goal of the predictive models is to predict customer intent while customers are still online, i.e., still in the system: intent to chat (to obtain better, quicker service), intent to purchase or to purchase a specific product, intent to add products to the shopping cart, and others. Such predictions help in proactively solving customer issues.

In the talk we describe the various types of input data, target intents, machine learning models, typical interventions, outcomes, and the technologies involved, including Hadoop, Cassandra, Kafka, R, Python, and a few home-grown technologies. The predictive models are built using a variety of customer data, including but not limited to web journeys, speech, and chats across different devices.

Depending on the data, we fit a selection of models for the intents of interest. The real-time platform computes the intent likelihood scores, chooses the most likely intent, and triggers the corresponding action. The actions, also referred to as “interventions,” range from showing the customer an offer to chat, to displaying an ad for a specific product (with or without an offer), to surfacing help widgets. These interventions are done automatically and in real time.
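
The score-and-act flow can be sketched as follows. This is a minimal illustration of the idea, not 24[7]'s implementation: the intent names, threshold, and action mapping are all illustrative assumptions.

```python
def choose_intervention(scores, action_map, threshold=0.5):
    """Pick the most likely intent and return its mapped action,
    or None if no intent is likely enough to act on."""
    intent, score = max(scores.items(), key=lambda kv: kv[1])
    if score < threshold:
        return None
    return action_map.get(intent)

# Hypothetical likelihoods produced by per-intent models for one visitor.
scores = {"chat": 0.82, "purchase": 0.35, "add_to_cart": 0.41}
action_map = {
    "chat": "show_chat_offer",
    "purchase": "show_product_ad",
    "add_to_cart": "show_help_widget",
}

print(choose_intervention(scores, action_map))  # show_chat_offer
```

A production system would also weigh the cost of each intervention against its expected benefit, rather than acting on the raw maximum score alone.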

The models improve metrics such as purchase conversion rate (by as much as 4x) and customer satisfaction (by as much as 1.5x). The main underlying driver of these metric improvements tends to be the quick and satisfactory resolution of customer issues.

Lessons learned
We made a fair number of mistakes, and here we share the lessons learned. The first lesson was that websites are dynamic and we need to account for that in model building and deployment: clients update their websites often, and models break down as a result of these changes. We learned that we have to discover the changes on our own, so we built a set of alerts for shifts in variable/feature distributions caused by website changes. The second lesson was that as models become complex, evaluation times increase exponentially, resulting in unwanted latency; we had to think hard about optimizing model evaluation to keep computation time under 100 milliseconds. Other lessons concerned the need for a robust A/B testing platform that continuously monitors model performance and also indicates the right time to retrain the models.
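
One common way to implement such a distribution-shift alert is the population stability index (PSI); the talk does not specify which statistic 24[7] uses, so the metric, bins, and threshold below are illustrative assumptions.

```python
import math

def population_stability_index(expected, actual):
    """PSI between two binned feature distributions (as proportions).
    A common rule of thumb: PSI > 0.2 signals a significant shift."""
    psi = 0.0
    for e, a in zip(expected, actual):
        e = max(e, 1e-6)  # guard against empty bins
        a = max(a, 1e-6)
        psi += (a - e) * math.log(a / e)
    return psi

# Hypothetical binned distribution of a page-depth feature,
# before and after a client website redesign.
baseline = [0.50, 0.30, 0.15, 0.05]
current  = [0.20, 0.25, 0.35, 0.20]

psi = population_stability_index(baseline, current)
if psi > 0.2:
    print(f"ALERT: feature distribution shifted (PSI={psi:.2f})")
```

Running such a check per feature on a schedule turns silent model breakage after a site change into an explicit, actionable alert.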

Challenges faced today
There are four challenges we face today in building and deploying these predictive models. The first is data sparseness, usually driven by a high ratio of new to repeat visitors and long gaps between customer visits. Models need a large enough data history to make accurate predictions.

The second challenge is the ability to stitch customer data across channels and devices. Customers interact using a variety of devices and channels, each time creating a data point. In some cases, a common customer ID is not available across channels and devices.

The third challenge is the ability to capture, process, and store ever-expanding data streams while maintaining data quality. Real-time pre-processing of streaming data becomes increasingly difficult without a robust and scalable architecture. This area speaks to the need for further innovation in the big data engineering domain, which is currently built on Hadoop, Cassandra, Kafka, and other home-grown technologies.
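The data-quality side of this challenge amounts to validating and normalizing each event as it streams in. In production such logic would sit behind a consumer (e.g., reading from Kafka); the schema, field names, and checks below are illustrative assumptions.

```python
REQUIRED = {"customer_id", "timestamp", "event_type"}

def preprocess(event):
    """Return a cleaned event, or None if it fails quality checks."""
    if not REQUIRED.issubset(event):
        return None  # drop malformed events
    if event["timestamp"] < 0:
        return None  # drop events with impossible timestamps
    cleaned = dict(event)
    cleaned["event_type"] = cleaned["event_type"].strip().lower()
    return cleaned

# A tiny simulated stream; the second event is missing a required field.
stream = [
    {"customer_id": "c1", "timestamp": 100, "event_type": " PageView "},
    {"customer_id": "c2", "timestamp": 101},
]

clean = [e for e in (preprocess(ev) for ev in stream) if e]
print(len(clean), clean[0]["event_type"])  # 1 pageview
```

Keeping the per-event logic this small is what makes it feasible to run inside a low-latency streaming pipeline.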

Lastly, the fourth challenge is the ability to build and deploy models at scale. The traditional approach treats each new business as a new model deployment exercise, with manual data preparation, exploratory data analysis, model training, and model deployment, all of which can take several weeks. Our challenge is to automate as many steps as possible in the model development, testing, and deployment phases, thereby creating scale.
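
The automation goal can be pictured as chaining those manual steps into one reusable pipeline that each new business run executes end to end. The step functions below are illustrative placeholders, not 24[7]'s tooling.

```python
# Each step takes and returns a shared context dict, so steps can be
# added, removed, or reordered without changing the runner.
def prepare_data(ctx):  ctx["rows"] = 1000; return ctx
def train_model(ctx):   ctx["model"] = "logreg"; return ctx
def validate(ctx):      ctx["auc"] = 0.81; return ctx
def deploy(ctx):        ctx["deployed"] = ctx["auc"] > 0.75; return ctx

PIPELINE = [prepare_data, train_model, validate, deploy]

def run(business):
    ctx = {"business": business}
    for step in PIPELINE:
        ctx = step(ctx)
    return ctx

result = run("retailer_A")
print(result["deployed"])  # True
```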

In conclusion, we describe a big data application for improving customer experience using state-of-the-art technologies, and share the modelling framework, initial successes, lessons learned, and current challenges.

The following is the agenda for the talk:

  1. Extent of the customer experience problem
  2. A big data analytics application for enhanced online customer experience
  3. Description of data, sources of data, models used, interventions, and outcomes
  4. Description of technology components
  5. Lessons learned and challenges we face today

Deepak Agrawal

24[7] Inc.

Deepak Agrawal is vice president of data sciences at 24[7] Innovation Labs, a part of 24[7] Customer Solutions. There he heads the Data Science Group, which conducts research and development of machine learning algorithms and data engineering for a next-generation predictive analytics platform using web, speech, and chat data.

Deepak is a known expert in big data, advanced analytics, customer insights, data/text mining, and predictive modelling. He has published in journals including Marketing Science, Journal of Marketing Research, Journal of Retailing, Journal of Direct Marketing, Journal of Promotion Management, and Journal of Indian Business Research in the areas of analytics, data mining, retailing, and franchising. He has 20+ years of data and analytics R&D experience, teaching at the Krannert School of Management, Purdue University, and working at Microsoft Corporation and at 24[7].

He holds a Ph.D. in Business from the Graduate School of Business, Stanford University.