Deep learning has shown significant promise for model performance, but any DL technique may require large volumes of data. Without them, DL models can quickly become untenable, particularly when the data size falls short of the problem space. This situation arises regularly when training RNNs, which can quickly memorize and overfit when the data size is small to medium.
Bayesian techniques (particularly Bayesian networks), on the other hand, are more robust to missing data, noise, and small data size, but they lack order or sequence information. The idea here is to combine the strengths of the two modeling techniques and harness the power of RNNs even when data is limited. The presentation exposes the shortcomings of RNNs and shows how a combination of RNNs and a Bayesian network (PGM) can not only overcome these shortcomings but also improve the sequence-modeling behavior of RNNs.
For the business context, we will learn this through marketing channel attribution modeling. When attributing credit to a channel, it's important to take into account channel interactions, the number of impressions each channel makes on a lead, and the order in which the channels were touched in a lead's journey.
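As a rough illustration of the three signals named above (impressions, channel interactions, and order), the sketch below summarizes a single lead's journey. The function name `summarize_journey`, the channel names, and the choice of co-occurring pairs as an "interaction" signal are assumptions made for this example, not details from the talk.

```python
from collections import Counter
from itertools import combinations


def summarize_journey(touches):
    """Summarize one lead's ordered list of channel touches (hypothetical helper)."""
    # Number of impressions each channel made on this lead.
    impressions = Counter(touches)
    # Unordered pairs of channels that co-occur in the journey:
    # a crude stand-in for channel interactions.
    interactions = list(combinations(sorted(impressions), 2))
    # Position of the first touch per channel, which preserves order information.
    first_touch = {}
    for position, channel in enumerate(touches):
        first_touch.setdefault(channel, position)
    return {
        "impressions": dict(impressions),
        "interactions": interactions,
        "first_touch": first_touch,
    }


journey = ["email", "search", "email", "display"]  # made-up example path
summary = summarize_journey(journey)
```

A sequence model would normally consume the ordered `touches` list directly; the summary only makes the three ingredients explicit.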
First, each lead's journey or path is processed through a Bayesian network. Bayesian networks can produce a posterior distribution, and this posterior can then be trained alongside an RNN architecture (stacked LSTMs or GRUs) to capture the effectiveness of the order in which the channels are touched during the marketing campaign.
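To make the idea of feeding a posterior into a sequence model concrete, here is a deliberately simplified sketch. It replaces the Bayesian network with an independent Beta-Bernoulli posterior per channel, which ignores the dependencies between channels that a real network would model, and it stops at building the per-step inputs rather than training an LSTM or GRU. The names `channel_posteriors` and `to_rnn_input`, the uniform Beta(1, 1) prior, and the toy journeys are all assumptions for illustration.

```python
from collections import defaultdict


def channel_posteriors(journeys, alpha=1.0, beta=1.0):
    """Posterior mean of P(convert | channel touched) under a Beta(alpha, beta) prior.

    Simplified stand-in for a Bayesian network: each channel is treated independently.
    """
    converted = defaultdict(int)
    total = defaultdict(int)
    for touches, did_convert in journeys:
        for channel in set(touches):  # count each channel once per lead
            total[channel] += 1
            converted[channel] += int(did_convert)
    return {c: (converted[c] + alpha) / (total[c] + alpha + beta) for c in total}


def to_rnn_input(touches, posteriors, default=0.5):
    """Pair each touch with its channel posterior, keeping the journey order."""
    return [(channel, posteriors.get(channel, default)) for channel in touches]


# Toy data: (ordered channel touches, whether the lead converted).
journeys = [
    (["email", "search"], True),
    (["display"], False),
    (["search", "display"], True),
    (["email"], False),
]
posteriors = channel_posteriors(journeys)
steps = to_rnn_input(["search", "email"], posteriors)
```

Each `(channel, posterior)` pair could then serve as one time step for a recurrent model, so the network sees both the order of touches and a data-efficient prior estimate of each channel's value.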
Because the posterior distribution is composed of positive and negative cases, we introduce a regularization hyperparameter that can be tuned to best separate the positive and negative distributions. The length of the sequence (channel touches) needs to be trimmed so that the combined architecture effectively generalizes the impact of order on attribution.
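The trimming step can be sketched as follows. Keeping the most recent touches and left-padding shorter journeys is one common convention, assumed here; the abstract does not say which end of the journey is trimmed, and `trim_and_pad` and the `<pad>` token are hypothetical names.

```python
def trim_and_pad(touches, max_len, pad="<pad>"):
    """Keep the last max_len touches and left-pad shorter journeys (assumed convention)."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    trimmed = touches[-max_len:]
    return [pad] * (max_len - len(trimmed)) + trimmed


long_path = trim_and_pad(["email", "search", "display", "email"], 3)
short_path = trim_and_pad(["email"], 3)
```

In practice `max_len` would be treated as a tunable value, since a longer window keeps more order information but gives a small dataset more room to overfit.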
The combined, trained architecture can then be used to score each lead (its path or journey) and arrive at the odds of that lead becoming a client. This technique not only assesses the effectiveness of a path but also provides optimal points of interception, even when data is missing or limited.
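If the model's score for a lead is read as a conversion probability p, the corresponding odds are p / (1 - p). The helper below is a minimal sketch of that conversion; treating the score as a calibrated probability is an assumption, and `odds` is a hypothetical name.

```python
def odds(probability):
    """Convert a conversion probability into odds, p / (1 - p)."""
    if not 0.0 <= probability < 1.0:
        raise ValueError("probability must be in [0, 1)")
    return probability / (1.0 - probability)


even_odds = odds(0.5)     # one conversion expected per non-conversion
strong_lead = odds(0.75)  # three conversions expected per non-conversion
```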
Vishal ("Vish") Hawa is a Principal Data Scientist at Vanguard. Vish has over 15 years of experience in the retail and financial services industries and works closely with marketing managers on designing attribution, propensity, and attrition models.
Vish studied executive management at the Wharton School of business and holds postgraduate degrees in information sciences, statistics, and computer engineering from the Indian Statistical Institute.
©2019, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • email@example.com