Over the course of a visit to a website, each user generates a trace of web pages visited and actions (clicks) taken. The trace is a sequential series of observations that can provide insight into the latent intention of the visitor and how that intention may evolve over the course of the visit. A valuable goal in online marketing is to identify and forecast the intention of the visitor while the visitor is on the site. The earlier in the session a system can identify the likely future behavior of the visitor, the more opportunities exist to affect that behavior.
There is no shortage of machine learning packages that propose to address the ubiquitous “big data” challenge. However, there is often a significant gap between what is available out-of-the-box and the challenges posed by real-world data. This case study describes the integration of a data-driven modeling approach into a real-time analytics platform. We provide a description of the distributed processing framework and describe how our model leverages it to produce its predictions in real time. Our case study will also address some common challenges in applying machine learning solutions to real-world data, such as scalability, noisy data, and accuracy.
We will provide a light treatment of probabilistic models in order to give context for the output of our predictive system. The emphasis of our case study will be on the integration of our predictive system into an existing, distributed architecture. In addition, we discuss the monitoring and long-term maintenance of a data-driven solution in a dynamic, online setting. We conclude with a demonstration of how our real-time predictions can be used to drive external systems such as content generation, segmentation, or retargeting tools.
Ethan Dereszynski is a research scientist in machine learning and data mining at Webtrends. He earned his PhD in computer science at Oregon State University in 2012. Ethan’s research focuses on the application of machine learning, in particular Bayesian statistics and probabilistic models, to challenging problems across multiple disciplines. Combining work and play, Ethan is also interested in unsupervised approaches for learning models of player behavior in real-time strategy games (read: accepting all challengers at StarCraft). Prior to studying at Oregon State, he received a B.S. in computer science from Alma College, where he minored in mathematics and English.
Eric leads the Technology team at Webtrends, chartered with building creative, disruptive solutions in the analytics and optimization space. His team is responsible for the newly introduced Webtrends Streams product, among other patent-pending technologies recently released by Webtrends.
Eric has held technology leadership positions at Webtrends, Jive Software and Intel in his career. He is a runner, follows his kids around playing soccer, and lives in Portland with his family.