Predicting Criteo’s Internet traffic load using a Bayesian structural time series model
Who is this presentation for?
Anyone interested in time series, forecasting, and machine learning
At Criteo, we connect 1.5 billion active shoppers with the things they need and love. Our technology takes an algorithmic approach to predicting which users we show an ad to, when, and for which products.
Criteo’s infrastructure evolution is driven by our traffic forecast. Our infrastructure provides capacity and connectivity to host the Criteo platform and applications. Located in six countries across the Americas, Europe, and Asia, our footprint covers 9 datacenters, 2 HPC clusters, more than 35K physical servers, and more than 5M QPS at peak hours.
Because traffic forecasting is so critical, one of the principal tasks of the Product Data Science team is to build machine learning models that forecast traffic demand across services and datacenters, so that we can make good investment decisions when scaling our infrastructure. These forecasts let us predict accurately how many machines any service will need in the future.
Predicting capacity is especially useful for allocating hardware for periods when the traffic load is very high, e.g., during Black Friday, Cyber Monday, or the Christmas sales in the Americas and Europe.
To accomplish this task, we use Bayesian state space models to forecast daily traffic load several months in advance. In contrast to classical econometric or classical time series models, the Bayesian statistical framework allows us to infer time-varying components present in the time series, such as local trends and local seasonalities, to capture special events and holidays in a hierarchical way, and to induce sparsity in the model. The Bayesian treatment also allows us to include domain knowledge in a flexible way, in the form of prior distributions. This modelling approach has proven very valuable for us when there is not enough data available to train our models. Over the last two years, our models have predicted these extreme periods six months in advance with an error lower than 6% (MAPE).
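To make the state space idea concrete, here is a minimal sketch of one structural component, a local linear trend, filtered and forecast with a Kalman filter, together with the MAPE metric quoted above. It is an illustration only and not Criteo's model: the function names `kalman_forecast` and `mape` and the default variance values are assumptions chosen for this example. The noise variances are also fixed by hand here, whereas a full Bayesian treatment would place priors on them and infer them from the data.

```python
import numpy as np


def kalman_forecast(y, horizon, q_level=0.1, q_slope=0.01, r_obs=1.0):
    """Filter a local linear trend model over y, then forecast `horizon` steps.

    State is [level, slope]; the variances are illustrative defaults (assumptions).
    """
    F = np.array([[1.0, 1.0], [0.0, 1.0]])  # level_t = level_{t-1} + slope_{t-1}
    H = np.array([1.0, 0.0])                # only the level is observed
    Q = np.diag([q_level, q_slope])         # state noise covariance
    x = np.array([float(y[0]), 0.0])        # initial state mean
    P = np.eye(2) * 1e3                     # vague prior on the initial state

    for obs in y:
        # Predict step: propagate the state and its uncertainty.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update step: correct the prediction with the new observation.
        innovation = obs - H @ x
        S = H @ P @ H + r_obs               # innovation variance (scalar)
        K = P @ H / S                       # Kalman gain, shape (2,)
        x = x + K * innovation
        P = P - np.outer(K, H @ P)

    means, variances = [], []
    for _ in range(horizon):
        # No observations available: predict only, so uncertainty accumulates.
        x = F @ x
        P = F @ P @ F.T + Q
        means.append(H @ x)
        variances.append(H @ P @ H + r_obs)
    return np.array(means), np.array(variances)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * np.mean(np.abs((actual - predicted) / actual))
```

Calling `kalman_forecast(daily_load, horizon=180)` on a hypothetical array `daily_load` would return forecast means and variances for roughly six months ahead. Seasonal and holiday components would be added as further blocks of the state vector in the same way.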
In this tutorial, Hamlet Jesse Medina Ruiz explains how to forecast Criteo’s traffic load using Bayesian dynamic time series models. In the talk, he details the general Bayesian framework, its advantages and limitations, and some alternatives for solving the problem.
Prerequisite knowledge
A basic understanding of machine learning and time series concepts.
What you'll learn
Hamlet Jesse Medina Ruiz
Hamlet is a Senior Data Scientist at Criteo. Previously, he worked as a Control System Engineer for Petróleos de Venezuela. Hamlet has ranked near the top in multiple data science competitions, including 4th place in a challenge on predicting return volatility on the New York Stock Exchange, hosted by College de France and Capital Fund Management in 2018, and 25th place in a challenge on predicting stock returns, hosted by G-Research, also in 2018. Hamlet holds two master's degrees, in Mathematics and Machine Learning, from Pierre and Marie Curie University – Paris 6, and a Ph.D. in Applied Mathematics from University Paris-Sud – Paris 11 in France, where he focused on statistical signal processing and machine learning.