Sep 23–26, 2019

Predicting Criteo’s Internet traffic load using Bayesian structural time series models

4:35pm–5:15pm Wednesday, September 25, 2019
Location: 1A 12/14
Secondary topics: Media and Advertising, Temporal data and time-series analytics

Who is this presentation for?

Anyone interested in time series, forecasting and machine learning

Level

Intermediate

Description

At Criteo, we connect 1.5 billion active shoppers with the things they need and love. Our technology takes an algorithmic approach to predicting which users we show an ad to, when, and for which products.

Criteo’s infrastructure evolution is driven by our traffic forecast. Our infrastructure provides the capacity and connectivity to host the Criteo platform and applications. Located in 6 countries across the Americas, Europe and Asia, our footprint covers 9 datacenters, 2 HPC clusters, more than 35K physical servers, and more than 5M QPS at peak hours.

Because traffic forecasting is so critical, one of the principal tasks of the Product Data Science team is to build machine learning models that forecast traffic demand across services and datacenters, so that we can make good investment decisions when scaling our infrastructure. These models let us predict accurately how many machines each service will need in the future.

Predicting capacity is especially useful for allocating hardware for periods when the traffic load is very high, such as Black Friday, Cyber Monday or the Christmas sales in the Americas and Europe.

To accomplish this task, we use Bayesian state space models to forecast daily traffic load several months in advance. In contrast to classical econometric or time series models, the Bayesian framework allows us to infer time-varying components present in the time series, such as local trends and local seasonalities, to capture special events and holidays in a hierarchical way, and to induce sparsity in the model. The Bayesian treatment also allows us to include domain knowledge flexibly in the form of prior distributions. This modelling approach has proven very valuable when there is not enough data available to train our models. Over the last two years, our models have predicted these extreme periods six months in advance with an error lower than 6% (MAPE).
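To make the idea of a state space forecast concrete, here is a minimal sketch of one of the simplest structural components mentioned above, a local linear trend, filtered with a Kalman filter and scored with MAPE. It is an illustration only, not Criteo's model: the function names, the noise parameters and the synthetic data are all assumptions, the variances are fixed by hand rather than given priors and inferred, and seasonality and holiday effects are left out.

```python
import numpy as np


def kalman_local_linear_trend(y, sigma_obs, sigma_level, sigma_slope, horizon):
    """Filter y with a local linear trend model, then forecast `horizon` steps.

    Hypothetical helper for illustration. The state is [level, slope]:
        level_t = level_{t-1} + slope_{t-1} + noise
        slope_t = slope_{t-1} + noise
        y_t     = level_t + noise
    Returns the forecast means and forecast variances.
    """
    F = np.array([[1.0, 1.0], [0.0, 1.0]])         # state transition
    H = np.array([[1.0, 0.0]])                     # we observe the level only
    Q = np.diag([sigma_level**2, sigma_slope**2])  # state noise covariance
    R = sigma_obs**2                               # observation noise variance

    m = np.array([float(y[0]), 0.0])  # prior mean of the state
    P = np.eye(2) * 1e4               # vague prior covariance

    for obs in y:
        # Predict the next state.
        m = F @ m
        P = F @ P @ F.T + Q
        # Update with the new observation.
        S = (H @ P @ H.T)[0, 0] + R   # innovation variance
        K = (P @ H.T)[:, 0] / S       # Kalman gain
        m = m + K * (obs - (H @ m)[0])
        P = P - np.outer(K, H @ P)

    means, variances = [], []
    for _ in range(horizon):
        # Forecasting is prediction without updates, so uncertainty grows.
        m = F @ m
        P = F @ P @ F.T + Q
        means.append(m[0])
        variances.append(P[0, 0] + R)
    return np.array(means), np.array(variances)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))


# Synthetic daily "traffic" with a linear trend plus noise (made-up numbers).
rng = np.random.default_rng(0)
t = np.arange(120)
series = 1000.0 + 5.0 * t + rng.normal(0.0, 10.0, size=t.size)
train, test = series[:90], series[90:]

pred_mean, pred_var = kalman_local_linear_trend(
    train, sigma_obs=10.0, sigma_level=1.0, sigma_slope=0.1, horizon=test.size
)
lower = pred_mean - 1.96 * np.sqrt(pred_var)  # approximate 95% interval
upper = pred_mean + 1.96 * np.sqrt(pred_var)
print(f"MAPE on held-out days: {mape(test, pred_mean):.2f}%")
```

The forecast variance widens with the horizon, which is the kind of uncertainty estimate a fully Bayesian model provides in richer form. In a fully Bayesian treatment, the three noise scales would themselves receive prior distributions and be inferred from the data instead of being fixed.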

In this tutorial, Hamlet Jesse Medina Ruiz explains how to forecast Criteo’s traffic load using Bayesian dynamic time series models. In the talk, he details the general Bayesian framework, its advantages and limitations, and some alternative approaches to the problem.

Prerequisite knowledge

A basic understanding of machine learning and time series concepts.

What you'll learn

Learn how to analyze time series using Bayesian modelling, in particular how to make a good forecast by including uncertainty in your estimates.

Hamlet Jesse Medina Ruiz

Criteo

Hamlet is a Senior Data Scientist at Criteo. Previously, he worked as a Control System Engineer for Petróleos de Venezuela. Hamlet has placed highly in multiple data science competitions, including 4th place in a challenge on predicting return volatility on the New York Stock Exchange, hosted by College de France and Capital Fund Management in 2018, and 25th place in a challenge on predicting stock returns, hosted by G-Research, also in 2018. Hamlet holds two master's degrees in Mathematics and Machine Learning from Pierre and Marie Curie University – Paris 6, and a Ph.D. in Applied Mathematics from University Paris-Sud – Paris 11 in France, where he focused on statistical signal processing and machine learning.
