October 28–31, 2019

How Criteo optimized and sped up its TensorFlow models by 10x and served them under 5 ms

Nicolas Kowalski (Criteo), Axel Antoniotti (Criteo)
11:00am–11:40am Wednesday, October 30, 2019
Location: Grand Ballroom E
Average rating: *****
(5.00, 2 ratings)

Who is this presentation for?

  • Machine learning engineers and software engineers with basic knowledge of TensorFlow and TensorFlow Serving

Level

Intermediate

Description

When you access a web page, bidders such as Criteo must decide within a few dozen milliseconds whether to purchase the advertising space on that page. A real-time auction takes place at that moment, and once communication delays are subtracted, only a handful of milliseconds remain to compute exactly how much to bid. Over the past year, Criteo has put a large amount of effort into reshaping the in-house machine learning stack responsible for these predictions, in particular opening it up to new technologies such as TensorFlow.

Unfortunately, even for simple logistic regression models and small neural networks, Criteo’s initial TensorFlow implementations saw inference time increase by a factor of 100, from 300 microseconds to 30 milliseconds.
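
To make the scale of that regression concrete, here is a minimal Python sketch that checks the slowdown arithmetic and times a toy logistic regression. Only the 300-microsecond and 30-millisecond figures come from the description; the toy model, its weights, and the `measure_latency_us` helper are hypothetical stand-ins for a real serving path.

```python
import math
import statistics
import time

BASELINE_US = 300        # in-house stack latency quoted in the description
TENSORFLOW_US = 30_000   # initial TensorFlow latency quoted in the description (30 ms)


def slowdown_factor(before_us, after_us):
    """Return how many times slower `after_us` is than `before_us`."""
    return after_us / before_us


def predict(weights, bias, features):
    """Toy logistic regression: sigmoid of a dot product (hypothetical model)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def measure_latency_us(fn, runs=1000):
    """Call `fn` repeatedly and return (median, p99) latency in microseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e6)
    samples.sort()
    # Index of the 99th percentile, clamped to the last sample.
    p99 = samples[min(len(samples) - 1, int(0.99 * len(samples)))]
    return statistics.median(samples), p99


if __name__ == "__main__":
    print(slowdown_factor(BASELINE_US, TENSORFLOW_US))  # 100.0
    weights, features = [0.1] * 50, [1.0] * 50
    median_us, p99_us = measure_latency_us(lambda: predict(weights, 0.0, features))
    print(f"median={median_us:.1f}us p99={p99_us:.1f}us")
```

Reporting a tail percentile alongside the median is a deliberate choice here: in a real-time auction, a bid that misses the deadline is lost, so tail latency tends to matter as much as the typical case.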

Nicolas Kowalski and Axel Antoniotti outline how Criteo approached this issue: how it profiled its model to understand its bottleneck; why commonly recommended solutions, such as optimizing the TensorFlow build for the target hardware, freezing and cleaning up the model, and using accelerated linear algebra (XLA), proved lackluster; and how Criteo rewrote its models from scratch, reimplementing cross-features and hashing functions with low-level TF operations in order to factorize the TensorFlow nodes in its models as much as possible.
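
The idea of a hashed cross-feature can be illustrated with a short, framework-free Python sketch. This is not Criteo's implementation, which the description does not detail: the function names, the `_X_` separator, and the use of CRC32 as the hash are all assumptions made for illustration. In TensorFlow, a comparable effect might be obtained with string ops such as `tf.strings.join` and `tf.strings.to_hash_bucket_fast`, though the description does not say which operations were actually used.

```python
import zlib


def hash_bucket(value, num_buckets):
    """Map a string to a stable bucket index in [0, num_buckets).

    CRC32 is an illustrative stand-in; it is deterministic across runs,
    unlike Python's built-in hash() for strings.
    """
    return zlib.crc32(value.encode("utf-8")) % num_buckets


def cross_features(row, crosses, num_buckets, separator="_X_"):
    """Hash every requested feature cross of `row` in a single pass.

    `row` maps feature names to string values; `crosses` is a list of
    tuples naming the features to combine.
    """
    return {
        cross: hash_bucket(separator.join(row[name] for name in cross), num_buckets)
        for cross in crosses
    }


if __name__ == "__main__":
    # Hypothetical request features and crosses.
    row = {"country": "FR", "device": "mobile", "hour": "11"}
    crosses = [("country", "device"), ("device", "hour")]
    print(cross_features(row, crosses, num_buckets=1000))
```

Routing every cross through one shared function, rather than defining a separate transformation per feature, mirrors in spirit the factorization the description mentions: fewer distinct operations to execute for each prediction.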

Prerequisite knowledge

  • A basic understanding of how TensorFlow and TensorFlow Serving work
  • Experience optimizing TensorFlow models for serving (useful but not required)

What you'll learn

  • Understand how to optimize a TensorFlow model before serving it online
  • Discover how to profile a TensorFlow model with a complex preprocessing architecture
  • Learn how and when to replace feature columns with custom cross-features and hashing functions to factorize and drastically reduce the number of nodes in the model

Nicolas Kowalski

Criteo

Nicolas Kowalski is a senior software engineer at Criteo. His work focuses on developing the platforms and tools used across Criteo to create, train, serve online, and monitor machine learning models of any kind. Previously, Nicolas earned a PhD in applied mathematics from Pierre and Marie Curie University in Paris and spent some time in academia, where he published eight papers in international journals and conferences, including the best paper at the 2012 International Meshing Roundtable.

Axel Antoniotti

Criteo

Axel Antoniotti is a staff software engineer at Criteo. His work focuses on developing the platforms and tools used across Criteo to create, train, serve online, and monitor machine learning models of any kind. He holds an engineering master’s degree from EPITA, a French grande école specializing in computer science.
