Reliable, high-scale TensorFlow inference pipelines at Twitter

Shajan Dasan (Twitter), Briac Marcatté (Twitter)

1:40pm–2:20pm Wednesday, October 30, 2019

Location: Grand Ballroom C/D

Production pipelines

Who is this presentation for?

ML engineers

Level

Intermediate

Description

Twitter relies heavily on Scala and the JVM and has built up considerable in-house expertise in both. For instance, Twitter built Finagle for low-latency client and server RPCs, Heron for real-time data processing, and Scalding for offline use cases (Hadoop and Spark). By contrast, the ML world is focused on the Python and C++ stack.

To provide a reliable TensorFlow inference offering to Twitter's customer teams, the company has had to overcome a few problems along the way. Shajan Dasan and Briac Marcatté lead a deep dive into specific performance issues Twitter has dealt with and show you how the company handled them and built the right toolkit to mitigate potential future issues. Twitter places particular emphasis on observability, catching performance issues early through automatic performance regression analysis on key metrics (CPU usage, memory usage, latency, and throughput). Shajan and Briac also share why you should care about what you optimize for (throughput versus latency, for instance) and why you should think about your SLAs early, before working on a new model. All of these aspects helped Twitter successfully serve 50+ different models in production, handling 1M+ requests per second. You’ll leave with a better understanding of the choices Twitter made along the way to create a reliable JVM-based inference pipeline.
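
As a rough illustration of what an automatic performance regression check on those four metrics might look like, here is a minimal sketch in Python. It is not Twitter's implementation: the metric names, the thresholds, and the `find_regressions` helper are all hypothetical, and the comparison (median of a candidate run against median of a baseline run) is just one simple choice.

```python
from statistics import median

# Hypothetical thresholds: maximum tolerated relative change per metric.
# These names and values are assumptions for illustration only.
THRESHOLDS = {
    "cpu_usage": 0.10,
    "memory_usage": 0.10,
    "latency_p99_ms": 0.05,
    "throughput_rps": 0.05,
}

# For throughput a drop is a regression; for the other metrics a rise is.
HIGHER_IS_BETTER = {"throughput_rps"}


def find_regressions(baseline, candidate, thresholds=THRESHOLDS):
    """Return {metric: relative_change} for every metric that regressed.

    `baseline` and `candidate` map each metric name to a list of samples
    collected from the current and the new model build, respectively.
    """
    regressions = {}
    for metric, limit in thresholds.items():
        base = median(baseline[metric])
        cand = median(candidate[metric])
        change = (cand - base) / base
        # Flip the sign so that "worse" is always a positive number.
        worse_by = -change if metric in HIGHER_IS_BETTER else change
        if worse_by > limit:
            regressions[metric] = change
    return regressions
```

Under these assumptions, a candidate whose p99 latency rises from 20 ms to 22 ms would be flagged (a 10% increase against a 5% limit), while a 1% throughput drop would pass. Which metric gets the tightest limit is where the throughput-versus-latency and SLA questions show up in practice.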

Prerequisite knowledge

A basic understanding of TensorFlow production pipelines

What you'll learn

Understand the choices Twitter made to create a reliable JVM-based inference pipeline

Shajan Dasan

Twitter

Shajan Dasan is a staff machine learning engineer at Twitter, where he works on the company’s prediction service, enabling different services to perform high-scale inference. Previously, he built distributed systems for information retrieval (web crawler and indexer for Bing), data storage (video, photo, and large-object store at Twitter), and video transcoding (video backend at Twitter). He also worked on the first version of the C# language, where he implemented the type safety verifier.