Reliable, high-scale TensorFlow inference pipelines at Twitter
Who is this presentation for?
- ML engineers
Twitter relies heavily on Scala and the JVM and has deep in-house expertise in that stack. For instance, Twitter built Finagle for low-latency RPC clients and servers, Heron for real-time data processing, and Scalding for offline use cases (Hadoop and Spark). The ML world, by comparison, is focused on the Python and C++ stack.
To provide a reliable TensorFlow inference offering to Twitter's customer teams, the company has had to overcome a few problems along the way. Shajan Dasan and Briac Marcatté lead a deep dive into specific performance issues Twitter has dealt with and show how the company handled them and built the right set of tools to mitigate potential future issues. Twitter places a particular emphasis on observability, catching performance issues early through automatic performance regression analysis on key metrics (CPU usage, memory usage, latency, and throughput). Shajan and Briac also share why you should care about what you optimize for (throughput versus latency, for instance) and why you should think about your SLAs early, before starting work on a new model. All of these aspects have helped Twitter successfully serve 50+ different models in production at 1M+ requests per second. You'll leave with a better understanding of the choices Twitter made along the way to create a reliable JVM-based inference pipeline.
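The talk does not publish Twitter's actual regression-analysis tooling, but the core idea it describes (flagging a deploy when a key metric drifts above its baseline) can be sketched in a few lines. The following Java snippet is purely illustrative; the class name, threshold, and window shape are assumptions, not Twitter's implementation:

```java
import java.util.Arrays;

// Hypothetical sketch of automatic performance regression analysis:
// compare the mean of a key metric (e.g. request latency in ms) over the
// current window against a baseline window, and flag a regression when
// the current mean exceeds the baseline by more than a tolerance.
public class RegressionCheck {

    static double mean(double[] samples) {
        return Arrays.stream(samples).average().orElse(0.0);
    }

    // tolerance is a fraction: 0.10 flags a >10% increase over baseline.
    static boolean isRegression(double[] baseline, double[] current, double tolerance) {
        return mean(current) > mean(baseline) * (1.0 + tolerance);
    }

    public static void main(String[] args) {
        double[] baselineLatencyMs = {10.1, 9.8, 10.3, 10.0};
        double[] currentLatencyMs  = {12.5, 12.9, 13.1, 12.7};
        // ~27% increase over baseline, so this prints: true
        System.out.println(isRegression(baselineLatencyMs, currentLatencyMs, 0.10));
    }
}
```

The same comparison would be run per metric (CPU, memory, latency, throughput); a production system would use percentiles (e.g. p99) and statistical tests rather than a simple mean, but the gating logic is the same shape.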
Prerequisite knowledge
- A basic understanding of TensorFlow production pipelines
What you'll learn
- Understand the choices Twitter made to create a reliable JVM-based inference pipeline
Shajan Dasan is a staff machine learning engineer at Twitter, where he works on the company's prediction service, enabling different services to perform high-scale inference. Previously, he built distributed systems for information retrieval (the web crawler and indexer for Bing), data storage (the video, photo, and large-object store at Twitter), and video transcoding (the video backend at Twitter), and he worked on the first version of the C# language, where he implemented the type safety verifier.
Briac Marcatté is a staff machine learning engineer at Twitter.