Integrating deep learning accelerators with TensorFlow
Who is this presentation for?
- Deep learning engineers, TensorFlow developers, ML system architects, and hardware-software codesigners
Deep learning computation uses mixed-precision linear algebra and can benefit from specialized accelerators, such as NVIDIA GPUs, Google's tensor processing unit (TPU), and AWS Inferentia, to optimize performance, cost, and power. These accelerators need to be integrated with deep learning frameworks and exposed through the frameworks' existing programming interfaces so that developers and data scientists can leverage hardware acceleration with minimal changes to their existing model pipeline code.
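The "existing interfaces, minimal code changes" idea can be sketched in plain Python. This is a hypothetical dispatch registry, not a real TensorFlow API: the op call stays identical, and only the device string changes.

```python
# Hypothetical sketch: a framework dispatches the same user-facing op to
# whichever backend kernel is registered, so pipeline code stays unchanged.
KERNELS = {}

def register_kernel(op, device, fn):
    KERNELS[(op, device)] = fn

def run(op, device, *args):
    # User code only changes the device string; the op call is identical.
    return KERNELS[(op, device)](*args)

# CPU reference kernel and a stand-in "accelerator" kernel (illustrative).
register_kernel("matmul", "cpu",
                lambda a, b: [[sum(x * y for x, y in zip(row, col))
                               for col in zip(*b)] for row in a])
register_kernel("matmul", "accel:0",
                lambda a, b: run("matmul", "cpu", a, b))

a, b = [[1, 2], [3, 4]], [[5, 6], [7, 8]]
assert run("matmul", "cpu", a, b) == run("matmul", "accel:0", a, b)
```

Both calls produce the same numerical result; only the placement differs, which is the property that lets acceleration slot in behind an unchanged model pipeline.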
Sudipta Sengupta dives into his experience integrating Amazon Elastic Inference and AWS Inferentia with TensorFlow in the AWS cloud. Accelerators often come with runtimes that optimize model execution by performing whole-graph optimization during compilation, either ahead of time (AOT) or just in time (JIT). These runtimes support only a subset of the operators available in the framework, expanding operator coverage over time.
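A runtime with partial operator coverage can advertise its supported set so the framework can check, before compilation, which graph nodes are eligible for offload. A minimal sketch, with an illustrative (not real) operator set:

```python
# Toy sketch: a runtime advertises partial operator coverage, and the
# framework checks which graph nodes it can offload before compiling.
SUPPORTED_OPS = {"MatMul", "Conv2D", "Relu", "Add"}  # illustrative subset

def partition_support(graph_ops):
    """Split a graph's op list into accelerator-eligible and CPU-only ops."""
    offload = [op for op in graph_ops if op in SUPPORTED_OPS]
    fallback = [op for op in graph_ops if op not in SUPPORTED_OPS]
    return offload, fallback

graph = ["Conv2D", "Relu", "TopKV2", "MatMul", "Add", "Softmax"]
offload, fallback = partition_support(graph)
```

As coverage expands over a runtime's releases, `SUPPORTED_OPS` grows and more of the graph moves off the CPU without any change to user code.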
Sudipta builds upon and extends TensorFlow mechanisms to provide a subgraph-level interface to the respective accelerator runtimes in the AWS ecosystem, while respecting the restricted operator coverage of each accelerator runtime and the resource constraints of individual accelerators. The optimizations built into TensorFlow determine efficient ways to run the arithmetic operations in the model and distribute graph execution intelligently across the accelerator(s) and the CPU on the instance. This is achieved by analyzing the computation needs of each model, along with the accelerator and CPU resources available, and optimizing the placement of operators across the host and the accelerator(s).
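One way to picture subgraph-level placement is to group a chain of ops into maximal contiguous segments that share a device, so each accelerator segment becomes one subgraph handed to the runtime while the rest stays on the host. A simplified sketch (real placement also weighs data-transfer cost and accelerator memory, which this omits):

```python
# Sketch: group a linear chain of ops into maximal contiguous segments
# that share a placement; each "accel" segment is one offloaded subgraph.
SUPPORTED = {"Conv2D", "Relu", "MatMul", "Add"}  # illustrative coverage

def segment(ops):
    segments = []
    for op in ops:
        device = "accel" if op in SUPPORTED else "cpu"
        if segments and segments[-1][0] == device:
            segments[-1][1].append(op)  # extend the current segment
        else:
            segments.append([device, [op]])  # start a new segment
    return segments

chain = ["Conv2D", "Relu", "TopKV2", "MatMul", "Add"]
# segment(chain) yields three segments: two accelerator subgraphs
# separated by the unsupported TopKV2, which falls back to the CPU.
```

Fewer, larger accelerator segments mean fewer host-device transitions, which is why contiguity matters when partitioning.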
Going forward, TensorFlow needs standard, generic mechanisms for plugging in accelerator runtimes, with the flexibility to customize operator coverage, graph partitioning, support for multiple devices and for multiple compute elements within the same device, and compilation mode (AOT versus JIT).
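A generic plugin mechanism of the kind argued for above might look like a small contract each accelerator runtime implements. The class and method names below are hypothetical, not an existing TensorFlow interface:

```python
# Sketch of a hypothetical accelerator-runtime plugin contract: each
# runtime declares its operator coverage and its compilation modes.
from abc import ABC, abstractmethod

class AcceleratorRuntime(ABC):
    @abstractmethod
    def supported_ops(self) -> set:
        """Operators this runtime can execute."""

    @abstractmethod
    def compile(self, subgraph, mode="JIT"):
        """Compile a subgraph ahead of time ('AOT') or just in time ('JIT')."""

class ToyRuntime(AcceleratorRuntime):
    def supported_ops(self):
        return {"MatMul", "Relu"}

    def compile(self, subgraph, mode="JIT"):
        assert mode in ("AOT", "JIT")
        return f"{mode}-compiled({len(subgraph)} ops)"

rt = ToyRuntime()
artifact = rt.compile(["MatMul", "Relu"], mode="AOT")
```

With a contract like this, the framework's partitioner can stay runtime-agnostic: it queries `supported_ops()` to partition the graph and hands each offloaded subgraph to `compile()` in the mode the deployment requires.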
Prerequisite knowledge
- A basic understanding of deep learning training and inference, hardware acceleration, TensorFlow programming interfaces, and design internals
What you'll learn
- Discover basic ideas for integrating deep learning accelerators with TensorFlow
- Understand key concepts: the TensorFlow model graph, operators, tensors, graph execution, hardware acceleration, and model compilation
- Learn techniques for graph partitioning, optimization, and cross-device execution
Sudipta Sengupta is a senior principal technologist and director at AWS, where he leads new initiatives in artificial intelligence and deep learning. Previously, he headed an end-to-end innovation agenda at Microsoft Research spanning cloud networking, storage, and data management, and before that was at Bell Labs, where he worked on internet routing, optical switching, network security, wireless networks, and network coding. He has shipped his research in many industry-leading, award-winning products and services. Sudipta is an ACM fellow and an IEEE fellow, and was awarded the IEEE William R. Bennett Prize and the IEEE Leonard G. Abraham Prize for his work on computer networking. He holds a PhD and an MS in EECS from MIT and a BTech in computer science and engineering from the Indian Institute of Technology, Kanpur, where he received the President of India Gold Medal for graduating at the top of his class across all disciplines.