Integrating deep learning accelerators with TensorFlow

Sudipta Sengupta (AWS)

1:40pm–2:20pm Wednesday, October 30, 2019

Location: Grand Ballroom H

Accelerators

Who is this presentation for?

Deep learning engineers, TensorFlow developers, ML system architects, and hardware-software codesigners

Level

Intermediate

Description

Deep learning computation uses mixed-precision linear algebra and can benefit from specialized accelerators, such as NVIDIA GPUs, Google's tensor processing unit (TPU), and AWS Inferentia, to optimize performance, cost, and power. These accelerators need to be integrated with deep learning frameworks and exposed through the frameworks' existing programming interfaces so that developers and data scientists can use hardware acceleration with minimal changes to their existing model pipeline code.

Sudipta Sengupta dives into his experience integrating Amazon Elastic Inference and AWS Inferentia with TensorFlow in the AWS cloud. Accelerators often come with runtimes that optimize model execution by performing whole-graph optimization during compilation, either ahead of time (AOT) or just in time (JIT). These runtimes support only a subset of the operators available in the framework, with operator coverage expanding over time.
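To make the operator-coverage constraint concrete, here is a minimal sketch in plain Python. It is not taken from the talk: the supported-operator set, the node list, and the `coverage_report` helper are all hypothetical, and a real runtime would publish its own operator list.

```python
from collections import Counter

# Hypothetical operator set for an imaginary accelerator runtime (assumption).
SUPPORTED_OPS = {"Conv2D", "MatMul", "Relu", "BiasAdd", "Softmax"}


def coverage_report(nodes, supported=SUPPORTED_OPS):
    """Return (fraction of nodes the runtime supports, Counter of unsupported op types).

    `nodes` is a list of (node_name, op_type) pairs.
    """
    unsupported = Counter(op for _, op in nodes if op not in supported)
    total = len(nodes)
    covered = total - sum(unsupported.values())
    fraction = covered / total if total else 1.0
    return fraction, unsupported


# Illustrative model: five supported nodes and one the runtime cannot run.
model_nodes = [
    ("conv1", "Conv2D"),
    ("bias1", "BiasAdd"),
    ("relu1", "Relu"),
    ("nms", "NonMaxSuppressionV3"),
    ("fc", "MatMul"),
    ("prob", "Softmax"),
]

fraction, missing = coverage_report(model_nodes)
print(f"supported: {fraction:.0%}, unsupported ops: {dict(missing)}")
```

Any node reported as unsupported would have to stay on the CPU, which is why the integration works at the subgraph level rather than handing the whole graph to the accelerator.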

Sudipta builds upon and extends TensorFlow mechanisms to use a sub-graph-level interface to the respective accelerator runtimes in the AWS ecosystem, while respecting restricted operator coverage of the accelerator runtime and resource constraints of individual accelerators. The optimizations built into TensorFlow determine efficient ways to run the arithmetic operations in the model and distribute graph execution intelligently across the accelerator(s) and CPU on the instance. This is achieved by analyzing the computation needs of each model along with the accelerator and CPU resources that are available and optimizing the placement of operators across host and accelerator(s).
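The partitioning idea can be sketched as follows, under strong simplifying assumptions. The abstract does not describe the actual algorithm, so this is only an illustration: it treats the model as a linear chain of topologically ordered nodes (real graphs are DAGs), and the names `partition`, `SUPPORTED_OPS`, and `min_accel_ops` are hypothetical. The `min_accel_ops` threshold stands in for the cost of moving tensors between host and accelerator, so short supported runs stay on the CPU.

```python
from itertools import groupby

# Hypothetical operator set for an imaginary accelerator runtime (assumption).
SUPPORTED_OPS = {"Conv2D", "MatMul", "Relu", "BiasAdd", "Softmax"}


def partition(nodes, supported=SUPPORTED_OPS, min_accel_ops=1):
    """Split an ordered list of (name, op_type) nodes into device segments.

    Returns a list of (device, [node names]) in execution order.
    """
    # Pass 1: place each contiguous run of supported ops on the accelerator,
    # unless the run is too short to be worth the host-device transfer.
    placed = []
    for is_supported, run in groupby(nodes, key=lambda n: n[1] in supported):
        run = list(run)
        on_accel = is_supported and len(run) >= min_accel_ops
        device = "accelerator" if on_accel else "cpu"
        placed.extend((device, name) for name, _ in run)

    # Pass 2: merge neighbouring nodes that ended up on the same device.
    return [
        (device, [name for _, name in group])
        for device, group in groupby(placed, key=lambda p: p[0])
    ]


model_nodes = [
    ("conv1", "Conv2D"),
    ("bias1", "BiasAdd"),
    ("relu1", "Relu"),
    ("nms", "NonMaxSuppressionV3"),
    ("fc", "MatMul"),
    ("prob", "Softmax"),
]

for device, names in partition(model_nodes, min_accel_ops=3):
    print(device, names)
```

With `min_accel_ops=3`, the two-node tail (`fc`, `prob`) falls back to the CPU and merges with the unsupported `nms` node. With the default of 1, it would form its own accelerator segment.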

Going forward, there need to be standard, generic mechanisms in TensorFlow for plugging in accelerator runtimes, with the flexibility to customize operator coverage, graph partitioning, support for multiple devices and for multiple compute elements within the same device, and compilation mode (AOT versus JIT).

Prerequisite knowledge

A basic understanding of deep learning training and inference, hardware acceleration, and TensorFlow's programming interfaces and design internals

What you'll learn

Discover basic ideas for integrating deep learning accelerators with TensorFlow
Identify key concepts: the TensorFlow model graph, operators, tensors, graph execution, hardware acceleration, and model compilation
Learn techniques for graph partitioning, optimization, and cross-device execution

Sudipta Sengupta

AWS

Sudipta Sengupta is a senior principal technologist and director at AWS, where he leads new initiatives in artificial intelligence and deep learning. Previously, he headed an end-to-end innovation agenda at Microsoft Research spanning cloud networking, storage, and data management, and worked at Bell Labs on internet routing, optical switching, network security, wireless networks, and network coding. He has shipped his research in many industry-leading, award-winning products and services. Sudipta is an ACM fellow and an IEEE fellow. He was awarded the IEEE William R. Bennett Prize and the IEEE Leonard G. Abraham Prize for his work on computer networking. Sudipta holds a PhD and an MS in EECS from MIT and a BTech in computer science and engineering from the Indian Institute of Technology, Kanpur, India. He was awarded the President of India Gold Medal at IIT-Kanpur for graduating at the top of his class across all disciplines.