Accelerating training, inference, and ML applications on NVIDIA GPUs
Who is this presentation for?
- Researchers and developers who are designing and optimizing deep learning models in TensorFlow
Maggie Zhang, Nathan Luehr, Josh Romero, Pooya Davoodi, and Davide Onofrio dive into techniques to accelerate training and inference for common deep learning and machine learning workloads. You’ll learn how DALI can eliminate I/O and data processing bottlenecks in real-world applications and how automatic mixed precision (AMP) can deliver up to a 3x training speedup on Volta GPUs. You’ll see best practices for multi-GPU and multinode scaling using Horovod. The presenters use a deep learning profiler to visualize TensorFlow operations and identify optimization opportunities. Finally, you’ll learn to deploy the trained models with INT8 quantization in TensorRT (TRT), all through new, convenient APIs in the TensorFlow framework.
- A working knowledge of TensorFlow
Materials or downloads needed in advance
- For the best experience, please bring a laptop with:
  - SSH terminal connection capabilities
  - A browser (any browser should be fine)
  - NVIDIA Nsight tools installed **BEFORE** you arrive onsite
What you'll learn
- Discover components from NVIDIA’s software stack to speed up pipelines and eliminate I/O bottlenecks
- Learn how to enable mixed precision when training models and use TRT to optimize your trained models for inference
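As a rough illustration of why mixed precision training relies on loss scaling, the sketch below uses only the Python standard library to round values to half precision (FP16). It is not the AMP implementation or a TensorFlow API; the helper names, the gradient value, and the scale factor of 1024 are all hypothetical choices made for this example.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def fp16_gradient(grad: float, loss_scale: float = 1.0) -> float:
    """Hold a scaled gradient in FP16, then unscale it in higher precision."""
    scaled = to_fp16(grad * loss_scale)  # the value as stored in half precision
    return scaled / loss_scale           # unscaling happens outside FP16


tiny_grad = 1e-8                              # hypothetical small gradient
naive = fp16_gradient(tiny_grad)              # no scaling: underflows in FP16
recovered = fp16_gradient(tiny_grad, 1024.0)  # hypothetical loss scale

print(naive, recovered)
```

Without scaling, the gradient is too small for FP16 and is stored as zero; with scaling, it survives the round trip with a small rounding error. Automatic mixed precision applies this kind of scaling for you, along with choosing which operations run in reduced precision.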
Maggie Zhang is a deep learning software engineer at NVIDIA, where she works on deep learning frameworks. She earned her PhD in computer science and engineering from the University of New South Wales in Australia. Her research background includes GPU and CPU heterogeneous computing, compiler optimization, computer architecture, and deep learning.
Nathan Luehr is a senior developer technology engineer at NVIDIA, where he works to accelerate deep learning frameworks. His background is in theoretical chemistry. He holds a doctoral degree from Stanford University, where he worked to accelerate electronic structure calculations on GPUs.
Josh Romero is a developer technology engineer at NVIDIA. He has extensive experience in GPU computing, ranging from porting and optimizing high-performance computing (HPC) applications to more recent work with deep learning. Josh earned his PhD from Stanford University, where his research focused on developing new computational fluid dynamics methods to better exploit GPU hardware.
Pooya Davoodi is a senior software engineer at NVIDIA working on accelerating TensorFlow on NVIDIA GPUs. Previously, Pooya worked on Caffe2, Caffe, cuDNN, and other CUDA libraries.
Davide Onofrio is a senior deep learning software technical marketing engineer at NVIDIA. He focuses on developing and presenting technical, developer-oriented deep learning content. Davide has several years of experience working as a computer vision and machine learning engineer in biometrics, VR, and the automotive industry. He earned a PhD in signal processing at the Politecnico di Milano.