October 28–31, 2019

Accelerating training, inference, and ML applications on NVIDIA GPUs

Maggie Zhang (NVIDIA), Nathan Luehr (NVIDIA), Josh Romero (NVIDIA), Pooya Davoodi (NVIDIA), Davide Onofrio (NVIDIA)
1:30pm–5:00pm Tuesday, October 29, 2019
Location: Grand Ballroom E

Who is this presentation for?

  • Researchers and developers who are designing and optimizing deep learning models in TensorFlow




Maggie Zhang, Nathan Luehr, Josh Romero, Pooya Davoodi, and Davide Onofrio dive into techniques for accelerating training and inference in common deep learning and machine learning workloads. You’ll learn how DALI can eliminate I/O and data processing bottlenecks in real-world applications and how automatic mixed precision (AMP) can deliver up to a 3x improvement in training performance on Volta GPUs. You’ll see best practices for multi-GPU and multinode scaling with Horovod, and the presenters will use a deep learning profiler to visualize TensorFlow operations and identify optimization opportunities. Finally, you’ll learn to deploy the trained models using INT8 quantization in TensorRT (TRT), all through convenient new APIs within the TensorFlow framework.
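
As a rough illustration of why mixed precision training typically pairs FP16 arithmetic with loss scaling, the sketch below uses only the Python standard library to round values to half precision. It is not the AMP implementation covered in the session; the helper `to_fp16`, the constant `LOSS_SCALE`, and the gradient value are all assumptions chosen for the example.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Hypothetical static loss scale; real AMP implementations usually adjust it dynamically.
LOSS_SCALE = 1024.0

# A made-up gradient smaller than the smallest positive FP16 value (about 6e-8).
true_grad = 1e-8

# Cast directly to FP16: the gradient underflows to zero and the update is lost.
unscaled = to_fp16(true_grad)

# Scale first, cast to FP16, then unscale in higher precision.
scaled_fp16 = to_fp16(true_grad * LOSS_SCALE)
recovered = scaled_fp16 / LOSS_SCALE

print("FP16 without scaling:", unscaled)
print("FP16 with loss scaling, after unscaling:", recovered)
```

Scaling the loss before backpropagation shifts small gradients into the range FP16 can represent; dividing by the same factor afterward restores their magnitude before the weight update.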

Prerequisite knowledge

  • A working knowledge of TensorFlow

Materials or downloads needed in advance

  • For the best experience, please bring a laptop with:
  • SSH terminal connection capabilities
  • A browser (any browser should be fine)
  • NVIDIA NSight tools installed **BEFORE** you arrive onsite.

What you'll learn

  • Discover components from NVIDIA’s software stack to speed up pipelines and eliminate I/O bottlenecks
  • Learn how to enable mixed precision when training models and use TRT to optimize your trained models for inference
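
To give a feel for what INT8 quantization does to a tensor, here is a minimal sketch of symmetric quantization with a single scale factor, in plain Python. It is a simplification and not TensorRT's API: TensorRT chooses scales through its own calibration process, whereas this example assumes a simple max-absolute-value scale. The function names and the activation values are hypothetical.

```python
def quantize_int8(values, scale):
    """Map floats to integers in [-127, 127] using one symmetric scale factor."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def dequantize(quantized, scale):
    """Map quantized integers back to approximate float values."""
    return [q * scale for q in quantized]


# Made-up activation values for illustration.
activations = [0.02, -1.5, 0.75, 3.0, -2.25]

# Assumed calibration rule: the largest magnitude maps to 127.
scale = max(abs(v) for v in activations) / 127

quantized = quantize_int8(activations, scale)
restored = dequantize(quantized, scale)
max_error = max(abs(a - b) for a, b in zip(activations, restored))

print("quantized:", quantized)
print("max round-trip error:", max_error)
```

Each value is stored in 8 bits instead of 32, at the cost of a rounding error of at most half the scale factor per value for inputs within the calibrated range.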

Maggie Zhang


Maggie Zhang is a deep learning software engineer at NVIDIA, where she works on deep learning frameworks. She earned her PhD in computer science and engineering from the University of New South Wales in Australia. Her research background includes GPU and CPU heterogeneous computing, compiler optimization, computer architecture, and deep learning.


Nathan Luehr


Nathan Luehr is a senior developer technology engineer at NVIDIA, where he works to accelerate deep learning frameworks. His background is in theoretical chemistry. He holds a doctoral degree from Stanford University, where he worked to accelerate electronic structure calculations on GPUs.


Josh Romero


Josh Romero is a developer technology engineer at NVIDIA. He has extensive experience in GPU computing, from porting and optimizing high-performance computing (HPC) applications to more recent work with deep learning. Josh earned his PhD from Stanford University, where his research focused on developing new computational fluid dynamics methods to better exploit GPU hardware.


Pooya Davoodi


Pooya Davoodi is a senior software engineer at NVIDIA working on accelerating TensorFlow on NVIDIA GPUs. Previously, Pooya worked on Caffe2, Caffe, cuDNN, and other CUDA libraries.


Davide Onofrio


Davide Onofrio is a senior deep learning software technical marketing engineer at NVIDIA, where he focuses on developing and presenting technical, developer-oriented deep learning content. Davide has several years of experience working as a computer vision and machine learning engineer in biometrics, VR, and the automotive industry. He earned a PhD in signal processing at the Politecnico di Milano.

Comments on this page are now closed.


Davide Onofrio | Senior Deep Learning Software Technical Marketing Engineer
11/07/2019 3:41am PST

We removed the link, but we uploaded the slides to the conference website. They should already be available.

Thanks for your comment.

Leonardo Apolonio | Principal Machine Learning Engineer
11/06/2019 10:57pm PST

The link to slides made available during the presentation doesn’t work. http://bit.ly/331RRAs Is there another way to get the slides?

Davide Onofrio | Senior Deep Learning Software Technical Marketing Engineer
10/24/2019 6:51am PDT

About the requirements:
– We will SSH into a remote VM from a terminal window
– We will connect the browser to run Python notebooks

So the OS should not matter provided you can SSH to a remote machine.

We removed all the installation requirements, so NVIDIA Nsight Systems is not needed on your laptop.

Thanks for leaving a comment.

Raimondas Lencevicius | Principal Research Engineer
10/24/2019 5:46am PDT

When you say “NVIDIA NSight tools”, do you mean NVIDIA NSight Systems or other NVIDIA NSight tools (NSight Graphics? NSight Compute? others?).

Also, do you expect the tutorial work to be done on Windows? Linux? Either? It is not clear from the description if/what Windows/Linux specific software might be needed.

