Sep 9–12, 2019

Deep learning at scale: Tools and solutions

Angela Wu (Determined AI), Sidney Wijngaarde (Determined AI), Shiyuan Zhu (Determined AI), Vishnu Mohan (Determined AI)
1:30pm–5:00pm Tuesday, September 10, 2019
Location: LL21 A/B
Average rating: 4.67 (3 ratings)

Who is this presentation for?

  • DL developers, ML engineers
  • Data science managers
  • Researchers




Building a sophisticated and successful deep learning (DL) practice involves far more than installing frameworks such as TensorFlow or PyTorch and then developing and deploying models. As DL research teams grow and model complexity increases, a new set of challenges begins to mount.

Research teams will find themselves needing to share GPU infrastructure efficiently, over which they’ll train many models, tune hyperparameters, explore many neural network architectures, and exploit parallel and distributed training techniques to speed up training time. They’ll grow to depend on an effective model lifecycle and metadata management system to ensure reproducibility of results and foster collaboration between researchers within and across teams. They’ll be asked to improve the inference performance of their DL models, particularly for resource-constrained mobile and edge deployments. Tackling these challenges typically requires extensive research and engineering talent.

Angela Wu, Sidney Wijngaarde, Shiyuan Zhu, and Vishnu Mohan provide you with an overview of these challenges, present state-of-the-art solutions, and discuss popular software and tools with a focus on hyperparameter tuning, distributed training, and model serving. You’ll work through hands-on examples of how to solve practical DL challenges and walk away with an understanding of best practices and tools to smooth your organization’s adoption of DL.

Prerequisite knowledge

  • Basic knowledge of deep learning

Materials or downloads needed in advance

A WiFi-enabled laptop (You will be provided with access to a compute environment and computational resources on a cloud platform; the cluster address will be distributed at the doorway when you enter the room.)

To interact with the cluster, you will need a browser.

Additionally, to submit jobs to the cluster and get more comprehensive feature support, you will need to download and install a command-line interface (CLI). Please download the materials and install the CLI in advance, as the conference WiFi bandwidth is limited. Instructions are available in our GitHub repository; we assume you have a laptop running macOS or Linux. Installing the CLI is recommended but *not* required to gain value from our tutorial.

What you'll learn

  • Learn why GPU scheduling is hard
  • Learn why DL workloads often behave differently between repeated runs and gain keys for making DL workloads reproducible
  • Learn about hyperparameter tuning and state-of-the-art algorithms, popular tools for distributed training, and the challenges of DL deployment, including TensorFlow Serving
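
As a small illustration of two of the topics above (reproducibility and hyperparameter tuning), here is a minimal sketch of a seeded random search. It is not taken from the tutorial materials: the function names `toy_validation_loss` and `random_search`, the search ranges, and the toy objective are all assumptions made for illustration, and random search is used only as a simple baseline rather than as one of the state-of-the-art algorithms the session covers.

```python
import random


def toy_validation_loss(learning_rate, batch_size):
    """Hypothetical stand-in for training a model and returning its validation loss."""
    return (learning_rate - 0.01) ** 2 * 1e4 + abs(batch_size - 64) / 64


def random_search(num_trials, seed):
    """Sample hyperparameters at random and return (best_loss, best_config)."""
    # A private, seeded generator makes the sequence of sampled configs repeatable.
    rng = random.Random(seed)
    best = None
    for _ in range(num_trials):
        config = {
            # Log-uniform sample in [1e-4, 1e-1] (assumed range).
            "learning_rate": 10 ** rng.uniform(-4, -1),
            # Assumed set of candidate batch sizes.
            "batch_size": rng.choice([16, 32, 64, 128]),
        }
        loss = toy_validation_loss(**config)
        if best is None or loss < best[0]:
            best = (loss, config)
    return best


if __name__ == "__main__":
    first = random_search(num_trials=20, seed=0)
    second = random_search(num_trials=20, seed=0)
    # Same seed, same trials: the two searches pick identical configurations.
    print(first == second)
    print(first)
```

Fixing the seed only removes one source of run-to-run variation (the sampling of configurations). Real DL training has others, such as data shuffling and nondeterministic GPU kernels, which this sketch does not address.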

Angela Wu

Determined AI

Angela Wu is a software engineer at Determined AI, where she solves deep learning problems for leading organizations through Determined AI’s cutting-edge software. In a past life, Angela was a mathematician dabbling in property testing, list decoding, voting theory, and fast Fourier transforms. She holds a BA from Swarthmore College and a joint PhD in mathematics and computer science from the University of Chicago.

Sidney Wijngaarde

Determined AI

Sidney Wijngaarde is a software engineer at Determined AI, where he works closely with leading organizations to help them successfully apply deep learning using Determined AI’s cutting-edge software. Previously, Sidney worked on hybrid and multi-cloud management at IBM. He holds a BA from Dartmouth College.

Shiyuan Zhu

Determined AI

Shiyuan Zhu is a software engineer at Determined AI, where he helps build an end-to-end product that lets ML/DL developers in leading organizations efficiently realize their ideas. Previously, Shiyuan worked on products involving machine learning, data mining, and full-stack development. Shiyuan holds an MSc in electrical engineering from the University of Southern California.

Vishnu Mohan

Determined AI

Vishnu Mohan is a director of product management at Determined AI, where he works closely with customers to improve the productivity of their research teams on their deep learning initiatives. Previously, he was a director of product at Mesosphere. He holds an MS in computer science from the University of Texas at Dallas.

Comments on this page are now closed.


Angela Wu | Software Engineer
09/13/2019 1:46am PDT

Hi Dovydas,
Slides are up on both the O’Reilly conference page and on GitHub.

Dovydas Ceilutka
09/11/2019 2:41am PDT

Hi, thanks for the great workshop. Can we get the slides?

Angela Wu | Software Engineer
09/10/2019 3:47am PDT

Hi Prarthana,

Yes, should be okay. Below are instructions for installing pedl:

conda create -n pedl python=3 -yq
conda activate pedl
pip install -U //pedl

It’s hard for us to say why the pip install wasn’t working for you without more information. It might be worth trying the install on Miniconda; below are instructions to install it:

mkdir -p $HOME/downloads/miniconda
cd $HOME/downloads/miniconda
sh -bu
$HOME/miniconda3/bin/conda init

Prarthana Shah | Data Science Engineer
09/09/2019 7:37am PDT

I am using Anaconda Python version 3.7.3. Will it be ok for this training? Also, the given pip install command for the CLI is not working for Anaconda. Could you please help me with this as well?
