Deep learning at scale: Tools and solutions





Who is this presentation for?
- DL developers, ML engineers
- Data science managers
- Researchers
Level
Intermediate
Description
Building a sophisticated and successful deep learning (DL) practice involves far more than installing frameworks such as TensorFlow or PyTorch and then developing and deploying models. As DL research teams grow and model complexity increases, a new set of challenges begins to mount.
Research teams will find themselves needing to share GPU infrastructure efficiently, over which they’ll train many models, tune hyperparameters, explore many neural network architectures, and exploit parallel and distributed training techniques to reduce training time. They’ll grow to depend on an effective model lifecycle and metadata management system to ensure reproducibility of results and foster collaboration between researchers within and across teams. They’ll be asked to improve the inference performance of their DL models, particularly for resource-constrained mobile and edge deployments. Tackling these challenges typically requires extensive research and engineering talent.
Angela Wu, Sidney Wijngaarde, Shiyuan Zhu, and Vishnu Mohan provide you with an overview of these challenges, present state-of-the-art solutions, and discuss popular software and tools with a focus on hyperparameter tuning, distributed training, and model serving. You’ll work through hands-on examples of how to solve practical DL challenges and walk away with an understanding of best practices and tools to smooth your organization’s adoption of DL.
Prerequisite knowledge
- Basic knowledge of deep learning
Materials or downloads needed in advance
A WiFi-enabled laptop (You will be provided with access to a compute environment and computational resources on a cloud platform; the cluster address will be distributed at the doorway when you enter the room.)
To interact with the cluster, you will need a browser.
Additionally, to submit jobs to the cluster and access more comprehensive features, you will need to download and install a command-line interface (CLI). Please download the materials and install the CLI in advance, as the conference WiFi bandwidth is limited. Instructions are available in our GitHub repository; we assume you have a laptop running macOS or Linux. Installing the CLI is recommended but *not* required to gain value from our tutorial.
What you'll learn
- Learn why GPU scheduling is hard
- Learn why DL workloads often behave differently between repeated runs and pick up key techniques for making DL workloads reproducible
- Learn about hyperparameter tuning and its state-of-the-art algorithms, popular tools for distributed training, and the challenges of DL deployment, including TensorFlow Serving
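As a rough illustration of two of the topics above, the following is a minimal sketch of random-search hyperparameter tuning made repeatable by seeding a local random number generator. It is not taken from the tutorial materials: the function names, the search space, and the toy objective (a stand-in for a validation loss) are all hypothetical. It also covers only one source of run-to-run variation; real DL workloads have others, such as nondeterministic GPU operations, that seeding alone does not address.

```python
import random


def random_search(objective, space, n_trials, seed):
    """Sample n_trials configurations uniformly from `space`; return the best one."""
    rng = random.Random(seed)  # local RNG, so results depend only on `seed`
    best_config, best_score = None, float("inf")
    for _ in range(n_trials):
        # Draw each hyperparameter uniformly from its (low, high) range.
        config = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        score = objective(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_objective(config):
    # Hypothetical stand-in for a validation loss (lower is better).
    return (config["lr"] - 0.1) ** 2 + (config["dropout"] - 0.3) ** 2


# Hypothetical search space: name -> (low, high)
SPACE = {"lr": (0.0, 1.0), "dropout": (0.0, 0.5)}

first = random_search(toy_objective, SPACE, n_trials=50, seed=42)
second = random_search(toy_objective, SPACE, n_trials=50, seed=42)
print(first == second)  # same seed, same sequence of trials, same result
```

Using a dedicated `random.Random(seed)` instance rather than the module-level functions keeps the search independent of any other code that draws random numbers in the same process.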

Angela Wu
Determined AI
Angela Wu is a software engineer at Determined AI, where she solves deep learning problems for leading organizations through Determined AI’s cutting-edge software. In a past life, Angela was a mathematician dabbling in property testing, list decoding, voting theory, and fast Fourier transforms. She holds a BA from Swarthmore College and a joint PhD in mathematics and computer science from the University of Chicago.

Sidney Wijngaarde
Determined AI
Sidney Wijngaarde is a software engineer at Determined AI, where he works closely with leading organizations to help them successfully apply deep learning using Determined AI’s cutting-edge software. Previously, Sidney worked on hybrid and multi-cloud management at IBM. He holds a BA from Dartmouth College.

Shiyuan Zhu
Determined AI
Shiyuan Zhu is a software engineer at Determined AI, where he helps build an end-to-end product that lets ML/DL developers in leading organizations realize their ideas efficiently. Previously, Shiyuan worked on products involving machine learning, data mining, and full-stack development. Shiyuan holds an MSc in electrical engineering from the University of Southern California.

Vishnu Mohan
Determined AI
Vishnu Mohan is a director of product management at Determined AI, where he works closely with customers to improve the productivity of their research teams on their deep learning initiatives. Previously, he was a director of product at Mesosphere. He holds an MS in computer science from the University of Texas at Dallas.
Comments on this page are now closed.
Comments
Hi Dovydas,
Slides are up on both the O’Reilly conference page and on GitHub.
Hi, thanks for the great workshop. Can we get the slides?
Hi Prarthana,
Yes, should be okay. Below are instructions for installing pedl:
conda create -n pedl python=3 -yq
conda activate pedl
pip install -U //pedl...-py35.py36.py37-none-any.whl
Without more information, it’s hard for us to say why the pip install wasn’t working for you. It might be worth trying the install on Miniconda; below are instructions for installing it:
mkdir -p $HOME/downloads/miniconda
cd $HOME/downloads/miniconda
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
sh Miniconda3-latest-Linux-x86_64.sh -bu
$HOME/miniconda3/bin/conda init
logout
Hello,
I am using Anaconda with Python version 3.7.3. Will it be OK for this training? Also, the given pip install command for the CLI is not working with Anaconda. Could you please help me with this as well?