Modern Deep Learning: Tools and Techniques
Who is this presentation for?
DL Developers, DL Researchers, ML Engineers, DL Group Managers
Success with deep learning requires understanding more than just TensorFlow or Keras. When organizations first begin to deploy deep learning, we find they are often faced with a similar set of challenges: their DL developers might understand how to train a single model in principle, but can they make deep learning work in practice? Common questions include:
- How can I run DL jobs on a GPU cluster and share the GPU cluster among a team of researchers?
- How can I tune the hyperparameters of my models?
- How can I do distributed training?
- Should I store my training and validation metrics, and if so, how? How can I ensure my DL training is reproducible?
- How do I deploy my models to production?
- How can I improve the inference performance of my DL models, particularly for resource-constrained environments like mobile and edge deployments?
Answering each of these questions often requires extensive research. The software tools in these domains are typically highly technical, poorly documented, and do not interoperate well with one another, and the landscape is changing quickly! For most organizations, considerable effort is required to integrate a collection of narrow technical tools into a comprehensive DL environment.
This tutorial will offer an overview of these challenges, summarize relevant research and state-of-the-art algorithms where appropriate, and discuss popular software tools. Tutorial participants will work through several hands-on examples of how to use the software tools discussed to solve practical DL challenges.
Tutorial attendees should leave the session with: (A) a clear understanding that success in DL is about more than just training a single model with TensorFlow, (B) an awareness of the common pitfalls that organizations face when adopting DL, and (C) a summary of best practices and software tools for dealing with these pitfalls.
Prerequisite knowledge
Basic knowledge of deep learning.
Materials or downloads needed in advance
What you'll learn
- Hyperparameter tuning: what it is and state-of-the-art algorithms for doing it
- GPU scheduling: why it is hard and how to deploy DL jobs on Kubernetes
- Distributed training: algorithms and popular tools
- DL deployment: challenges and TF-Serving
- Keys for making DL workloads reproducible
- Optimizing DL models for resource-constrained environments: concepts and tools
Neil Conway is the co-founder and CTO of Determined AI, a startup building software to make deep learning developers dramatically more productive. Before founding Determined AI, Neil was a technical lead at Mesosphere and earned a PhD in computer science from UC Berkeley, where he performed research in distributed systems and large-scale data management. He has been a major contributor to several notable open source projects, including Apache Mesos and Postgres.
Yoav Zimmerman is a software engineer at Determined AI, where he works closely with leading organizations to help them apply deep learning successfully using Determined’s cutting-edge software. Prior to Determined AI, Yoav worked on knowledge representation at Google. Yoav holds a B.Sc. from UCLA.