In the last year, there have been a number of attempts to train deep CNNs on the ImageNet dataset in the shortest time possible. (The most recent attempt managed to do it in 15 minutes.) But all of these attempts have taken place on custom clusters, which are out of reach for most data scientists.
One of the key advantages of the cloud is being able to scale out compute resources as required. Mathew Salvaris, Miguel Gonzalez-Fierro, and Ilia Karmanov offer a comparison of two platforms for running distributed deep learning training in the cloud. Both utilize Docker containers, making it possible to run any deep learning framework on them. You’ll examine the performance of each as the number of nodes scales and learn some tips and tricks as well as some pitfalls to watch out for.
The first is a service called Batch AI, which uses the Azure Batch infrastructure to easily run deep learning jobs at scale across GPUs. The second is an open source toolkit that allows data scientists to spin up clusters in a turnkey fashion. It utilizes Kubernetes and Grafana for easy job scheduling and monitoring. This solution has been used daily in production by internal Microsoft groups. Mathew, Miguel, and Ilia use these training platforms to train a ResNet model on the ImageNet dataset using each of the following frameworks: CNTK, TensorFlow (Horovod), PyTorch, MXNet, and Chainer. They then compare and contrast the performance. The examples presented can also be used as templates for your own deep learning problems.
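As a conceptual illustration (not code from the talk), the synchronous data-parallel pattern that Horovod and similar tools implement can be sketched in plain Python: each worker computes gradients on its own data shard, an allreduce averages those gradients, and every worker then applies the same update. The function names and the toy linear model below are illustrative assumptions, not part of any real API.

```python
def local_gradient(weights, shard):
    """Toy gradient of mean squared error for the model y = w * x on one shard."""
    w = weights[0]
    n = len(shard)
    # d/dw of (1/n) * sum((w*x - y)^2) = (2/n) * sum((w*x - y) * x)
    return [sum(2 * (w * x - y) * x for x, y in shard) / n]

def allreduce_mean(grads_per_worker):
    """Average each gradient component across workers (the allreduce step)."""
    num_workers = len(grads_per_worker)
    return [sum(g[i] for g in grads_per_worker) / num_workers
            for i in range(len(grads_per_worker[0]))]

def train_step(weights, shards, lr):
    """One synchronous step: local gradients, allreduce, identical update everywhere."""
    grads = [local_gradient(weights, shard) for shard in shards]
    avg = allreduce_mean(grads)
    return [w - lr * g for w, g in zip(weights, avg)]

# Data for y = 2*x, split across two simulated workers.
shards = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]
weights = [0.0]
for _ in range(50):
    weights = train_step(weights, shards, lr=0.02)
print(round(weights[0], 2))  # converges toward 2.0
```

In a real cluster the allreduce runs over the network (for example via MPI or NCCL) rather than in-process, and adding nodes simply adds more shards; this is why per-step communication cost, not local compute, often dominates as node count grows.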
Mathew Salvaris is a data scientist at Microsoft. Previously, Mathew was a data scientist for a small startup that provided analytics for fund managers; a postdoctoral researcher at UCL’s Institute of Cognitive Neuroscience, where he worked with Patrick Haggard in the area of volition and free will, devising models to decode human decisions in real time from the motor cortex using electroencephalography (EEG); and a postdoc in the University of Essex’s Brain Computer Interface Group, where he worked on BCIs for computer mouse control. Mathew holds a PhD in brain-computer interfaces and an MSc in distributed artificial intelligence.
Miguel González-Fierro is a senior data scientist at Microsoft UK, where he helps customers improve their processes using big data and machine learning. Previously, he was CEO and founder of Samsamia Technologies, a company that created a visual search engine for fashion items, allowing users to find products using images instead of words, and founder of the Robotics Society of Universidad Carlos III, which developed projects related to UAVs, mobile robots, small humanoid competitions, and 3D printers. Miguel also worked as a robotics scientist at Universidad Carlos III of Madrid and King’s College London, where his research focused on learning from demonstration, reinforcement learning, computer vision, and dynamic control of humanoid robots. He holds a BSc and MSc in electrical engineering and an MSc and PhD in robotics.
Ilia Karmanov is a data scientist working on applying machine learning and deep learning solutions in industry. He is particularly interested in the statistical theory behind deep learning. Ilia holds an MSc in economics from the London School of Economics.
©2018, O’Reilly UK Ltd
Comments
Here you can find the slides of the presentation: https://www.slideshare.net/MiguelFierro1/distributed-training-of-deep-learning-models