Put open source to work
July 16–17, 2018: Training & Tutorials
July 18–19, 2018: Conference
Portland, OR

Deploy and use a multiframework distributed deep learning platform on Kubernetes (sponsored by IBM)

Animesh Singh (IBM), Atin Sood (IBM), Tommy Li (IBM)
11:00am–11:40am Wednesday, July 18, 2018
Kubernetes, Sponsored
Location: E147/148

What you'll learn

  • Learn how to use Fabric for Deep Learning (FfDL) to execute distributed deep learning training for models written using multiple frameworks

Description

Training deep neural network models requires a highly tuned system with the right combination of software, drivers, compute, memory, network, and storage resources. Deep learning frameworks like TensorFlow, PyTorch, Caffe, Torch, Theano, and MXNet have contributed to the popularity of deep learning by reducing the effort and skill needed to design, train, and use deep learning models. Fabric for Deep Learning (FfDL, pronounced “fiddle”) provides a consistent way to run these deep learning frameworks as a service on Kubernetes. FfDL uses a microservices architecture to reduce coupling between components, keep each component simple and as stateless as possible, isolate component failures, and allow each component to be developed, tested, deployed, scaled, and upgraded independently.

Animesh Singh, Atin Sood, and Tommy Li share lessons learned while building and using FfDL and demonstrate how to use it to run distributed deep learning training, with GPUs and object storage, for models written in multiple frameworks. They then explain how to take models from IBM’s Model Asset Exchange, train them using FfDL, and deploy them on Kubernetes for serving and inference.

This session is sponsored by IBM.

Animesh Singh

IBM

Animesh Singh is a senior technical staff member (STSM) and program director for the IBM Watson and Cloud Platform, where he leads machine learning and deep learning initiatives on IBM Cloud and works with communities and customers to design and implement deep learning, machine learning, and cloud computing frameworks. He has a proven track record of driving the design and implementation of private and public cloud solutions from concept to production. Animesh has worked on cutting-edge projects for IBM enterprise customers in the telco, banking, and healthcare industries, focusing particularly on cloud and virtualization technologies, and led the design and development of the first IBM public cloud offering.

Atin Sood

IBM

Atin Sood is a technical lead for IBM’s Watson Studio. Over the past 10+ years, Atin has led technical teams across IBM, focusing on scalable distributed systems and large-scale machine learning problems.

Tommy Li

IBM

Tommy Li is a software developer at IBM focusing on cloud, container, and infrastructure technology. He’s worked on various developer journeys that provide use cases for cloud computing solutions such as Kubernetes, microservices, and hybrid cloud deployments. He’s passionate about machine learning and big data.