Training deep neural network models requires a highly tuned system with the right combination of software, drivers, compute, memory, network, and storage resources. Deep learning frameworks like TensorFlow, PyTorch, Caffe, Torch, Theano, and MXNet have contributed to the popularity of deep learning by reducing the effort and skill needed to design, train, and use deep learning models. Fabric for Deep Learning (FfDL, pronounced “fiddle”) provides a consistent way to run these deep learning frameworks as a service on Kubernetes. FfDL uses a microservices architecture to reduce coupling between components, keep each component simple and as stateless as possible, isolate component failures, and allow each component to be developed, tested, deployed, scaled, and upgraded independently.
Animesh Singh, Atin Sood, and Tommy Li share lessons learned while building and using FfDL. They demonstrate how to use it to run distributed deep learning training on GPUs, with object storage, for models written in multiple frameworks. They then explain how to take models from IBM’s Model Asset Exchange, train them using FfDL, and deploy them on Kubernetes for serving and inferencing.
This session is sponsored by IBM.
Animesh Singh is an STSM and lead for IBM Watson and Cloud Platform, where he leads machine learning and deep learning initiatives on IBM Cloud and works with communities and customers to design and implement deep learning, machine learning, and cloud computing frameworks. He has a proven track record of driving the design and implementation of private and public cloud solutions from concept to production. In his decade-plus at IBM, Animesh has worked on cutting-edge projects for IBM enterprise customers in the telco, banking, and healthcare industries, focusing particularly on cloud and virtualization technologies, and led the design and development of the first IBM public cloud offering.
Atin Sood is a technical lead at IBM’s Watson Studio. For more than 10 years, Atin has led technical teams across IBM focused on scalable distributed systems and machine learning problems.
Tommy Li is a software developer at IBM focusing on cloud, container, and infrastructure technology. He has worked on various developer journeys, which provide use cases for cloud computing solutions such as Kubernetes, microservices, and hybrid cloud deployments. He is passionate about machine learning and big data.
©2018, O'Reilly Media, Inc.