Deep learning with Horovod and Spark using GPUs and Docker containers
Who is this presentation for?
- Data scientists and IT administrators
Data volume and complexity increase by the day, so it’s imperative that companies understand their business needs in order to stay ahead of their competition. Thanks to AI, ML, and deep learning (DL) projects such as Apache Spark, H2O, TensorFlow, and Horovod, these organizations no longer have to lock themselves into a specific vendor technology or proprietary solution to maintain this competitive advantage. These feature-rich deep learning applications are available directly from the open source community, with many different algorithms and options tailored for specific use cases.
One of the biggest challenges for the enterprise is how to deploy these open source tools in an easy and consistent manner (keeping in mind that some of them depend on specific operating system kernel versions and software components). For example, TensorFlow can leverage NVIDIA GPU resources, but running TensorFlow with GPUs requires users to set up the NVIDIA CUDA libraries on the host and then install and configure TensorFlow to make use of the GPU computing facility. The combination of device drivers, libraries, and software versions can be daunting and may end in failure for many users.
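Containers sidestep much of this setup pain by packaging the CUDA userspace libraries alongside the framework. As a rough sketch (assuming the NVIDIA Container Toolkit is installed on the host and using the public tensorflow/tensorflow:latest-gpu image), a single command can verify GPU visibility from inside a TensorFlow container:

```shell
# Sketch only: assumes Docker plus the NVIDIA Container Toolkit on the host.
# The CUDA libraries ship inside the image, so nothing TensorFlow-specific
# needs to be installed on the host itself.
docker run --rm --gpus all tensorflow/tensorflow:latest-gpu \
  python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```

If the host drivers and toolkit are in place, the command prints the list of visible GPU devices; an empty list indicates the container cannot see the hardware.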
Moreover, since GPUs are a premium resource, organizations want to maximize their use. Clusters using these resources need to be configured on demand and freed immediately after computation is complete. Docker containers are ideal for enabling just this sort of instant cluster provisioning and deprovisioning. They also ensure reproducible and consistent deployment.
Thomas Phelan demonstrates how to deploy AI, ML, and DL applications, including Spark, TensorFlow, and Horovod, using GPU hardware acceleration on Docker containers in a secure multitenant environment. The use of GPU-based services within Docker containers does require some careful consideration, so he’ll also explore some best practices.
Prerequisite knowledge
- A basic understanding of Docker containers and NVIDIA GPUs (helpful but not required)
What you'll learn
- Discover how to spin up and tear down GPU-enabled AI, ML, and DL clusters in Docker containers
- Learn about quota management of GPU resources for better manageability, GPU isolation to specific clusters to avoid resource conflict or contention, the dynamic attach and detach of GPU resources from running clusters, and transient use of GPUs for the duration of a job
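The isolation and transient-use patterns above can be illustrated with plain Docker commands (a hedged sketch, assuming the NVIDIA Container Toolkit and a multi-GPU host; the device index and image tag are illustrative):

```shell
# Pin a container to one specific GPU so that concurrent clusters do not
# contend for the same device (device index 0 is illustrative).
docker run --rm --gpus '"device=0"' \
  nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi

# Transient use: --rm removes the container as soon as the job exits,
# returning the GPU to the pool for the next workload.
```

Higher-level platforms layer quota management and dynamic attach/detach on top of this basic device-assignment mechanism.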
Thomas Phelan is cofounder and chief architect of BlueData. Previously, he was a member of the original team at Silicon Graphics that designed and implemented XFS, the first commercially available 64-bit file system, and an early employee at VMware. As a senior staff engineer at VMware and a key member of the ESX storage architecture team, he designed and developed the ESX storage I/O load-balancing subsystem and modular pluggable storage architecture, and he led teams working on key storage initiatives such as the cloud storage gateway and vFlash.