Presented By O’Reilly and Cloudera
Make Data Work
21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

Deep learning with TensorFlow and Spark using GPUs and Docker containers

Nanda Vijaydev (BlueData), Thomas Phelan (HPE BlueData)
12:05–12:45 Thursday, 24 May 2018
Secondary topics: Managing and Deploying Machine Learning
Average rating: 4.17 (6 ratings)

Who is this presentation for?

  • Data scientists and developers

Prerequisite knowledge

  • A basic understanding of Docker and big data

What you'll learn

  • Explore data science use cases with GPUs
  • Learn the pros and cons of using GPUs in containers and of Docker in general


Organizations understand the need to keep pace with newer technologies and methodologies for data science with machine learning and deep learning. Data volume and complexity are increasing by the day, so it’s imperative that companies understand their business better and stay on par with or ahead of the competition. Thanks to products such as Apache Spark, H2O, and TensorFlow, these organizations no longer have to lock themselves into a specific vendor technology or proprietary solution. These rich deep learning applications are available in the open source community, with many different algorithms and options for various use cases.

However, one of the biggest challenges is getting all these open source tools up and running in an easy and consistent manner (keeping in mind that some of them depend on specific OS kernel and software components). For example, TensorFlow can leverage NVIDIA GPU resources, but installing TensorFlow for GPU requires users to set up NVIDIA CUDA libraries on the machine and then install and configure TensorFlow to use the GPU. The combination of device drivers, libraries, and software versions can be daunting and may be a nonstarter for many users.
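
To make the dependency chain above concrete, here is a minimal preflight sketch that reports which parts of a typical NVIDIA stack are visible on a Linux host. It is an illustration, not the speakers' tooling: the function names `gpu_preflight` and `missing_components` are hypothetical, and the checks are heuristics (a library installed outside the standard linker search path may be reported as missing).

```python
import ctypes.util
import os


def gpu_preflight(dev_dir="/dev"):
    """Report which pieces of a typical NVIDIA GPU stack are visible on this host."""
    return {
        # Device nodes that the NVIDIA kernel driver normally creates.
        "nvidiactl_device": os.path.exists(os.path.join(dev_dir, "nvidiactl")),
        "nvidia0_device": os.path.exists(os.path.join(dev_dir, "nvidia0")),
        # User-space driver library (libcuda) and CUDA runtime (libcudart).
        "driver_library": ctypes.util.find_library("cuda") is not None,
        "cuda_runtime": ctypes.util.find_library("cudart") is not None,
    }


def missing_components(report):
    """Return the names of the checks that failed, in sorted order."""
    return sorted(name for name, ok in report.items() if not ok)


if __name__ == "__main__":
    report = gpu_preflight()
    for name, ok in sorted(report.items()):
        print(f"{name}: {'found' if ok else 'missing'}")
```

A check like this only confirms that the pieces are present. It does not verify that the driver, CUDA, and TensorFlow versions are compatible with one another, which is the harder part of the problem described above.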

Since GPUs are a premium resource, organizations that want to leverage this capability need to bring up clusters with these resources on demand and then relinquish them after computation is complete. Docker containers can be used to set up this instant cluster provisioning and deprovisioning and can help ensure reproducible builds and easier deployment.

Nanda Vijaydev and Thomas Phelan demonstrate how to deploy a TensorFlow and Spark stack with NVIDIA CUDA on Docker containers in a multitenant environment. Using GPU-based services with Docker containers requires careful consideration, so Thomas and Nanda share best practices covering the pros and cons of NVIDIA-Docker versus regular Docker containers, CUDA library usage in Docker containers, Docker run parameters to pass GPU devices to containers, storing results for transient clusters, and integration with Spark.
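
As an illustration of what passing GPU devices to a regular Docker container can look like, the sketch below builds a `docker run` argument list that uses Docker's `--device` flag to expose selected NVIDIA device nodes. This is one common approach under stated assumptions, not necessarily the configuration shown in the session: the helper name `docker_run_gpu_args` and the image name are hypothetical, and the device paths assume a standard Linux NVIDIA driver installation. With this approach, the image itself typically still needs CUDA and driver libraries that match the host driver.

```python
import shlex


def docker_run_gpu_args(image, gpu_indices, command=()):
    """Build a `docker run` argument list that exposes selected GPUs via --device."""
    args = ["docker", "run", "--rm"]
    # Control nodes shared by all GPUs on the host.
    for node in ("/dev/nvidiactl", "/dev/nvidia-uvm"):
        args += ["--device", node]
    # One device node per GPU that this container is allowed to use.
    for index in gpu_indices:
        args += ["--device", f"/dev/nvidia{index}"]
    args.append(image)
    args.extend(command)
    return args


if __name__ == "__main__":
    # "example/tensorflow-gpu:latest" is a placeholder image name.
    argv = docker_run_gpu_args(
        "example/tensorflow-gpu:latest", [0, 1], ["python", "train.py"]
    )
    print(shlex.join(argv))
```

Building the argument list as data rather than a shell string keeps the set of exposed GPUs explicit, so a provisioning layer can decide per container which device nodes to pass.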

Topics include:

  • Spinning up and tearing down GPU-enabled TensorFlow and Spark clusters
  • Quota management of GPU resources for better manageability
  • Isolating GPUs to specific clusters to avoid resource conflict
  • Attaching and detaching GPU resources from clusters
  • Transient use of GPUs for the duration of the job
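
The quota, isolation, and attach/detach topics above can be sketched with a toy in-memory tracker. This is a simplified illustration under stated assumptions, not the speakers' implementation: the `GpuPool` class, its methods, and the tenant and cluster names are all hypothetical. The only external convention used is the `CUDA_VISIBLE_DEVICES` environment variable, which CUDA applications read to limit the GPUs they can see.

```python
class GpuPool:
    """Toy tracker for exclusive, quota-limited GPU assignment to clusters."""

    def __init__(self, gpu_count, quotas):
        self.free = set(range(gpu_count))  # GPU indices not attached to any cluster
        self.quotas = dict(quotas)         # tenant -> max GPUs in use at once
        self.attached = {}                 # cluster -> (tenant, set of GPU indices)

    def in_use(self, tenant):
        """Count the GPUs currently attached to clusters owned by a tenant."""
        return sum(len(g) for owner, g in self.attached.values() if owner == tenant)

    def attach(self, tenant, cluster, count):
        """Attach `count` free GPUs to a new cluster, enforcing the tenant quota."""
        if cluster in self.attached:
            raise ValueError(f"cluster {cluster!r} already has GPUs attached")
        if self.in_use(tenant) + count > self.quotas.get(tenant, 0):
            raise RuntimeError(f"quota exceeded for tenant {tenant!r}")
        if count > len(self.free):
            raise RuntimeError("not enough free GPUs")
        gpus = set(sorted(self.free)[:count])
        self.free -= gpus  # isolation: these GPUs are no longer offered to others
        self.attached[cluster] = (tenant, gpus)
        return sorted(gpus)

    def detach(self, cluster):
        """Return a cluster's GPUs to the free pool, for example when a job ends."""
        _tenant, gpus = self.attached.pop(cluster)
        self.free |= gpus


def cuda_visible_devices(gpu_indices):
    """Format GPU indices as a value for the CUDA_VISIBLE_DEVICES variable."""
    return ",".join(str(i) for i in sorted(gpu_indices))
```

For transient use, a job would call `attach` when its cluster starts and `detach` when it finishes, so the GPUs return to the pool for other tenants.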

Nanda Vijaydev


Nanda Vijaydev is the lead data scientist and head of solutions at BlueData (now HPE), where she leverages technologies like TensorFlow, H2O, and Spark to build solutions for enterprise machine learning and deep learning use cases. Nanda has more than 10 years of experience in data science and data management. Previously, she worked on data science projects in multiple industries as a principal solutions architect at Silicon Valley Data Science and served as director of solutions engineering at Karmasphere.


Thomas Phelan

HPE BlueData

Thomas Phelan is cofounder and chief architect of BlueData. Previously, he was a member of the original team at Silicon Graphics that designed and implemented XFS, the first commercially available 64-bit file system, and an early employee at VMware, where he was a senior staff engineer and a key member of the ESX storage architecture team. At VMware, he designed and developed the ESX storage I/O load-balancing subsystem and modular pluggable storage architecture and led teams working on many key storage initiatives, such as the cloud storage gateway and vFlash.