Presented By O’Reilly and Cloudera
Make Data Work
21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

Scaling the AI hierarchy of needs with TensorFlow, Spark, and Hops

Jim Dowling (Logical Clocks)
14:05–14:45 Thursday, 24 May 2018
Data engineering and architecture
Location: Capital Suite 13
Level: Beginner
Average rating: 5.00 (2 ratings)

Who is this presentation for?

  • CTOs, data architects, data scientists, and data engineers

Prerequisite knowledge

  • A basic understanding of machine learning or Hadoop

What you'll learn

  • Explore distributed deep learning with TensorFlow, Spark, and Hadoop


State-of-the-art deep learning systems at hyperscale AI companies attack the toughest problems with distributed deep learning. Distributed deep learning systems help both AI researchers and practitioners be more productive and enable the training of models that would be intractable on a single GPU server. Hadoop provides the needed platform support for distributed deep learning with TensorFlow and Spark and offers a unified feature store and resource management platform for GPUs.

Jim Dowling explores recent developments in supporting distributed deep learning on Hadoop, in particular Hops, a distribution of Hadoop with support for distributed metadata. Jim discusses the need for better support for Python and for GPUs as a managed resource, and demonstrates how to build a feature store with Hive, Kafka, and Spark. Jim also explains why on-premises distributed deep learning is gaining traction and how commodity GPUs provide lower-cost access to massive amounts of GPU resources.

Distributed deep learning can both massively reduce training time and enable parallel experimentation through large-scale hyperparameter optimization. Jim offers an overview of recent transformative open source TensorFlow frameworks that leverage Apache Spark to manage distributed training, such as Yahoo’s TensorFlowOnSpark, Uber’s Horovod platform, and Hops’s tfspark. These frameworks reduce both training time and neural network development time by running experiments on different models in parallel across hundreds of GPUs, as is typically done in hyperparameter sweeps.
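
The idea of a hyperparameter sweep run as parallel, independent trials can be pictured with a minimal Python sketch. It is not taken from the talk and does not use TensorFlowOnSpark, Horovod, or tfspark. It uses only the standard library, with a thread pool standing in for a cluster of GPU-backed executors. The `train_and_score` function, its toy loss surface, and the grid values are all hypothetical placeholders for a real training run.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def train_and_score(params):
    """Hypothetical stand-in for one training run; returns (params, loss)."""
    lr, batch_size = params
    # Toy loss surface with its minimum at lr=0.01, batch_size=64.
    # A real trial would train a model and report a validation metric.
    loss = (lr - 0.01) ** 2 * 1e4 + ((batch_size - 64) / 64) ** 2
    return params, loss


def sweep(learning_rates, batch_sizes, max_workers=4):
    """Run every grid point as an independent trial and return the best one."""
    grid = list(product(learning_rates, batch_sizes))
    # Each worker plays the role of one executor; trials share no state,
    # so they can run in any order and in parallel.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(train_and_score, grid))
    # Lower loss is better.
    return min(results, key=lambda result: result[1])


best_params, best_loss = sweep([0.001, 0.01, 0.1], [32, 64, 128])
```

Because the trials are independent, wall-clock time for the sweep shrinks roughly with the number of workers, which is the property that makes sweeps a natural fit for a cluster scheduler.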

Jim Dowling

Logical Clocks

Jim Dowling is the CEO of Logical Clocks, an associate professor at KTH Royal Institute of Technology in Stockholm, and lead architect of Hopsworks, an open source data and AI platform. He’s a regular speaker at big data industry conferences. He holds a PhD in distributed systems from Trinity College Dublin.