State-of-the-art deep learning systems at hyperscale AI companies attack the toughest problems with distributed deep learning. Distributed deep learning systems help both AI researchers and practitioners be more productive and enable the training of models that would be intractable on a single GPU server. Hadoop provides the needed platform support for distributed deep learning with TensorFlow and Spark and offers a unified feature store and resource management platform for GPUs.
Jim Dowling explores recent developments in supporting distributed deep learning on Hadoop—in particular, Hops, a distribution of Hadoop with support for distributed metadata. Jim discusses the need for better support for Python and GPUs as a resource and demonstrates how to build a feature store with Hive, Kafka, and Spark. Jim also explains why on-premises distributed deep learning is gaining traction and how commodity GPUs provide lower-cost access to massive amounts of GPU resources.
Distributed deep learning can both massively reduce training time and enable parallel experimentation through large-scale hyperparameter optimization. Jim offers an overview of recent transformative open source TensorFlow frameworks that leverage Apache Spark to manage distributed training, such as Yahoo’s TensorFlowOnSpark, Uber’s Horovod platform, and Hops’s tfspark. These frameworks reduce both training time and neural network development time by running parallel experiments on different models across hundreds of GPUs, as is typically done in hyperparameter sweeps.
Jim Dowling is the CEO of Logical Clocks, the makers of Hops Hadoop, an associate professor at KTH Royal Institute of Technology in Stockholm, and a senior researcher at SICS RISE. Previously, Jim worked at MySQL. A distributed systems researcher, Jim focuses on large-scale distributed systems and machine learning. He is lead architect of Hops Hadoop, the world’s most scalable Hadoop distribution, and teaches the first and largest course in Sweden on deep learning. Jim is also an O’Reilly blogger on AI and a regular speaker at big data industry conferences. He holds a PhD in distributed systems from Trinity College Dublin.
©2018, O’Reilly UK Ltd • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.