Despite a lot of recent progress, deep learning presents a very different workload from the ones that systems like Spark are optimized for. In particular, distributed training of deep networks is often bottlenecked by communication. Although better hardware can reduce the cost of communication between machines, this bottleneck limits the benefit of distributed training in settings like EC2.
Robert Nishihara offers an overview of SparkNet, a system for training deep networks in Spark. Instead of building a new deep learning library in Java or Scala, SparkNet provides a framework that lets Spark users construct deep networks using an existing deep learning library (such as Caffe, TensorFlow, or Torch) as a backend. SparkNet achieves an order-of-magnitude speedup from distributed training relative to Caffe on a single GPU, even in the regime in which communication is extremely expensive. Robert also discusses approaches for parallelizing stochastic gradient descent that minimize communication between machines and keep communication from becoming a bottleneck.
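One common way to parallelize stochastic gradient descent with little communication is to let each worker run several SGD steps on its own data shard and only then average the resulting models. The sketch below illustrates that general idea on a toy linear regression problem in plain Python. It is not SparkNet's code or API: the function names (`local_sgd`, `average`, `train`, `make_shards`), the synthetic data, and all parameter values are assumptions made for illustration, and the workers are simulated sequentially in a single process.

```python
import random


def local_sgd(w, shard, steps, lr, rng):
    """Run `steps` SGD updates on one worker's shard with no communication."""
    w = list(w)  # work on a private copy of the shared model
    for _ in range(steps):
        x, y = rng.choice(shard)  # sample one example from the local shard
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        # Gradient step for the squared error 0.5 * err**2.
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def average(models):
    """Average the workers' parameter vectors element-wise."""
    n = len(models)
    return [sum(ws) / n for ws in zip(*models)]


def train(shards, dim, rounds, local_steps, lr, seed=0):
    """Alternate between local SGD on every shard and one averaging step."""
    rng = random.Random(seed)
    w = [0.0] * dim
    for _ in range(rounds):
        # In a real cluster each call would run on a different machine.
        models = [local_sgd(w, shard, local_steps, lr, rng) for shard in shards]
        # The only communication: one model exchange per round.
        w = average(models)
    return w


def make_shards(true_w, workers, per_worker, seed=1):
    """Hypothetical noise-free data: y = <true_w, x>, split across workers."""
    rng = random.Random(seed)
    shards = []
    for _ in range(workers):
        shard = []
        for _ in range(per_worker):
            x = [rng.uniform(-1.0, 1.0) for _ in true_w]
            y = sum(wi * xi for wi, xi in zip(true_w, x))
            shard.append((x, y))
        shards.append(shard)
    return shards


if __name__ == "__main__":
    true_w = [2.0, -3.0]
    shards = make_shards(true_w, workers=4, per_worker=50)
    print(train(shards, dim=2, rounds=20, local_steps=50, lr=0.1))
```

In this sketch the model is exchanged once per round rather than once per gradient step, so raising `local_steps` reduces communication at the cost of letting the workers' models drift further apart between averages.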
Robert Nishihara is a fourth-year PhD student in the RISELab at the University of California, Berkeley, where he works with Michael Jordan. His work covers machine learning, optimization, and artificial intelligence.