
BIDMach on Spark: Machine learning at the outer limits

John Canny (UC Berkeley)
11:30am–12:00pm Tuesday, 03/29/2016
Hardcore Data Science
Location: 210 C/G
Average rating: 4.07 (14 ratings)

Prerequisite knowledge

Attendees should understand basic networking concepts, including familiarity with terms such as latency, bandwidth, and simple network topologies, and should be comfortable with basic matrix arithmetic and gradient methods.

Description

GPUs have proven their value for machine learning, offering orders-of-magnitude speedups on both dense and sparse data. They define the current performance limits for machine learning, but their limited memory constrains model capacity. John Canny explains how to mitigate that challenge and achieve linear speedups with GPUs on commodity networks. The result defines the hitherto unseen “outer limits” of ML performance.

BIDMach is a rooflined machine-learning toolkit that has demonstrated a two-orders-of-magnitude gain over other systems when run on GPU hardware. BIDMach often outperforms cluster systems on problems where models fit in GPU memory and defines the performance limit for small-model problems. But there is great demand in industry for larger models, and BIDMach has migrated to the Spark ecosystem to boost model capacity. Scaling BIDMach is not an easy task, however. With two orders of magnitude more computation at each node, a proportionate increase in network bandwidth is needed to avoid stifling performance, or radical ways must be found to reduce the load on the network.
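
To make the bandwidth pressure concrete, here is a rough back-of-envelope sketch in Scala (BIDMach's implementation language). Every number in it (model size, link speed, step times) is hypothetical, chosen only to illustrate why a roughly 100x compute speedup per node turns a comfortable network budget into a bottleneck:

    // Illustrative only: compare model-exchange time to minibatch compute
    // time on a CPU node vs. a GPU node. All constants are hypothetical.
    object CommVsCompute {
      def main(args: Array[String]): Unit = {
        val modelBytes   = 1e9               // hypothetical 1 GB model
        val linkBytesSec = 1.25e9            // ~10 Gb/s commodity Ethernet
        val gpuStepSec   = 0.05              // hypothetical 50 ms GPU minibatch
        val cpuStepSec   = gpuStepSec * 100  // ~100x slower CPU baseline

        val commSec = modelBytes / linkBytesSec  // time to ship the model once
        println(f"model exchange time:     $commSec%.2f s")
        println(f"comm/compute (CPU node): ${commSec / cpuStepSec}%.2f")
        println(f"comm/compute (GPU node): ${commSec / gpuStepSec}%.2f")
      }
    }

With these made-up numbers, communication is a small fraction of a CPU step (0.16x) but dwarfs a GPU step (16x), which is exactly the pressure the techniques below are designed to relieve.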

John describes two techniques, each of which gives roughly an order-of-magnitude reduction in network load for common ML algorithms on power-law data. Together they mitigate network load and fully exploit GPU performance for both data-parallel and model-parallel calculations on commodity clusters. John builds on his earlier work on layered, heterogeneous hypercube networks (Kylix) to add topology-aware optimization and an extremely efficient error-tolerance protocol. BIDMach’s single-node performance has also increased dramatically over the last year through the design of end-to-end learning algorithms. These algorithms, word2vec being one example, exploit the GPU’s massive register storage (several megabytes) to deliver dense computation speeds (600 Gflops) on sparse data problems.
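
The sketch below illustrates the general idea behind reducing network load on power-law data: a minibatch touches only a small subset of features, so nodes can exchange packed (feature index, gradient) pairs instead of a full dense model. The names and interface here are illustrative assumptions, not the actual Kylix or BIDMach API:

    // Illustrative sketch: exchange only the gradient entries a minibatch
    // actually touched, rather than the whole dense model.
    object SparseExchange {
      // Pack the nonzero gradient entries of one node's minibatch.
      def pack(grad: Map[Int, Float]): Array[(Int, Float)] =
        grad.toArray.sortBy(_._1)

      // Reduce two packed updates by summing values on matching indices.
      def merge(a: Array[(Int, Float)], b: Array[(Int, Float)]): Array[(Int, Float)] =
        (a ++ b).groupBy(_._1).toArray.sortBy(_._1)
          .map { case (i, vs) => (i, vs.map(_._2).sum) }

      def main(args: Array[String]): Unit = {
        // On power-law data few distinct features appear per minibatch,
        // so each packed update is far smaller than the dense model.
        val nodeA = pack(Map(0 -> 0.5f, 7 -> -0.1f))
        val nodeB = pack(Map(0 -> 0.2f, 42 -> 0.3f))
        merge(nodeA, nodeB).foreach { case (i, v) => println(s"feature $i -> $v") }
      }
    }

Kylix organizes exchanges of this kind across the layers of a heterogeneous hypercube, so each node communicates with only a small set of neighbors per layer, which is what makes the approach viable on commodity networks.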

John also summarizes other developments in BIDMach, including:

  • A native deep learning package for feed-forward and recurrent deep networks (a generic sketch of one feed-forward layer follows this list)
  • Integration of Berkeley’s Caffe DNN framework using JavaCPP
  • BIDMach-on-Android: High-performance machine learning for mobile devices
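
As a generic illustration of the first item, and not BIDMach's actual API, the following shows the core computation of one feed-forward layer, y = relu(Wx + b); a deep network chains layers like this, and BIDMach's package runs them on the GPU:

    // Generic illustration (not BIDMach code): one feed-forward layer,
    // y = relu(W x + b), the building block that deep networks stack.
    object DenseLayer {
      def relu(v: Array[Float]): Array[Float] = v.map(math.max(0f, _))

      def forward(w: Array[Array[Float]], b: Array[Float],
                  x: Array[Float]): Array[Float] =
        relu(w.zip(b).map { case (row, bias) =>
          row.zip(x).map { case (wi, xi) => wi * xi }.sum + bias
        })

      def main(args: Array[String]): Unit = {
        val w = Array(Array(0.5f, -0.2f), Array(0.1f, 0.4f)) // 2x2 weights
        val b = Array(0.0f, 0.1f)                            // biases
        println(forward(w, b, Array(1.0f, 2.0f)).mkString(", "))
      }
    }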

John Canny

UC Berkeley

John F. Canny is a computer scientist and the Paul and Stacy Jacobs Distinguished Professor of Engineering in the Computer Science Department of the University of California, Berkeley. John has made significant contributions in various areas of computer science and mathematics, including artificial intelligence, robotics, computer graphics, human-computer interaction, computer security, computational algebra, and computational geometry.