Artificial intelligence, particularly deep learning, has revolutionized many applications over the past few years, and both the research community and industry are racing to advance the field and realize real-world impact. To help advance deep learning, Baidu released DeepBench in 2016, an open source benchmarking tool that measures the performance of deep learning training operations on different hardware.
However, the performance characteristics of inference differ significantly from those of training. To broaden the impact of deep learning, it is important to speed up inference for deep learning algorithms; improvements in inference time can significantly improve the user experience in applications that use deep learning.
Sharan Narang outlines the challenges of inference for deep learning models and the different workloads and performance requirements of various applications. Along the way, Sharan discusses the key differences between inference and training and the techniques used to speed up deep learning inference.
Sharan Narang is a senior researcher on the systems team at Baidu's Silicon Valley AI Lab (SVAIL), where he leads the effort to benchmark deep learning applications. In 2016, he released DeepBench, an open source benchmark that measures the performance of deep learning workloads. Sharan also focuses on research to improve the performance of deep learning models by reducing their memory and compute requirements, exploring techniques such as pruning neural network weights and reduced-precision arithmetic. Previously, Sharan worked on next-generation mobile processors at NVIDIA.
©2017, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • email@example.com