Your deep learning applications want scale (and how you can support them)

Joel Hestness (Baidu)
1:45pm-2:25pm Friday, September 7, 2018
Location: Yosemite BC
Secondary topics: Platforms and infrastructure; Text, Language, and Speech

What you'll learn

  • Explore the results of research done by Baidu Research's Silicon Valley AI Lab on new model architectures and features for speech recognition (Deep Speech 3), speech generation (Deep Voice 3), and natural language processing

Description

Deep learning (DL) creates impactful advances by following a virtuous recipe: searching for better model architectures, creating large training datasets, and scaling computation. Joel Hestness discusses research done by Baidu Research’s Silicon Valley AI Lab on new model architectures and features for speech recognition (Deep Speech 3), speech generation (Deep Voice 3), and natural language processing. To deploy these models in impactful products, the lab wants to grow dataset size and compute scale to drive accuracy improvements. Large-scale empirical studies have produced intriguing results about how deep learning is likely to scale: as training set size increases, generalization error decreases and the model size required to fit the data grows, each following a particular power-law relationship. As model size grows, training time remains roughly constant, because larger models require fewer steps to converge to the same accuracy. Joel explains how these scaling relationships make it possible to accurately predict the expected accuracy and training time of models trained on larger datasets.
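
To make the prediction idea concrete, here is a minimal sketch of such a power-law extrapolation. It fits error(m) = alpha * m^beta to validation error measured at a few training-set sizes, then predicts error at a larger size. The measurements below are invented for illustration; they are not SVAIL's data or code.

    import numpy as np

    # Hypothetical (training set size, validation error) measurements.
    # These numbers are made up for illustration only.
    sizes = np.array([1e6, 2e6, 4e6, 8e6])
    errors = np.array([0.200, 0.168, 0.141, 0.118])

    # A power law error(m) = alpha * m**beta is linear in log-log space:
    # log(error) = beta * log(m) + log(alpha), so a degree-1 fit recovers it.
    beta, log_alpha = np.polyfit(np.log(sizes), np.log(errors), 1)
    alpha = np.exp(log_alpha)

    def predicted_error(m):
        """Extrapolate validation error to a training set of size m."""
        return alpha * m ** beta

    print(f"fitted exponent beta = {beta:.3f}")
    print(f"predicted error at 32M samples: {predicted_error(32e6):.3f}")

With measurements that double the dataset each time, a fit like this yields the power-law exponent directly, and the same log-log trick applies to predicting the model size needed at a given data scale.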

Joel Hestness

Baidu

Joel Hestness is a systems research scientist at Baidu Research’s Silicon Valley AI Lab (SVAIL), where he studies the scaling characteristics of machine learning and deep learning applications as well as techniques for scaling out model training on large clusters. His prior research focused on general-purpose GPU microarchitecture and memory hierarchies to improve programmability, performance, and energy efficiency in heterogeneous processors. Joel contributes to gem5-gpu, gem5, and TensorFlow. He holds a PhD in computer architecture from the University of Wisconsin-Madison.