Deep learning (DL) creates impactful advances by following a virtuous recipe: searching for better model architectures, creating large training datasets, and scaling computation. Joel Hestness discusses research done by Baidu Research's Silicon Valley AI Lab on new model architectures and features for speech recognition (Deep Speech 3), speech generation (Deep Voice 3), and natural language processing. To deploy these models in impactful products, the lab wants to grow dataset and compute scale to drive accuracy improvements. Large-scale empirical studies yielded intriguing results about how deep learning is likely to scale: as training set size increases, DL generalization error and model size each follow predictable power-law relationships. As model size grows, training time remains roughly constant, because larger models require fewer steps to converge to the same accuracy. Joel explains how these scaling relationships make it possible to accurately predict the expected accuracy and training time of models trained on larger datasets.
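The kind of extrapolation the talk describes can be sketched with a toy power-law fit: measure generalization error at several small training-set sizes, fit the form ε(m) = α·m^(−β) in log-log space, and predict error at a much larger size. All numbers here are synthetic and illustrative, not results from the Baidu studies.

```python
import numpy as np

# Hypothetical generalization errors measured at growing training-set sizes,
# generated from the power-law form eps(m) = alpha * m**(-beta).
# alpha=5.0 and beta=0.35 are made-up values for illustration only.
sizes = np.array([1e4, 3e4, 1e5, 3e5, 1e6])
errors = 5.0 * sizes ** -0.35

# Fit the power law as a line in log-log space:
# log(eps) = log(alpha) - beta * log(m)
slope, intercept = np.polyfit(np.log(sizes), np.log(errors), 1)
alpha, beta = np.exp(intercept), -slope

# Extrapolate: predicted error if the dataset grew to 10M examples.
predicted = alpha * 1e7 ** -beta
print(beta, predicted)
```

On clean synthetic data the fit recovers the exponent exactly; on real measurements the same log-log regression gives the estimated exponent and a principled extrapolation to larger datasets.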
Joel Hestness is a systems research scientist at Baidu Research Silicon Valley AI Lab (SVAIL). He studies the scaling characteristics of machine and deep learning applications and techniques to scale out model training runs on large-scale clusters. His prior research focused on general-purpose GPU microarchitecture and memory hierarchies to improve programmability, performance, and energy efficiency in heterogeneous processors. Joel contributes to gem5-gpu, gem5, and TensorFlow. He holds a PhD in computer architecture from the University of Wisconsin-Madison.
©2018, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • email@example.com