Vikram Saletore and Luke Wilson discuss a collaboration between SURFsara and Intel, part of the Intel Parallel Computing Center initiative, to advance large-scale neural network training on Intel Xeon CPU-based servers. SURFsara and Intel evaluated a number of data-parallel and model-parallel approaches, as well as synchronous and asynchronous SGD methods, with popular neural networks such as ResNet-50. They trained on large datasets using the TACC (Texas Advanced Computing Center) and Dell HPC supercomputers.
Vikram and Luke share insights on several best-known methods, including CPU core and memory pinning and hyperparameter tuning, that were developed to achieve state-of-the-art top-1/top-5 accuracy at scale. They then describe real-world problems that can be solved with models trained efficiently at large scale. They also present tests performed at Dell EMC on CheXNet, a Stanford University project that extends a DenseNet model pretrained on the large-scale ImageNet dataset to detect pathologies, including pneumonia, in chest X-ray images. Vikram and Luke highlight the improved time to solution for extended training of this pretrained model and the storage and interconnect options that lead to more efficient scaling.
Vikram Saletore is a principal engineer and a performance architect in the Customer Solutions, Artificial Intelligence Products, and Data Center Groups at Intel. Vikram leads performance optimizations for distributed machine learning (ML) and deep learning (DL) workloads. He collaborates with enterprise and government partners, OEMs, and HPC and CSP customers on deep learning scale-out training and inference and on machine learning analytics on Intel architectures. Vikram is also a technical co-principal investigator for distributed deep learning research with European members of Intel's Parallel Computing Center. Vikram has more than 25 years of experience and has led many data center initiatives. As a research scientist with Intel Labs, he led a research collaboration with HP Labs. Before joining Intel, Vikram was a tenure-track faculty member in the Computer Science Department at Oregon State University, where he led NSF-funded research in parallel programming and distributed computing and supervised eight graduate students. He also worked at DEC and AMD. He holds many patents and has authored about 45 peer-reviewed research publications. Vikram holds a PhD in EE with a focus on parallel programming and distributed computing from the University of Illinois Urbana-Champaign and an MS from UC Berkeley.
Lucas A. “Luke” Wilson is a data scientist and artificial intelligence researcher in Dell EMC’s HPC and AI Engineering Group, focusing on developing hardware configurations and software solutions for deep learning problems. Previously, he was the director of training and professional development at the Texas Advanced Computing Center (TACC) at the University of Texas at Austin and a member of TACC’s High-Performance Computing Group working on performance profiling and optimization, including early performance optimization work on both the first- and second-generation Intel Xeon Phi processors. Luke has been involved in research and development related to parallel and distributed nature-inspired algorithms for more than 15 years, including using genetic algorithms, artificial immune systems, and artificial neural networks for efficiently solving complex scheduling, categorization, prediction, and design problems. Luke holds a BS and MS in computer science from Texas A&M University-Corpus Christi and a PhD in computer science from the University of Texas at San Antonio.
©2018, O'Reilly Media, Inc.