The scale of datasets and models used in deep learning has increased dramatically. Although larger datasets and models can improve accuracy in many AI applications, they often take much longer to train on a single node. However, compared with common big data solutions, it is still uncommon to distribute training across large clusters using today’s popular deep learning frameworks. On the one hand, it is often harder to gain access to a large GPU cluster than to a big data (Hadoop/Spark) cluster; on the other hand, popular DL frameworks lack convenient facilities for distributed training (e.g., task scheduling and cluster management), which impedes practical use.
Jason Dai and Ding Ding offer an overview of BigDL, an open source distributed deep learning framework built for big data platforms. By leveraging the cluster distribution capabilities of Apache Spark, BigDL enables large-scale distributed training in deep learning, providing good performance, efficient scaling on large clusters, and good convergence results. Jason and Ding demonstrate that BigDL scales well on large clusters and that its performance is comparable to, or even better than, that of GPUs in many use cases. They also discuss tuning strategies and share their experience with large-scale distributed training in deep learning.
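As a rough illustration of what distributing training across a cluster involves, the sketch below simulates synchronous data-parallel gradient descent in plain Python: the data is split into shards, each simulated worker computes a gradient on its own shard, and the averaged gradient updates a shared model. This is not BigDL's API or its actual algorithm; the function names (`partition`, `local_gradient`, `train`), the toy linear model, and the hyperparameters are all assumptions made for illustration, and the workers run sequentially in one process rather than on a Spark cluster.

```python
import random

def partition(data, num_workers):
    """Split the dataset into num_workers shards (round-robin)."""
    return [data[i::num_workers] for i in range(num_workers)]

def local_gradient(w, b, shard):
    """Mean-squared-error gradient for y ~ w*x + b on one worker's shard."""
    n = len(shard)
    gw = sum(2 * (w * x + b - y) * x for x, y in shard) / n
    gb = sum(2 * (w * x + b - y) for x, y in shard) / n
    return gw, gb

def train(data, num_workers=4, lr=0.05, steps=500):
    """Synchronous data-parallel gradient descent (simulated in one process)."""
    shards = partition(data, num_workers)
    w, b = 0.0, 0.0
    for _ in range(steps):
        # Each (hypothetical) worker computes a gradient on its own shard.
        grads = [local_gradient(w, b, s) for s in shards]
        # Aggregate step: average the per-worker gradients. With equal-sized
        # shards this equals the gradient over the full dataset.
        gw = sum(g[0] for g in grads) / num_workers
        gb = sum(g[1] for g in grads) / num_workers
        # Every worker would then receive the same updated parameters.
        w -= lr * gw
        b -= lr * gb
    return w, b

# Toy, noise-free data generated from y = 3x + 1.
random.seed(0)
data = [(x, 3.0 * x + 1.0) for x in (random.uniform(-1, 1) for _ in range(200))]
w, b = train(data)
print(round(w, 3), round(b, 3))
```

In a real cluster the per-shard gradient computation would run in parallel on separate nodes, and the averaging step is where communication cost and scheduling overhead appear, which is why framework support for task scheduling and cluster management matters.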
Jason Dai is a senior principal engineer and chief architect for big data technologies at Intel, where he leads the development of advanced big data analytics, including distributed machine learning and deep learning. Jason is an internationally recognized expert on big data, the cloud, and distributed machine learning; he is the cochair of the Strata Data Conference in Beijing, a committer and PMC member of the Apache Spark project, and the chief architect of BigDL, a distributed deep learning framework on Apache Spark.
Ding Ding is a software engineer on Intel’s big data technology team, where she works on developing and optimizing distributed machine learning and deep learning algorithms on Apache Spark, focusing particularly on large-scale analytical applications and infrastructure on Spark.
©2017, O'Reilly Media, Inc.