Presented By O’Reilly and Intel Nervana
Put AI to work
September 17-18, 2017: Training
September 18-20, 2017: Tutorials & Conference
San Francisco, CA

Very large-scale distributed deep learning with BigDL

Jason Dai (Intel), Ding Ding (Intel)
2:35pm–3:15pm Tuesday, September 19, 2017
Implementing AI
Location: Imperial B Level: Intermediate
Secondary topics:  Data science and AI, Deep learning, Tools and frameworks

Prerequisite Knowledge

  • A basic understanding of deep learning and Apache Spark

What you'll learn

  • Explore BigDL, a distributed deep learning framework built for big data platforms using Apache Spark

Description

The scale of datasets and models used in deep learning has increased dramatically. Although larger datasets and models can improve accuracy in many AI applications, they often take much longer to train on a single node. However, in contrast to common big data workloads, distributing training across large clusters is still uncommon with today's popular deep learning frameworks. On the one hand, it is often harder to gain access to a large GPU cluster than to a big data (Hadoop/Spark) cluster; on the other hand, the lack of convenient facilities in popular DL frameworks for distributed training (e.g., task scheduling and cluster management) impedes practical applications.

Jason Dai and Ding Ding offer an overview of BigDL, an open source distributed deep learning framework built for big data platforms. By leveraging the cluster distribution capabilities in Apache Spark, BigDL successfully unleashes the power of large-scale distributed training in deep learning, providing good performance, efficient scaling on large clusters, and good convergence results. Jason and Ding demonstrate that BigDL scales well on large clusters and has comparable or even better performance than GPUs in many use cases. They also discuss tuning strategies and share their experience with large-scale distributed training in deep learning.
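The core pattern behind this approach is synchronous data-parallel training: each Spark task holds a partition of the data, computes a local gradient, and the gradients are aggregated before the model is updated. The toy sketch below illustrates that pattern in plain Python on a simple linear model; it is an illustration of the idea only, not BigDL's actual API (the simulated "workers" stand in for Spark executors).

```python
# Toy sketch of synchronous data-parallel training, the pattern BigDL
# applies on Spark: each "worker" holds one data partition, computes a
# local gradient, and the driver aggregates gradients before updating
# the shared model. Plain-Python stand-in; not BigDL's actual API.
import random

random.seed(0)

# Synthetic data: y = 3*x + 1 plus noise, split into 4 partitions.
xs = [random.uniform(-1, 1) for _ in range(400)]
data = [(x, 3.0 * x + 1.0 + random.gauss(0, 0.1)) for x in xs]
partitions = [data[i::4] for i in range(4)]

w, b = 0.0, 0.0   # model parameters held by the "driver"
lr = 0.1

def local_gradient(shard, w, b):
    """Mean-squared-error gradient over one worker's partition."""
    gw = gb = 0.0
    for x, y in shard:
        err = (w * x + b) - y
        gw += 2 * err * x
        gb += 2 * err
    n = len(shard)
    return gw / n, gb / n

for step in range(200):
    # Each worker computes a gradient on its own partition (in BigDL
    # these run as Spark tasks on executors); the driver averages them
    # and applies a single synchronous update.
    grads = [local_gradient(p, w, b) for p in partitions]
    gw = sum(g[0] for g in grads) / len(grads)
    gb = sum(g[1] for g in grads) / len(grads)
    w -= lr * gw
    b -= lr * gb

print(w, b)  # converges close to the true parameters (3.0, 1.0)
```

Because every worker sees the same model parameters at each step, this converges like ordinary mini-batch SGD while the gradient computation is spread across the cluster, which is what lets BigDL scale training without changing the optimization semantics.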

Photo of Jason Dai

Jason Dai

Intel

Jason Dai is a senior principal engineer and chief architect for big data technologies at Intel, where he leads the development of advanced big data analytics, including distributed machine learning and deep learning. Jason is an internationally recognized expert on big data, the cloud, and distributed machine learning; he is the cochair of the Strata Data Conference in Beijing, a committer and PMC member of the Apache Spark project, and the chief architect of BigDL, a distributed deep learning framework on Apache Spark.

Photo of Ding Ding

Ding Ding

Intel

Ding Ding is a software engineer on Intel’s big data technology team, where she works on developing and optimizing distributed machine learning and deep learning algorithms on Apache Spark, focusing particularly on large-scale analytical applications and infrastructure on Spark.