Put AI to work
June 26-27, 2017: Training
June 27-29, 2017: Tutorials & Conference
New York, NY

BigDL: Distributed deep learning on Apache Spark

Yiheng Wang (Intel), Jiao (Jennie) Wang (Intel)
1:30pm-5:00pm Tuesday, June 27, 2017
Implementing AI
Location: Sutton Center
Level: Beginner
Secondary topics: Deep Learning
Average rating: 2.50 (2 ratings)

Prerequisite Knowledge

  • A basic understanding of deep learning
  • A working knowledge of Apache Spark

Materials or downloads needed in advance

  • A laptop with the course materials downloaded from the GitHub repo

What you'll learn

  • Explore BigDL, a distributed deep learning framework built for big data platforms using Apache Spark


Deep learning enables state-of-the-art performance in domains such as computer vision, NLP, and speech recognition and has great potential value to industry. Deep learning is tightly connected with big data: models need to be trained on massive amounts of data, most of which is video, audio, and text, the kinds of data that deep learning algorithms excel at processing.

The industry has already built a rich big data ecosystem, from distributed data storage to high-velocity streaming systems to processing engines. Apache Spark is a well-known, fast engine for big data processing. It provides a complete framework that unifies different big data workloads (SQL, streaming, machine learning, etc.). Many big data applications have already been built on these systems.

Yiheng Wang and Jennie Wang offer an overview of BigDL, a distributed deep learning framework built for big data platforms using Apache Spark. BigDL combines the benefits of high-performance computing and big data architecture, providing native support for deep learning functionalities in Spark, orders of magnitude speed-up over out-of-the-box open source DL frameworks (e.g., Caffe and Torch) for single node performance (by leveraging Intel MKL), and the scale-out of deep learning workloads based on the Spark architecture.
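
As a rough illustration of what scaling out training across data partitions can look like, here is a minimal sketch of synchronous data-parallel gradient descent in plain Python. It is not BigDL code and does not use the BigDL or Spark APIs, and it makes no claim about how BigDL is implemented internally. The function names (`partition_gradient`, `train`), the toy linear model, and the data are all assumptions made for illustration.

```python
# Illustrative sketch only: synchronous data-parallel gradient descent
# on a toy linear model. Nothing here is BigDL or Spark API; all names
# are hypothetical.

def partition_gradient(w, b, partition):
    """Mean-squared-error gradient of y = w*x + b over one data partition."""
    n = len(partition)
    gw = sum(2 * (w * x + b - y) * x for x, y in partition) / n
    gb = sum(2 * (w * x + b - y) for x, y in partition) / n
    return gw, gb


def train(partitions, lr=0.05, steps=2000):
    """Run full-batch gradient descent with gradients averaged across partitions."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        # "Map": each partition computes a gradient on its local data.
        grads = [partition_gradient(w, b, p) for p in partitions]
        # "Reduce": average the gradients, then apply one shared update.
        gw = sum(g[0] for g in grads) / len(grads)
        gb = sum(g[1] for g in grads) / len(grads)
        w -= lr * gw
        b -= lr * gb
    return w, b


# Toy data generated from y = 3x + 1, split into two equal partitions.
data = [(x / 10.0, 3 * (x / 10.0) + 1) for x in range(20)]
partitions = [data[:10], data[10:]]
w, b = train(partitions)
```

In this sketch every partition holds a slice of the data, and only the small gradient values are combined at each step, which is the general idea behind running training where the data already lives.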

Yiheng and Jennie introduce the functionality of BigDL, demonstrate how to develop with it, and share some practical use cases, such as image recognition, object detection, and NLP, that allow users to use their big data platform (e.g., Apache Hadoop and Spark) as a unified data analytics platform for data storage, data processing and mining, feature engineering, traditional (non-deep) machine learning, and deep learning workloads.

Yiheng Wang


Yiheng Wang is a software development engineer on the Big Data Technology team at Intel working in the area of big data analytics. Yiheng and his colleagues are developing and optimizing distributed machine learning algorithms (e.g., neural network and logistic regression) on Apache Spark. He also helps Intel customers build and optimize their big data analytics applications.

Jiao (Jennie) Wang


Jiao (Jennie) Wang is a software engineer on the Big Data Technology team at Intel, where she works in the area of big data analytics. She develops and optimizes a distributed deep learning framework on Apache Spark.

Comments on this page are now closed.


Yiheng Wang
06/27/2017 6:13am EDT

The GitHub repo has been updated.

06/26/2017 10:34am EDT

The GitHub repo is currently empty. Please advise.