Deep learning enables state-of-the-art performance in domains such as computer vision, NLP, and speech recognition and has great potential value to industry. Deep learning is tightly connected with big data: models need to be trained on massive amounts of data, most of which is video, audio, and text, the kinds of data that deep learning algorithms excel at processing.
The industry has already built a rich big data ecosystem, from distributed data storage to high-velocity streaming systems to processing engines. Apache Spark is a well-known, fast engine for big data processing. It provides a complete framework that unifies different big data workloads (SQL, streaming, machine learning, etc.). Many big data applications have already been built on these systems.
Yiheng Wang and Jennie Wang offer an overview of BigDL, a distributed deep learning framework built for big data platforms using Apache Spark. BigDL combines the benefits of high-performance computing and big data architecture. It provides native support for deep learning functionality in Spark, orders-of-magnitude speedups in single-node performance over out-of-the-box open source DL frameworks such as Caffe and Torch (by leveraging Intel MKL), and scale-out of deep learning workloads based on the Spark architecture.
Yiheng and Jennie introduce the functionality of BigDL, demonstrate how to develop with it, and share some practical use cases, such as image recognition, object detection, and NLP, that allow users to use their big data platform (e.g., Apache Hadoop and Spark) as a unified data analytics platform for data storage, data processing and mining, feature engineering, traditional (non-deep) machine learning, and deep learning workloads.
Yiheng Wang is a software development engineer on the Big Data Technology team at Intel working in the area of big data analytics. Yiheng and his colleagues are developing and optimizing distributed machine learning algorithms (e.g., neural networks and logistic regression) on Apache Spark. He also helps Intel customers build and optimize their big data analytics applications.
Jiao (Jennie) Wang is a software engineer on the Big Data Technology team at Intel, where she works in the area of big data analytics. She’s engaged in developing and optimizing a distributed deep learning framework on Apache Spark.
©2017, O'Reilly Media, Inc.