Presented By
O’Reilly + Cloudera
Make Data Work
March 25-28, 2019
San Francisco, CA

Analytics Zoo: Distributed TensorFlow and Keras on Apache Spark

Jason Dai (Intel), Yuhao Yang (Intel), Jiao (Jennie) Wang (Intel), Guoqiong Song (Intel)
1:30pm–5:00pm Tuesday, March 26, 2019
Average rating: 3.00 (6 ratings)

Who is this presentation for?

  • Big data engineers, deep learning engineers, and data scientists



Prerequisite knowledge

  • Familiarity with big data and machine learning

Materials or downloads needed in advance

  • A laptop
  • A GitHub account

What you'll learn

  • Explore emerging deep learning frameworks for big data
  • Learn practical design patterns for distributed systems and algorithms for these frameworks
  • Gain experience using innovative application pipelines and architecture for the new class of deep learning applications on big data platforms


Analytics Zoo provides a unified analytics and AI platform that seamlessly unites Spark, TensorFlow, Keras, and BigDL programs into an integrated pipeline. The entire pipeline can then transparently scale out to a large Hadoop/Spark cluster for distributed training or inference.

Jason Dai, Yuhao Yang, Jennie Wang, and Guoqiong Song explain how to build and productionize deep learning applications for big data with Analytics Zoo, including transfer learning-based image classification, sequence-to-sequence prediction for precipitation nowcasting, neural collaborative filtering for recommendations, and unsupervised time series anomaly detection. They draw on real-world use cases from MLSListings, the World Bank, Baosight, and Midea/KUKA.

Jason Dai


Jason Dai is a senior principal engineer and chief architect for big data technologies at Intel, where he leads the development of advanced big data analytics, including distributed machine learning and deep learning. Jason is an internationally recognized expert on big data, the cloud, and distributed machine learning; he’s the cochair of the Strata Data Conference in Beijing, a committer and PMC member of the Apache Spark project, and the creator of BigDL, a distributed deep learning framework on Apache Spark.

Yuhao Yang


Yuhao Yang is a senior software engineer on the big data team at Intel, where he focuses on deep learning algorithms and applications—particularly distributed deep learning and machine learning solutions for fraud detection, recommendation, speech recognition, and visual perception. He’s also an active contributor to Apache Spark MLlib.

Jiao (Jennie) Wang


Jiao (Jennie) Wang is a software engineer on the big data technology team at Intel, where she works in the area of big data analytics. She develops and optimizes distributed deep learning frameworks on Apache Spark.

Guoqiong Song


Guoqiong Song is a senior deep learning software engineer on the big data technology team at Intel. She’s interested in developing and optimizing distributed deep learning algorithms on Spark. She holds a PhD in atmospheric and oceanic sciences with a focus on numerical modeling and optimization from UCLA.

Comments on this page are now closed.


Jiao (Jennie) Wang
04/05/2019 6:07am PDT

Analytics Zoo supports TensorFlow training, fine-tuning, and inference on Spark. If you have a TensorFlow face recognition model, you can try using Analytics Zoo to run it on Spark.

04/05/2019 5:36am PDT

I attended this tutorial. Can you send me a copy of your slides? My email is
Thanks, James Wang

01/28/2019 10:57pm PST

I am specifically interested in integrating GPU/HPC technology underneath Spark to speed up TensorFlow and deep learning in general. In particular, I am working on stream processing for tasks like on-the-fly face recognition in Spark, with TensorFlow-based deep learning face recognition layered on top.