Presented By
O’Reilly + Cloudera
Make Data Work
March 25-28, 2019
San Francisco, CA

Analytics Zoo: Distributed TensorFlow in production on Apache Spark

Yuhao Yang (Intel), Jiao (Jennie) Wang (Intel)
3:50pm-4:30pm Thursday, March 28, 2019
Secondary topics:  Data Platforms, Deep Learning
Average rating: 2.67 (3 ratings)

Who is this presentation for?

  • Those working with deep learning or big data

Level

Non-technical

Prerequisite knowledge

  • Familiarity with deep learning and Spark (useful but not required)

What you'll learn

  • Learn how to run TensorFlow training and inference on Apache Spark
  • Explore Analytics Zoo, a unified analytics and AI platform for distributed TensorFlow, Keras, and BigDL on Apache Spark

Description

Building a model is fun and exciting, but putting it into production is a different story. While TensorFlow focuses on model building, a complete DL/ML system also needs a robust infrastructure platform for data ingestion, feature extraction, and pipeline management. Apache Spark is a perfect candidate. Recent TensorFlow releases have added support for distributed learning and HDFS access, and several community projects are wiring TensorFlow onto Apache Spark clusters. While these approaches are a step in the right direction, they usually require complicated deployment steps or error-prone interprocess communication.

Yuhao Yang and Jennie Wang offer an overview of Analytics Zoo, a unified analytics and AI platform for distributed TensorFlow, Keras, and BigDL on Apache Spark. This new framework makes it easy to experiment with algorithm designs and supports training and inference on Spark clusters with ease of use and near-linear scalability. Unlike many other frameworks, Analytics Zoo is designed for production environments. It requires minimal or even no deployment effort on vanilla Spark clusters; offers high performance through intraprocess communication and optimized parameter synchronization; provides a rich variety of inference patterns, including low-latency local POJO, high-throughput batching, and streaming; and supplies a variety of reference use cases and preprocessing utilities.

Join Yuhao and Jennie to explore the technical details behind Analytics Zoo and walk through multiple examples that highlight its key capabilities. Along the way, you’ll discover how, with a few extra lines of code, an existing TensorFlow algorithm can be transformed into a Spark application and integrated with the big data world.

Yuhao Yang

Intel

Yuhao Yang is a senior software engineer on the big data team at Intel, where he focuses on deep learning algorithms and applications—particularly distributed deep learning and machine learning solutions for fraud detection, recommendation, speech recognition, and visual perception. He’s also an active contributor to Apache Spark MLlib.

Jiao (Jennie) Wang

Intel

Jiao (Jennie) Wang is a software engineer on the big data technology team at Intel, where she works in the area of big data analytics. She develops and optimizes distributed deep learning frameworks on Apache Spark.
