Presented By
O’Reilly + Cloudera

March 25-28, 2019
San Francisco, CA

Analytics Zoo: Distributed TensorFlow in production on Apache Spark

Yuhao Yang (Intel), Jiao (Jennie) Wang (Intel)

3:50pm–4:30pm Thursday, March 28, 2019

Data Science, Machine Learning & AI
Location: 2016

Secondary topics: Data Platforms, Deep Learning

Who is this presentation for?

Those working with deep learning or big data

Level

Non-technical

Prerequisite knowledge

Familiarity with deep learning and Spark (useful but not required)

What you'll learn

Learn how to run TensorFlow training and inference on Apache Spark
Explore Analytics Zoo, a unified analytics and AI platform for distributed TensorFlow, Keras, and BigDL on Apache Spark

Description

Building a model is fun and exciting, but putting it into production is a different story. While TensorFlow focuses on model building, a complete DL/ML system also needs a robust infrastructure platform for data ingestion, feature extraction, and pipeline management, and Apache Spark is a strong candidate for that role. Recent TensorFlow releases have added support for distributed learning and HDFS access, and several community projects are wiring TensorFlow onto Apache Spark clusters. While these approaches are a step in the right direction, they usually require complicated deployment steps or error-prone interprocess communication.

Yuhao Yang and Jennie Wang offer an overview of Analytics Zoo, a unified analytics and AI platform for distributed TensorFlow, Keras, and BigDL on Apache Spark. This new framework enables easy experimentation with algorithm designs and supports training and inference on Spark clusters with ease of use and near-linear scalability. Unlike many other frameworks, Analytics Zoo is designed for production environments. It requires minimal or even no deployment effort on vanilla Spark clusters; offers high performance through intraprocess communication and optimized parameter synchronization; provides a rich variety of inference patterns, including low-latency local POJO, high-throughput batching, and streaming; and supplies a variety of reference use cases and preprocessing utilities.
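
To make the difference between the low-latency local and high-throughput batching inference patterns concrete, here is a minimal sketch in plain Python. It does not use the Analytics Zoo API: `toy_model`, `predict_one`, and `predict_batched` are hypothetical names introduced only for illustration, and the Spark partition is simulated with an ordinary list.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Sequence


def toy_model(batch: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical stand-in for a trained model: scores a whole batch per call."""
    return [sum(features) for features in batch]


def predict_one(features: Sequence[float]) -> float:
    """Low-latency pattern: one record in, one prediction out."""
    return toy_model([features])[0]


def predict_batched(
    records: Iterable[Sequence[float]], batch_size: int = 2
) -> Iterator[float]:
    """High-throughput pattern: group records into mini-batches so the
    model is invoked once per batch rather than once per record."""
    it = iter(records)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield from toy_model(batch)


# Simulated partition of feature vectors (assumption: not real Spark data).
partition = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
single = [predict_one(r) for r in partition]
batched = list(predict_batched(partition, batch_size=2))
```

Both paths produce the same predictions; they differ only in how often the model is invoked. On a Spark cluster, a function shaped like `predict_batched` could be passed to `RDD.mapPartitions`, which hands each partition to the function as an iterator.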

Join Yuhao and Jennie to explore the technical details behind Analytics Zoo and walk through multiple examples that highlight its key capabilities. Along the way, you’ll discover how, with a few extra lines of code, an existing TensorFlow algorithm can be transformed into a Spark application and integrated with the big data world.

Yuhao Yang

Intel

Yuhao Yang is a senior software engineer on the big data team at Intel, where he focuses on deep learning algorithms and applications—particularly distributed deep learning and machine learning solutions for fraud detection, recommendation, speech recognition, and visual perception. He’s also an active contributor to Apache Spark MLlib.

Jiao (Jennie) Wang

Intel

Jiao (Jennie) Wang is a software engineer on the big data technology team at Intel, where she works in the area of big data analytics. She develops and optimizes distributed deep learning frameworks on Apache Spark.
