Building a model is fun and exciting, but putting it into production is always a different story. While TensorFlow focuses on model building, a complete DL/ML system also needs a robust infrastructure platform for data ingestion, feature extraction, and pipeline management. Apache Spark is a perfect candidate. In recent releases, TensorFlow has been enhanced for distributed learning and HDFS access, and several community projects are wiring TensorFlow onto Apache Spark clusters. While these approaches are a step in the right direction, they usually require complicated deployment steps or error-prone interprocess communication.
Yuhao Yang and Jennie Wang offer an overview of Analytics Zoo, a unified analytics and AI platform for distributed TensorFlow, Keras, and BigDL on Apache Spark. This new framework enables easy experimentation with algorithm designs and supports training and inference on Spark clusters with ease of use and near-linear scalability. Compared with other frameworks, Analytics Zoo is designed to serve in production environments. It requires minimal or even no deployment effort on vanilla Spark clusters; offers high performance through intraprocess communication and optimized parameter synchronization; provides a rich variety of inference patterns, including low-latency local POJO, high-throughput batching, and streaming; and supplies a variety of reference use cases and preprocessing utilities.
Join Yuhao and Jennie to explore the technical details behind Analytics Zoo and walk through multiple examples that highlight its key capabilities. Along the way, you’ll discover how, with a few extra lines of code, an existing TensorFlow algorithm can be transformed into a Spark application and integrated with the big data world.
Yuhao Yang is a senior software engineer on the big data team at Intel, where he focuses on deep learning algorithms and applications—particularly distributed deep learning and machine learning solutions for fraud detection, recommendation, speech recognition, and visual perception. He’s also an active contributor to Apache Spark MLlib.
Jiao (Jennie) Wang is a software engineer on the big data technology team at Intel, where she works in the area of big data analytics. She’s engaged in developing and optimizing distributed deep learning frameworks on Apache Spark.