Presented By O’Reilly and Cloudera

September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

Deep learning on YARN: Running distributed TensorFlow, MXNet, Caffe, and XGBoost on Hadoop clusters

Wangda Tan (Cloudera)

1:10pm–1:50pm Thursday, 09/13/2018

Data engineering and architecture
Location: 1A 10
Level: Intermediate

Secondary topics: Data Platforms, Deep Learning, Model lifecycle management

Who is this presentation for?

Solution engineers, data engineers, infrastructure engineers, and CxOs

Prerequisite knowledge

A basic understanding of YARN and deep learning frameworks

What you'll learn

Learn how to easily run applications such as TensorFlow, MXNet, Caffe, and XGBoost on YARN

Description

Deep learning is useful for enterprise tasks such as speech recognition, image classification, AI chatbots, and machine translation, to name a few. To train deep learning and machine learning models, you need applications such as TensorFlow, MXNet, Caffe, and XGBoost.

Wangda Tan discusses new features in Apache Hadoop 3.x to better support deep learning workloads, such as first-class GPU support, container-DNS support, scheduling improvements, and more. These improvements make running distributed deep learning and machine learning applications on YARN as simple as running them locally, which allows machine learning engineers to focus on algorithms instead of worrying about the underlying infrastructure. Wangda then demonstrates how to run these applications on YARN.
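
To make the container-DNS point concrete, here is a minimal sketch (not taken from the talk) of how predictable hostnames can simplify wiring up a distributed TensorFlow job. It builds the JSON value TensorFlow reads from the TF_CONFIG environment variable. The hostname pattern `<component>-<index>.<service>.<user>.<domain>`, the service name `tf-job`, the user `alice`, the domain `example.com`, the port 2222, and the helper name `build_tf_config` are all assumptions for illustration; the names a real cluster assigns depend on its configuration.

```python
import json


def build_tf_config(service, user, domain, num_workers, num_ps,
                    task_type, task_index, port=2222):
    """Return a TF_CONFIG JSON string built from DNS-style hostnames.

    Hypothetical helper: the hostname layout below is an assumed
    convention, not something confirmed by the session description.
    """
    def hosts(component, count):
        # One "host:port" entry per container of the given component.
        return [
            f"{component}-{i}.{service}.{user}.{domain}:{port}"
            for i in range(count)
        ]

    cluster = {
        "worker": hosts("worker", num_workers),
        "ps": hosts("ps", num_ps),
    }
    # "task" tells this process which cluster entry it is.
    task = {"type": task_type, "index": task_index}
    return json.dumps({"cluster": cluster, "task": task})


if __name__ == "__main__":
    # Configuration for the second worker of a 2-worker, 1-ps job.
    print(build_tf_config("tf-job", "alice", "example.com", 2, 1, "worker", 1))
```

Because every container's address can be derived from its role and index, each process can compute the full cluster layout on its own instead of waiting for a coordinator to hand out IP addresses.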

Wangda Tan

Cloudera

Wangda Tan is a project management committee (PMC) member of Apache Hadoop and engineering manager of the computation platform team at Cloudera. He manages all efforts related to Kubernetes and YARN for both on-cloud and on-premises use cases at Cloudera. His primary areas of interest are the YuniKorn scheduler (scheduling containers across YARN and Kubernetes) and the Hadoop Submarine project (running deep learning workloads across YARN and Kubernetes). He has also led efforts in the Hadoop YARN community on features such as resource scheduling, GPU isolation, node labeling, and resource preemption. Previously, he worked on the integration of OpenMPI and GraphLab with Hadoop YARN at Pivotal and participated in creating a large-scale machine learning, matrix, and statistics computation program using MapReduce and MPI at Alibaba.
