Presented By O’Reilly and Cloudera

San Francisco • London • New York

Make Data Work

September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

Classifying job execution using deep learning

Ash Munshi (Pepperdata)

3:30pm–4:10pm Thursday, 09/13/2018

Data science and machine learning
Location: 1A 15/16
Level: Advanced

Secondary topics: Deep Learning

Who is this presentation for?

Data engineers, architects, software developers, and cluster operators

Prerequisite knowledge

A basic understanding of big data
Familiarity with using machine learning to optimize big data performance

What you'll learn

Explore techniques for labeling applications by feeding runtime measurements of CPU, memory, and network I/O to a deep neural network, so that operators can better predict runtimes, tune resource utilization, and increase efficiency

Description

Operators of big data clusters face the problem of understanding the types of applications that run on these clusters in order to better predict runtimes, tune resource utilization, and increase efficiency. Unfortunately, application developers seldom provide meaningful information to accomplish this task. They may provide a descriptive name but leave the rest for the operators to discern.

Ash Munshi outlines a technique for labeling applications using runtime measurements of CPU, memory, and network I/O along with a deep neural network. This labeling groups the applications into buckets that have understandable characteristics, which can then be used to reason about the cluster and its performance. For example, members of a single group can be studied to understand variability in runtime, effects of different queue assignments, effects of the underlying system hardware architecture, and even the effects of start times for periodic applications.

The machine learning techniques presented are new and represent the first approach to classifying multivariate time series. The data for the models comes from observing more than 22,000 servers, and all of their task metrics, every five seconds over a period of months.
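The abstract does not describe the model or its input format, so the following is only a minimal sketch of one plausible preprocessing step: turning an application's variable-length CPU, memory, and network I/O series (sampled every five seconds) into a fixed-size, normalized array that a time-series classifier could consume. The function names, the channel names, and the target length of 8 are assumptions made for illustration and are not taken from the talk.

```python
from statistics import mean, pstdev

SAMPLE_PERIOD_S = 5  # sampling interval stated in the abstract


def resample(series, length):
    """Linearly interpolate a list of numbers to exactly `length` points."""
    if not series:
        raise ValueError("empty series")
    if len(series) == 1 or length == 1:
        return [float(series[0])] * length
    scale = (len(series) - 1) / (length - 1)
    out = []
    for i in range(length):
        pos = i * scale
        lo = int(pos)
        hi = min(lo + 1, len(series) - 1)
        frac = pos - lo
        out.append(series[lo] * (1 - frac) + series[hi] * frac)
    return out


def zscore(series):
    """Scale a series to zero mean and unit variance (all zeros if constant)."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        return [0.0] * len(series)
    return [(x - mu) / sigma for x in series]


def to_model_input(app_metrics, length=8,
                   channels=("cpu", "memory", "network_io")):
    """Return one fixed-length, normalized row per metric channel.

    `app_metrics` maps a (hypothetical) channel name to that application's
    raw samples; applications may run for different durations.
    """
    return [zscore(resample(app_metrics[name], length)) for name in channels]


# Hypothetical application with channels of different lengths.
app = {
    "cpu": [10, 50, 90, 40],
    "memory": [1.0, 1.5, 2.0],
    "network_io": [0, 0, 0, 0, 0],
}
x = to_model_input(app)  # 3 channels, each with 8 normalized points
```

Resampling every application to the same length is only one way to handle differing runtimes; padding or a sequence model that accepts variable-length input would be reasonable alternatives.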

Ash Munshi

Pepperdata

Ash Munshi is CEO of Pepperdata. Previously, Ash was executive chairman of deep learning startup Marianas Labs (acquired by Askin in 2015); CEO of big data storage startup Graphite Systems (acquired by EMC DSSD in 2015); CTO of Yahoo; and CEO of a number of other public and private companies. He serves on the boards of several technology startups.
