Presented By
O’Reilly + Cloudera

Make Data Work

March 25-28, 2019
San Francisco, CA

Automation of root cause analysis for big data stack applications

Alkis Simitsis (Micro Focus), Shivnath Babu (Unravel Data Systems | Duke University)

11:50am–12:30pm Thursday, March 28, 2019

Data Engineering & Architecture
Location: 2024

Secondary topics: Automation in data science and big data, Deep Learning

Who is this presentation for?

Data engineers, data scientists, and those in data operations

Level

Advanced

Prerequisite knowledge

Familiarity with Hadoop and Spark

What you'll learn

Learn how to identify the root cause of application failures with an automated technique based on deep learning

Description

In distributed systems, applications can fail for many reasons, such as running out of memory or timing out while waiting for a resource. The root cause may also lie deeper. For example, a timeout may occur because an application is delayed while reading tables stored as small files or non-splittable files, which makes data access particularly slow. Whatever the reason, when an application fails, users must fix the cause of the failure to get the application running successfully. Since applications may interact with multiple components, a failed application can generate a large set of raw logs. These logs typically contain thousands of messages, including errors and stack traces. Hunting for the root cause of an application failure in these messy, raw, and distributed logs is hard for experts and a nightmare for the thousands of new users coming to the big data stack.

Alkis Simitsis and Shivnath Babu share an automated, deep learning-based technique for root cause analysis (RCA) of big data stack applications, using Spark and Impala as examples. They begin by describing how to automatically generate insights into a failed application in a multiengine big data stack. They then detail their approach to automatically identifying the root cause of an application failure, which consists of three steps: continuously collecting logs of Spark and Impala application failures and labeling them automatically using unsupervised learning; converting the logs into feature vectors using a three-layer neural network; and learning a predictive model for RCA from these feature vectors using deep learning and active learning techniques. They conclude by discussing algorithms for automatically fixing failed applications. These algorithms use examples of successful and failed runs of the application, or of similar applications, from history, then try out a limited number of alternative configurations to get the application quickly to a running state and, finally, to a resource-efficient running state.
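
The flow from labeled failure logs to feature vectors to a root cause prediction can be illustrated with a minimal sketch. This is not the speakers' method: it swaps the three-layer neural network and the deep learning model for plain token counts and a nearest-centroid classifier, purely to show how the data moves through the steps. The log messages, the label names (out_of_memory, timeout), and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical labeled failure logs. In the approach described above these
# would be collected continuously and labeled automatically.
TRAINING_LOGS = [
    ("java.lang.OutOfMemoryError: Java heap space at task 17", "out_of_memory"),
    ("Container killed by YARN for exceeding memory limits 5.2 GB of 5 GB", "out_of_memory"),
    ("Timeout waiting for resource after 300 seconds", "timeout"),
    ("Query timed out waiting for admission to pool default", "timeout"),
]


def tokenize(message):
    """Lowercase the message and keep alphabetic runs, dropping numbers and IDs."""
    return re.findall(r"[a-z]+", message.lower())


def vectorize(message):
    """Turn a log message into a sparse token-count vector (a stand-in for a learned embedding)."""
    return Counter(tokenize(message))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(examples):
    """Sum the vectors of each label into one centroid per root cause."""
    centroids = {}
    for message, label in examples:
        centroids.setdefault(label, Counter()).update(vectorize(message))
    return centroids


def predict(centroids, message):
    """Return the root cause label whose centroid is most similar to the message."""
    vector = vectorize(message)
    return max(centroids, key=lambda label: cosine(vector, centroids[label]))


if __name__ == "__main__":
    model = train(TRAINING_LOGS)
    print(predict(model, "ERROR executor: java.lang.OutOfMemoryError: GC overhead limit exceeded"))
    print(predict(model, "Timeout while waiting for some resource"))
```

A production system would replace vectorize with learned embeddings and predict with a trained model, and would use active learning to decide which unlabeled logs to send to an expert for labeling.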

Alkis Simitsis

Micro Focus

Alkis Simitsis is a chief scientist for cybersecurity analytics at Micro Focus. Alkis has more than 15 years of experience building innovative information and data management solutions in areas like real-time business intelligence, security, massively parallel processing, systems optimization, data warehousing, graph processing, and web services. He holds 26 US patents and has filed over 50 patent applications in the US and worldwide. He's published more than 100 papers in refereed international journals and conferences (top publications cited 5,000+ times) and frequently serves in various roles on the program committees of top-tier international scientific conferences. He's also an IEEE senior member and a member of the ACM.

Shivnath Babu

Unravel Data Systems | Duke University

Shivnath Babu is the CTO at Unravel Data Systems and an adjunct professor of computer science at Duke University. His research focuses on ease of use and manageability of data-intensive systems, automated problem diagnosis, and cluster sizing for applications running on cloud platforms. Shivnath cofounded Unravel to solve the application management challenges that companies face when they adopt systems like Hadoop and Spark. Unravel originated from the Starfish platform built at Duke, which has been downloaded by over 100 companies. Shivnath has won a US National Science Foundation CAREER Award, three IBM Faculty Awards, and an HP Labs Innovation Research Award.
