Presented By O’Reilly and Cloudera

Make Data Work

21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

Improving DevOps and QA efficiency using machine learning and NLP methods

Ran Taig (Dell), Omer Sagi (Dell)

16:35–17:15 Wednesday, 23 May 2018

Data engineering and architecture, Data-driven business management, Streaming systems and real-time applications
Location: S11B
Level: Intermediate

Secondary topics: Text and Language processing and analysis

Who is this presentation for?

Data scientists, developers, and managers

Prerequisite knowledge

Basic familiarity with machine learning and NLP

What you'll learn

Explore an approach that leverages NLP and machine learning algorithms to automatically identify duplicate issues, improving QA and DevOps efficiency

Description

DevOps and QA issues usually involve large files containing numerous log events and configuration data. Resolving such issues can be like finding a needle in a haystack, because it requires experienced DevOps and QA engineers to carry out highly technical procedures. When an issue duplicates an existing one (for example, another failure of the same test), this tedious investigation may be avoided. However, identifying such duplicates is difficult: it depends on the investigator’s familiarity with past issues, on knowledge sharing within the team, and on a thorough investigation of candidate issues. Issue-tracking systems (such as JIRA or Bugzilla) typically support text queries for locating items of interest (log content, system documentation, configuration properties, labels, etc.). These search tools assume that the data is of high quality and that user descriptions are semantically accurate. In practice, these conditions are often not met, and the investigation becomes a frustrating and time-consuming task.

Ran Taig and Omer Sagi outline a solution that leverages NLP and machine learning algorithms to automatically identify duplicate issues. The solution creates a “fingerprint” vector representation for each issue and stores each fingerprint in a designated knowledge base. When a new issue arrives, its configuration and log files are processed through a pipeline that converts them into a new fingerprint. Potential duplicates are then recommended by applying a machine learning algorithm that searches the knowledge base for similar fingerprints. Ran and Omer describe the successful implementation of this solution in Dell’s production systems, where it has significantly reduced issue resolution time.
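The fingerprint-and-lookup pipeline described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Dell’s implementation: the session description does not say how fingerprints are built or compared, so here a fingerprint is assumed to be a TF-IDF vector over tokens from an issue’s log and configuration text, volatile values (numbers, hex IDs) are assumed to be masked, and similarity is assumed to be cosine similarity with a brute-force nearest-neighbor search. The names `tokenize`, `FingerprintKB`, and `most_similar`, and the sample issues, are hypothetical.

```python
import math
import re
from collections import Counter

# Assumption: mask volatile values so repeated failures yield similar tokens.
_HEX = re.compile(r"\b0x[0-9a-f]+\b")
_NUM = re.compile(r"\b\d+(?:\.\d+)*\b")
_TOKEN = re.compile(r"<hex>|<num>|[a-z_]+")


def tokenize(text):
    """Lowercase log/config text, mask hex IDs and numbers, split into tokens."""
    text = _HEX.sub(" <hex> ", text.lower())
    text = _NUM.sub(" <num> ", text)
    return _TOKEN.findall(text)


class FingerprintKB:
    """Hypothetical knowledge base of issue fingerprints (TF-IDF vectors)."""

    def __init__(self):
        self.docs = {}              # issue_id -> term-frequency Counter
        self.doc_freq = Counter()   # term -> number of stored issues containing it

    def add(self, issue_id, text):
        if issue_id in self.docs:
            raise ValueError(f"issue {issue_id!r} already stored")
        tf = Counter(tokenize(text))
        self.docs[issue_id] = tf
        self.doc_freq.update(tf.keys())

    def _fingerprint(self, tf):
        """Turn term frequencies into an L2-normalized TF-IDF vector."""
        n = len(self.docs)
        vec = {
            term: count * (math.log((1 + n) / (1 + self.doc_freq[term])) + 1.0)
            for term, count in tf.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        return {term: w / norm for term, w in vec.items()}

    def most_similar(self, text, k=3):
        """Return the k stored issues with the highest cosine similarity."""
        query = self._fingerprint(Counter(tokenize(text)))
        scores = []
        for issue_id, tf in self.docs.items():
            # Recomputed per query because IDF weights change as the KB grows.
            fp = self._fingerprint(tf)
            score = sum(w * fp.get(term, 0.0) for term, w in query.items())
            scores.append((issue_id, score))
        scores.sort(key=lambda pair: pair[1], reverse=True)
        return scores[:k]


kb = FingerprintKB()
kb.add("QA-101", "ERROR test_login failed: timeout after 30s on host 10.0.0.5")
kb.add("QA-102", "WARN disk usage at 91% on /var/log")
kb.add("QA-103", "ERROR replication lag exceeded threshold 0x1f")

new_issue = "ERROR test_login failed: timeout after 45s on host 10.0.0.9"
print(kb.most_similar(new_issue, k=2))
```

Because the run time, host address, and other numbers are masked, the new issue produces the same fingerprint as `QA-101` and ranks it first. Brute-force search keeps the sketch short; a larger knowledge base would more likely cache the vectors and use an approximate nearest-neighbor index.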

Ran Taig

Dell

Ran Taig is a senior data scientist at Dell, responsible for both the business and the scientific aspects of the data science lifecycle. A machine learning practitioner with a strong academic background, Ran is also an experienced lecturer who has taught core computer science courses to undergraduates at Ben-Gurion University. He is fluent in the common data science toolbox (Python, pandas, SQL, Spark, etc.). Ran holds a PhD in artificial intelligence.

Omer Sagi

Dell

Omer Sagi is a senior data scientist on the data science team at Dell, where he leads several data science projects in the fields of precision agriculture, online marketing, failure prediction, and text classification. Omer has also taught courses on Java programming and databases. He holds a master’s degree from the Department of Industrial Engineering at Ben-Gurion University; his thesis presented a novel approach for assessing the monetary damages of data loss incidents. Omer is currently a PhD candidate in the Department of Software and Information Systems Engineering at Ben-Gurion University, focusing on developing algorithms that simplify ensemble models.

©2018, O’Reilly UK Ltd • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com