DevOps and QA issues usually consist of large files with numerous log events and configuration data. Resolving such issues is sometimes like finding a needle in a haystack. It requires the execution of highly technical procedures by experienced DevOps and QA engineers. When an issue duplicates an existing one (for example, an additional failure of a certain test), this tedious investigation process may be avoided. However, identifying such duplicates is a complicated task that depends on the investigator’s familiarity with past issues, knowledge sharing within the team, and thorough investigation of candidate issues. Software tracking systems (like JIRA or Bugzilla) typically enable textual querying for locating items of interest (log content, system documentation, configuration properties, labels, etc.). These search tools assume that data is of high quality and that user descriptions are semantically accurate. In reality, these conditions are not met, and the investigation becomes a frustrating and time-consuming task.
Ran Taig and Omer Sagi outline a solution that leverages NLP and machine learning algorithms to automatically identify duplicate issues. The solution creates a “fingerprint” vector representation for each issue and stores each fingerprint in a designated knowledge base. When a new issue arrives, its configuration and log files are processed through a pipeline that converts them into a new fingerprint. Recommendations for potential duplicates can then be drawn from the knowledge base by a machine learning algorithm that aims to find similar fingerprints. Ran and Omer describe the successful implementation of this solution in Dell’s production systems, which has led to a significant reduction in the resolution time for issues.
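The fingerprint-and-lookup flow described above can be illustrated with a minimal sketch. It is not the speakers' implementation: the abstract does not say how fingerprints are built or compared, so this example assumes a simple term-frequency vector over log tokens and cosine similarity as the matching step. The names `fingerprint`, `cosine`, `KnowledgeBase`, and the issue IDs are hypothetical.

```python
import math
import re
from collections import Counter


def fingerprint(text):
    """Convert raw log/config text into a sparse term-frequency vector.

    Assumption: a bag-of-words count stands in for whatever NLP
    pipeline a production system would use.
    """
    tokens = re.findall(r"[a-z_]+", text.lower())
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class KnowledgeBase:
    """In-memory store of issue fingerprints (hypothetical helper)."""

    def __init__(self):
        self._fingerprints = {}

    def add(self, issue_id, text):
        # Store the fingerprint of a known, already-investigated issue.
        self._fingerprints[issue_id] = fingerprint(text)

    def recommend(self, text, top_k=3):
        # Fingerprint the new issue and rank stored issues by similarity.
        query = fingerprint(text)
        scored = [
            (cosine(query, fp), issue_id)
            for issue_id, fp in self._fingerprints.items()
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [(issue_id, score) for score, issue_id in scored[:top_k]]


# Hypothetical issues, invented for illustration only.
kb = KnowledgeBase()
kb.add("ISSUE-1", "ERROR connection timeout while connecting to database host")
kb.add("ISSUE-2", "WARN disk usage above threshold on volume data")

matches = kb.recommend("ERROR database connection timeout on host", top_k=1)
print(matches)
```

Here the new issue shares most of its tokens with the first stored issue, so that issue is ranked as the most likely duplicate. A real system would likely need weighting of rare tokens, normalization of timestamps and identifiers, and an indexed nearest-neighbor search instead of a linear scan.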
Ran Taig is a senior data scientist at Dell, responsible for both the business and the scientific aspects of the data science lifecycle. A machine learning practitioner with a strong academic background, Ran is also an experienced lecturer who has delivered core CS courses to undergrads at Ben-Gurion University. He is fluent in the common data science toolbox (Python, pandas, SQL, Spark, etc.). Ran holds a PhD in artificial intelligence.
Omer Sagi is a senior data scientist on the data science team at Dell, where he leads several data science projects in the fields of precision agriculture, online marketing, failure prediction, and text classification. Omer has also taught courses on Java programming and databases. He holds a master’s degree from the Department of Industrial Engineering at Ben-Gurion University; his thesis presented a novel approach for assessing the monetary damages of data loss incidents. Omer is currently a PhD candidate in the Department of Software and Information Systems Engineering at Ben-Gurion University, focusing on developing algorithms that simplify ensemble models.
©2018, O’Reilly UK Ltd • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com