Presented By O’Reilly and Cloudera
Make Data Work
21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

Improving DevOps and QA efficiency using Machine Learning and NLP methods

Ran Taig (Dell), Omer Sagi (Dell)
16:35–17:15 Wednesday, 23 May 2018

Who is this presentation for?

Data Scientists, Developers, Managers

Prerequisite knowledge

Basic familiarity with Machine Learning and NLP

What you'll learn

QA and DevOps efficiency can be improved using Machine Learning algorithms and NLP methods.


DevOps and QA issues usually consist of large files with numerous log events and configuration data. Resolving such issues is sometimes like finding a needle in a haystack: it requires experienced DevOps and QA engineers to execute highly technical procedures. When the issue duplicates an existing one (for example, an additional failure of a certain test), this tedious investigation may be avoided. However, identifying such duplicates is a complicated task that depends on the investigator’s familiarity with past issues, knowledge sharing within the team, and thorough investigation of candidate issues. Software tracking systems (such as JIRA or Bugzilla) typically support textual queries for locating items of interest (e.g., log content, system documentation, configuration properties, labels). These search tools assume that the data is of high quality and that user descriptions are semantically accurate. In practice, these conditions are often not met, and the investigation becomes frustrating and time-consuming.
We present a solution that automatically identifies duplicates of a given issue. Using NLP methods, we create a ‘fingerprint’ vector representation for each issue and store each fingerprint in a designated knowledge base. When a new issue arrives, its configuration and log files are processed through a pipeline that converts them into a new fingerprint. Recommendations of potential duplicates are then drawn from the knowledge base by a machine learning algorithm that finds similar fingerprints. The session presents the successful implementation of this solution in Dell’s production systems, which led to a significant reduction in issue resolution time.
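The abstract does not say which NLP representation or similarity algorithm Dell used. As a minimal sketch, assuming TF-IDF ‘fingerprints’ and cosine-similarity nearest-neighbor search (a common baseline, not necessarily the authors’ method), the fingerprint-and-lookup pipeline might look like this. The masking rules, the `FingerprintIndex` class, the issue IDs, and the log lines are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical normalization rules: mask volatile values (timestamps, hex ids,
# numbers) so that repeated occurrences of the same failure yield the same tokens.
_MASKS = [
    (re.compile(r"\d{4}-\d{2}-\d{2}[ t]\d{2}:\d{2}:\d{2}(?:\.\d+)?"), " <ts> "),
    (re.compile(r"0x[0-9a-f]+"), " <hex> "),
    (re.compile(r"\d+"), " <num> "),
]


def tokenize(text):
    """Lower-case the text, mask volatile values, and split it into tokens."""
    text = text.lower()
    for pattern, placeholder in _MASKS:
        text = pattern.sub(placeholder, text)
    return re.findall(r"<\w+>|[a-z_]+", text)


class FingerprintIndex:
    """Hypothetical in-memory knowledge base of TF-IDF 'fingerprints'."""

    def __init__(self):
        self.docs = {}  # issue_id -> Counter of token frequencies

    def add(self, issue_id, text):
        """Store the raw term counts of an issue's log/config text."""
        self.docs[issue_id] = Counter(tokenize(text))

    def _idf(self, token, df):
        # Smoothed IDF: rare tokens get higher weight; unseen tokens get the maximum.
        n = len(self.docs)
        return math.log((1 + n) / (1 + df.get(token, 0))) + 1

    def _fingerprint(self, counts, df):
        """Turn term counts into a unit-length TF-IDF vector (dict form)."""
        weights = {tok: tf * self._idf(tok, df) for tok, tf in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        return {tok: w / norm for tok, w in weights.items()} if norm else {}

    def query(self, text, k=3):
        """Return the k stored issues whose fingerprints are most cosine-similar."""
        df = Counter()  # document frequency of each token across stored issues
        for counts in self.docs.values():
            df.update(counts.keys())
        q = self._fingerprint(Counter(tokenize(text)), df)
        scores = []
        for issue_id, counts in self.docs.items():
            v = self._fingerprint(counts, df)
            # Both vectors are unit length, so the dot product is the cosine.
            scores.append((issue_id, sum(w * v.get(tok, 0.0) for tok, w in q.items())))
        scores.sort(key=lambda pair: pair[1], reverse=True)
        return scores[:k]


# Usage with made-up issues.
index = FingerprintIndex()
index.add("BUG-1", "2018-05-01 10:00:01 ERROR disk quota exceeded on volume 0x1f")
index.add("BUG-2", "test_login failed: timeout after 30 seconds connecting to auth")
index.add("BUG-3", "kernel panic during firmware upgrade")
print(index.query("2018-05-20 08:12:44 ERROR disk quota exceeded on volume 0x2a", k=2))
```

Because timestamps and hex values are masked before vectorizing, the new issue maps to the same fingerprint as BUG-1 and ranks it first. Recomputing IDF on every query keeps the sketch short; a production knowledge base would more likely precompute the vectors and use an approximate nearest-neighbor index.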

Ran Taig


Machine learning practitioner with a strong academic background (PhD) in artificial intelligence. Experienced lecturer with a demonstrated record of teaching core CS courses to undergraduates at Ben-Gurion University. Combining both areas of expertise, I work today as a senior data scientist at Dell, with a broad view of both the business and the scientific aspects of the data science life cycle. Independent learner and researcher. Fluent with the common data science toolbox (Python, Pandas, SQL, Spark, etc.).

Omer Sagi


Senior Data Scientist on the data science team, Dell IT.
Mr. Sagi is a seasoned data scientist at Dell and currently a PhD candidate in the department of software and information systems engineering at Ben-Gurion University.
At Dell, Omer has led several data science projects in the fields of precision agriculture, online marketing, failure prediction, text classification, and more. His PhD research aims at developing algorithms that simplify ensemble models. His master’s thesis in the department of industrial engineering at Ben-Gurion University presented a novel approach for assessing the monetary damages of data loss incidents. Omer has also taught several courses, such as Java programming and Databases.
