Clue: Evaluate the impact of your new training pipeline on existing models in production
Who is this presentation for?
- Machine learning engineers, data scientists, and product managers
Bruno Wassermann details a tool called Clue that IBM Research is building to help machine learning engineers evaluate changes to complex machine learning training pipelines before deploying them to production.
The pipeline that implements the natural language understanding (NLU) layer of the IBM Watson Assistant service motivated IBM Research to work on Clue. This training pipeline builds custom NLU models for chatbots from customer data, and those working on IBM Watson Assistant change it regularly to incorporate bug fixes and ideas for improvements. The challenge is that a large number of existing customer models are already running in production; these models differ in human language, domain, use case, and so on, and the latest training pipeline must not negatively affect their accuracy or other runtime characteristics.
Clue to the rescue. Clue records metadata and results from large-scale tests executed against customer data and implements a number of features to analyze those results. It offers visualizations and the ability to query the results from different angles, and it runs a set of automated analyses that attempt to identify issues worth investigating further before deploying to production.
Bruno demonstrates Clue’s features, including its dashboards and visualizations, what kinds of queries you can run, and how Clue helps to determine noteworthy results through statistical significance tests, trend analysis, anomaly detection, and the evaluation of promotion policies.
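Clue's internals aren't detailed in this abstract, but to make the idea of significance tests and promotion policies concrete, here is a minimal sketch of how such a check might work: compare per-customer model accuracy under the baseline and candidate pipelines, apply a simple sign test for fleet-wide regressions, and gate promotion on a worst-case accuracy drop. All function names, thresholds, and the policy itself are illustrative assumptions, not Clue's actual implementation.

```python
import math

def sign_test_p(wins: int, losses: int) -> float:
    """Two-sided sign-test p-value for paired comparisons (ties dropped)."""
    n = wins + losses
    if n == 0:
        return 1.0
    k = min(wins, losses)
    # P(X <= k) under Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def evaluate_promotion(baseline, candidate, max_drop=0.02, alpha=0.05):
    """Toy promotion policy (hypothetical): block the candidate pipeline if
    any customer model regresses by more than max_drop, or if regressions
    outnumber improvements with statistical significance."""
    deltas = {cid: candidate[cid] - baseline[cid] for cid in baseline}
    worst = min(deltas.values())
    losses = sum(1 for d in deltas.values() if d < 0)
    wins = sum(1 for d in deltas.values() if d > 0)
    p = sign_test_p(wins, losses)
    significant_regression = losses > wins and p < alpha
    promote = worst >= -max_drop and not significant_regression
    return {"promote": promote, "worst_delta": worst, "p_value": p}

# Per-customer test-set accuracy under the old and new pipeline (made-up data).
baseline = {"cust_a": 0.91, "cust_b": 0.88, "cust_c": 0.93}
candidate = {"cust_a": 0.92, "cust_b": 0.89, "cust_c": 0.94}
print(evaluate_promotion(baseline, candidate))
```

A real system would use a better-powered paired test (e.g., a Wilcoxon signed-rank test) and far more customer models, but the shape is the same: a per-model delta, a fleet-level statistic, and a policy that turns both into a promote/investigate decision.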
Prerequisite knowledge
- A basic understanding of machine learning development, training, and inference
What you'll learn
- Discover the challenge of making sure new versions of a machine learning training pipeline do not negatively affect existing customer models that are already in production
- Understand IBM Research's (evolving) approach to gaining confidence in new training pipelines before deploying them to production
Bruno Wassermann is a research staff member at IBM Research – Haifa, where he’s worked on parts of the distributed systems infrastructure of Watson Developer Cloud, is trying to help SREs make better sense of monitoring and log data, and, more recently, has begun working on some of the issues that arise from the productionization of machine learning applications.