Clue: Evaluate the impact of your new training pipeline on existing models in production





Who is this presentation for?
- Machine learning engineers, data scientists, and product managers
Level
- Beginner

Description
Bruno Wassermann details a tool called Clue that IBM Research is building to help machine learning engineers evaluate changes to complex machine learning training pipelines before deploying them to production.
The pipeline that implements the natural language understanding (NLU) layer of the IBM Watson Assistant service motivated IBM Research to work on the Clue tool. This training pipeline builds custom NLU models for chatbots from customer data, and those working on the IBM Watson Assistant service change the pipeline regularly to incorporate bug fixes and ideas for improvements. The challenge is that a large number of existing customer models are running in production; these models differ in human language, domain, use case, and so on, and you have to ensure that your latest training pipeline does not negatively affect their accuracy or other runtime characteristics.
Clue to the rescue. Clue records metadata and results from large-scale tests executed against customer data and implements a number of features for analyzing those results. It offers visualizations and the ability to query the results from different angles, as well as a set of automated analyses that attempt to identify issues that should probably be investigated further before deploying to production.
Bruno demonstrates Clue’s features, including its dashboards and visualizations, what kinds of queries you can run, and how Clue helps to determine noteworthy results through statistical significance tests, trend analysis, anomaly detection, and the evaluation of promotion policies.
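To make the idea of statistical significance testing across customer models concrete, here is a minimal, self-contained sketch (not Clue's actual API; the accuracy figures and the promotion threshold are hypothetical). It runs a paired t-test on per-model accuracies measured under the current pipeline and a candidate pipeline, then applies a simple promotion policy:

```python
import math
from statistics import mean, stdev

def paired_t_statistic(before, after):
    """Paired t-statistic on per-model accuracy deltas (after - before)."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))

# Hypothetical accuracies for eight customer models, one entry per model,
# measured under the baseline pipeline and a candidate pipeline.
baseline  = [0.91, 0.88, 0.93, 0.85, 0.90, 0.87, 0.92, 0.89]
candidate = [0.92, 0.86, 0.94, 0.84, 0.91, 0.88, 0.93, 0.90]

t = paired_t_statistic(baseline, candidate)
mean_delta = mean(c - b for c, b in zip(candidate, baseline))

# Illustrative promotion policy: promote the candidate pipeline only if
# accuracy has not regressed significantly. 2.365 is the two-sided 5%
# critical value of Student's t with 7 degrees of freedom (n=8 models).
regressed = t < -2.365
promote = not regressed
print(f"mean delta={mean_delta:+.4f}, t={t:.2f}, promote={promote}")
```

In practice the per-model deltas would come from large-scale test runs against real customer data, and a non-parametric test (e.g., Wilcoxon signed-rank) may be preferable when the delta distribution is skewed.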
Prerequisite knowledge
- A basic understanding of machine learning development, training, and inference
What you'll learn
- Discover the challenge of making sure new versions of a machine learning training pipeline do not negatively affect existing customer models that are already in production
- Understand IBM Research's (evolving) approach to gaining confidence in new training pipelines before deploying them to production

Bruno Wassermann
IBM Research
Bruno Wassermann is a research staff member at IBM Research – Haifa, where he has worked on parts of the distributed systems infrastructure of Watson Developer Cloud, helps SREs make better sense of monitoring and log data, and, more recently, has begun working on some of the issues that arise from productionizing machine learning applications.