Clue: Evaluate the impact of your new training pipeline on existing models in production
Who is this presentation for?
- Machine learning engineers, data scientists, and product managers
Bruno Wassermann details a tool called Clue that IBM Research is building to help machine learning engineers evaluate changes to complex machine learning training pipelines before deploying them to production.
The pipeline that implements the natural language understanding (NLU) layer of the IBM Watson Assistant service motivated IBM Research to work on Clue. This training pipeline builds custom NLU models for chatbots from customer data, and those working on IBM Watson Assistant change it regularly to incorporate bug fixes and ideas for improvements. The challenge is that a large number of existing customer models are already running in production; these models differ in human language, domain, use case, and so on, and the latest training pipeline must not negatively affect their accuracy or other runtime characteristics.
Clue to the rescue. Clue records metadata and results from large-scale tests executed against customer data and implements a number of features to analyze those results. It offers visualizations and the ability to query the results from different angles, and it runs a set of automated analyses that attempt to identify issues worth investigating further before deploying to production.
Bruno demonstrates Clue’s features, including its dashboards and visualizations, what kinds of queries you can run, and how Clue helps to determine noteworthy results through statistical significance tests, trend analysis, anomaly detection, and the evaluation of promotion policies.
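Clue's internals aren't detailed in this abstract, but to make the idea of significance tests and promotion policies concrete, here is a minimal sketch of how such a check might work: compare per-customer model accuracy under the baseline and candidate pipelines, apply a simple sign test for fleet-wide regressions, and gate promotion on a worst-case accuracy drop. All function names, thresholds, and the policy itself are illustrative assumptions, not Clue's actual implementation.

```python
import math

def sign_test_p(wins: int, losses: int) -> float:
    """Two-sided sign-test p-value for paired comparisons (ties dropped)."""
    n = wins + losses
    if n == 0:
        return 1.0
    k = min(wins, losses)
    # P(X <= k) under Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def evaluate_promotion(baseline, candidate, max_drop=0.02, alpha=0.05):
    """Toy promotion policy (hypothetical): block the candidate pipeline if
    any customer model regresses by more than max_drop, or if regressions
    outnumber improvements with statistical significance."""
    deltas = {cid: candidate[cid] - baseline[cid] for cid in baseline}
    worst = min(deltas.values())
    losses = sum(1 for d in deltas.values() if d < 0)
    wins = sum(1 for d in deltas.values() if d > 0)
    p = sign_test_p(wins, losses)
    significant_regression = losses > wins and p < alpha
    promote = worst >= -max_drop and not significant_regression
    return {"promote": promote, "worst_delta": worst, "p_value": p}

# Per-customer test-set accuracy under the old and new pipeline (made-up data).
baseline = {"cust_a": 0.91, "cust_b": 0.88, "cust_c": 0.93}
candidate = {"cust_a": 0.92, "cust_b": 0.89, "cust_c": 0.94}
print(evaluate_promotion(baseline, candidate))
```

A real system would use a better-powered paired test (e.g., a Wilcoxon signed-rank test) and far more customer models, but the shape is the same: a per-model delta, a fleet-level statistic, and a policy that turns both into a promote/investigate decision.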
Prerequisite knowledge
- A basic understanding of machine learning development, training, and inference
What you'll learn
- Discover the challenge of making sure new versions of a machine learning training pipeline do not negatively affect existing customer models that are already in production
- Understand IBM Research's (evolving) approach to gaining confidence in new training pipelines before deploying them to production
Bruno Wassermann is a research staff member at IBM Research – Haifa, where he’s worked on parts of the distributed systems infrastructure of Watson Developer Cloud, is trying to help SREs make better sense of monitoring and log data, and, more recently, has begun working on some of the issues that arise from the productionization of machine learning applications.