Mar 15–18, 2020

Model debugging strategies

Patrick Hall (H2O.ai | George Washington University)
1:30pm–5:00pm Monday, March 16, 2020
Location: LL21 E/F

Who is this presentation for?

Data scientists or analysts

Level

Intermediate

Description

You used cross-validation, early stopping, grid search, monotonicity constraints, and regularization to train a generalizable, interpretable, and stable model. Its lift, area under the curve (AUC), and other fit statistics look just fine on out-of-time test data, and better than those of the linear model it’s replacing. You selected your cutoff judiciously and even used automatic code generation to create a real-time scoring engine. So, it’s time to deploy.

No. Unfortunately, current best practices for machine learning (ML) model training and assessment can be insufficient for high-stakes, real-world ML systems. Much like other complex information technology systems, ML models need to be debugged for logical or run-time errors and for security vulnerabilities. Recent high-profile failures have made it clear that ML models must also be debugged for disparate impact across demographic segments and other types of unwanted sociological bias.

Patrick Hall breaks down model debugging and outlines systematic debugging and remediation strategies for ML. Model debugging is an emergent discipline focused on discovering and remediating errors in the internal mechanisms and outputs of ML models. It attempts to test ML models like code (because they usually are code), and it enhances trust in ML directly by increasing accuracy on new or holdout data, by decreasing or identifying hackable attack surfaces, or by decreasing sociological bias. As a side effect, model debugging should also increase the understanding and interpretability of model mechanisms and predictions.
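
To make "test ML models like code" concrete, here is a minimal sketch of two unit-test-style checks wrapped around a fitted classifier. It is an illustration, not code from the session materials; the toy data, model, and tolerance are all assumptions.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier

    # Toy setup: any fitted binary classifier with predict_proba would do.
    X, y = make_classification(n_samples=500, n_features=5, random_state=0)
    model = GradientBoostingClassifier(random_state=0).fit(X, y)

    def test_scores_are_valid_probabilities():
        # Logical-error check: scores must be finite and in [0, 1].
        p = model.predict_proba(X)[:, 1]
        assert np.all(np.isfinite(p))
        assert np.all((p >= 0) & (p <= 1))

    def test_small_perturbation_is_stable():
        # Stability check: a tiny nudge to one input should not swing
        # predictions past an illustrative (not prescriptive) tolerance.
        X_bumped = X.copy()
        X_bumped[:, 0] += 0.01
        delta = np.abs(model.predict_proba(X_bumped)[:, 1]
                       - model.predict_proba(X)[:, 1])
        assert delta.max() < 0.5

    test_scores_are_valid_probabilities()
    test_small_perturbation_is_stable()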

Outline:

Debugging strategies

  • Sensitivity analysis and variants: Out-of-range and residual partial dependence, individual conditional expectation (ICE), adversarial examples, and random attacks (see the ICE sketch after this list)
  • Residual analysis and variants: Disparate impact analysis, error analysis, and post hoc explanation of residuals
  • Benchmark models
  • White hat hacks on ML
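
As a minimal sketch of one sensitivity-analysis variant, the ICE curve traces a single row's prediction while one feature sweeps a grid. This reuses the toy model and X from the earlier sketch; the function, grid, and padding below are illustrative assumptions, not the session's actual code.

    import numpy as np

    def ice_curve(model, X, row_index, feature_index, grid):
        # Copy one row across the grid, varying only the chosen feature.
        rows = np.tile(X[row_index], (len(grid), 1))
        rows[:, feature_index] = grid
        # Cliffs, flat spots, or wild out-of-range behavior in the
        # returned curve are candidate bugs to investigate.
        return model.predict_proba(rows)[:, 1]

    # Sweep feature 0 well past its training range to probe
    # out-of-range behavior (the padding of 2.0 is arbitrary).
    grid = np.linspace(X[:, 0].min() - 2.0, X[:, 0].max() + 2.0, 50)
    curve = ice_curve(model, X, row_index=0, feature_index=0, grid=grid)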

Remediation strategies

  • Anomaly detection
  • Model assertions (see the sketch after this list)
  • Model editing
  • Model monitoring
  • Noise injection
  • Strong regularization
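
As a sketch of one remediation strategy named above, a model assertion is a runtime sanity check wrapped around scoring. Again reusing the toy model from the earlier sketch, the rules and fallback value here are hypothetical; real assertions encode domain knowledge about plausible inputs and outputs.

    import numpy as np

    def score_with_assertions(model, X, fallback=0.1):
        # Score a batch, then apply simple sanity rules to each score.
        p = model.predict_proba(X)[:, 1]
        bad = ~np.isfinite(p) | (p < 0) | (p > 1)
        if bad.any():
            # Flagged rows get a conservative fallback (here, an assumed
            # base rate) and should be logged for human review rather
            # than served silently.
            print(f"model assertion failed on {bad.sum()} rows")
            p = np.where(bad, fallback, p)
        return p

    scores = score_with_assertions(model, X)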

Want a sneak peek of the strategies? Check out these open resources.

Prerequisite knowledge

  • A working knowledge of tree-based ensemble models, linear models, and Python

Materials or downloads needed in advance

  • A laptop and an email address (This is hosted in the H2O educational cloud, Aquarium; materials are kept open for review and suggestions on GitHub.)

What you'll learn

  • Learn strategies to test and fix security vulnerabilities, unwanted sociological biases, and hidden errors in your ML systems

Patrick Hall

H2O.ai | George Washington University

Patrick Hall is a senior director for data science products at H2O.ai, where he focuses mainly on model interpretability and model management. Patrick is also an adjunct professor in the Department of Decision Sciences at George Washington University, where he teaches graduate classes in data mining and machine learning. Previously, Patrick held global customer-facing and R&D roles at SAS Institute. He holds multiple patents in automated market segmentation using clustering and deep neural networks. Patrick is the eleventh person worldwide to become a Cloudera Certified Data Scientist. He studied computational chemistry at the University of Illinois before graduating from the Institute for Advanced Analytics at North Carolina State University.

