Presented By
O’Reilly + Cloudera
Make Data Work
March 25-28, 2019
San Francisco, CA

Practical techniques for interpretable machine learning

Patrick Hall (H2O.ai | George Washington University)
1:30pm-5:00pm Tuesday, March 26, 2019
Secondary topics:  Ethics
Average rating: 4.00 (9 ratings)

Who is this presentation for?

  • Researchers, scientists, data analysts, predictive modelers, business users and other professionals, and anyone else who uses or consumes machine learning techniques

Level

Intermediate

Prerequisite knowledge

  • A working knowledge of Python, widely used linear modeling approaches, and machine learning algorithms

Materials or downloads needed in advance

  • A laptop with a recent version of the Firefox or Chrome browser installed (This tutorial will use a QwikLabs environment; the tutorial materials are also available on GitHub.)

What you'll learn

  • Learn several practical machine learning interpretability techniques and how to use them with Python
  • Explore best practices and common pitfalls to avoid when applying these techniques

Description

Transparency, auditability, and stability of predictive models and results are typically key differentiators in effective machine learning applications. Patrick Hall shares tips and techniques learned through implementing interpretable machine learning solutions in industries like financial services, telecom, and health insurance.

Using a set of publicly available and highly annotated examples, Patrick walks you through several holistic approaches to interpretable machine learning. The examples use the well-known University of California Irvine (UCI) credit card dataset and popular open source packages to train constrained, interpretable machine learning models and visualize, explain, and test more complex machine learning models in the context of an example credit-risk application. Along the way, Patrick draws on his applied experience to highlight crucial success factors and common pitfalls not typically discussed in blog posts and open source software documentation, such as the importance of both local and global explanation and the approximate nature of nearly all machine learning explanation techniques.

Outline:

Enhancing transparency in machine learning models with Python and XGBoost:

  • Using monotonicity constraints to train an explainable—and potentially regulator-approvable—gradient boosting machine (GBM) credit risk model
  • Using partial dependence plots and individual conditional expectation (ICE) plots to investigate the global and local mechanisms of the monotonic GBM and verify its monotonic behavior
  • Using Shapley explanations to derive reason codes for model predictions
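
As a rough illustration of the partial dependence and ICE checks described above, the following minimal sketch uses only the Python standard library. The `predict` function, the feature names, and the sample rows are hypothetical stand-ins for a trained monotonic GBM and the credit card data; they are not taken from the tutorial notebooks. In XGBoost itself, the constraints are requested at training time through the `monotone_constraints` parameter.

```python
import math
from statistics import mean


def predict(row):
    """Hypothetical scorer standing in for a trained monotonic GBM."""
    z = -2.0 + 0.6 * row["late_payments"] - 0.00001 * row["credit_limit"]
    return 1.0 / (1.0 + math.exp(-z))


def ice_curve(predict_fn, row, feature, grid):
    """ICE: predictions for one row as a single feature is swept over grid."""
    return [predict_fn({**row, feature: value}) for value in grid]


def partial_dependence(predict_fn, rows, feature, grid):
    """Partial dependence: the pointwise average of the ICE curves of all rows."""
    curves = [ice_curve(predict_fn, r, feature, grid) for r in rows]
    return [mean(column) for column in zip(*curves)]


def is_monotonic_increasing(curve, tol=1e-12):
    """True if the curve never decreases (within a small tolerance)."""
    return all(b >= a - tol for a, b in zip(curve, curve[1:]))


# Made-up rows; a real check would use the training or validation data.
rows = [
    {"late_payments": 0, "credit_limit": 200000},
    {"late_payments": 2, "credit_limit": 50000},
    {"late_payments": 5, "credit_limit": 20000},
]
grid = [0, 1, 2, 3, 4, 5, 6]

pd_curve = partial_dependence(predict, rows, "late_payments", grid)
ice_curves = [ice_curve(predict, r, "late_payments", grid) for r in rows]

print("Partial dependence is monotonic:", is_monotonic_increasing(pd_curve))
print("Every ICE curve is monotonic:",
      all(is_monotonic_increasing(c) for c in ice_curves))
```

Checking each ICE curve as well as the averaged partial dependence matters because an average can look monotonic even when individual rows are not.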

Example Jupyter notebook

Increasing transparency and accountability in your machine learning project with Python:

  • Training a decision tree surrogate model on the original inputs and predictions of a complex GBM credit risk model to create an overall, approximate flowchart of the complex model’s predictions
  • Comparing the global variable importance from the GBM and from the surrogate decision tree and the interactions displayed in the decision tree with human domain expertise and reasonable expectations
  • Using a variant of the leave-one-covariate-out (LOCO) technique to calculate the local contribution each input variable makes toward each model prediction, to enhance local understanding of the complex GBM’s behavior and the accountability of its predictions
  • Ranking local contributions to generate regulator-mandated reason codes that describe, in plain English, the GBM’s decision process for every prediction
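
The local-contribution and reason-code steps above can be sketched roughly as follows. This is one possible LOCO-style variant, assumed here for illustration: instead of retraining without a variable, the row is rescored with that variable replaced by its training mean, and the change in the prediction is treated as the variable's local contribution. The `predict` function, feature names, and data are hypothetical and do not come from the tutorial notebooks.

```python
import math
from statistics import mean

FEATURES = ["late_payments", "utilization", "credit_limit"]


def predict(row):
    """Hypothetical scorer standing in for the complex GBM."""
    z = (-2.0 + 0.6 * row["late_payments"] + 1.5 * row["utilization"]
         - 0.00001 * row["credit_limit"])
    return 1.0 / (1.0 + math.exp(-z))


def loco_contributions(predict_fn, row, baseline):
    """Prediction change when each variable is replaced by its baseline value."""
    full = predict_fn(row)
    return {name: full - predict_fn({**row, name: baseline[name]})
            for name in baseline}


def reason_codes(contributions, top_n=2):
    """Plain-English lines for the variables that raised the prediction most."""
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    return [f"{name} raised the predicted risk by {delta:.3f}"
            for name, delta in ranked[:top_n] if delta > 0]


# Made-up training rows used only to compute the baseline (mean) values.
training_rows = [
    {"late_payments": 0, "utilization": 0.2, "credit_limit": 100000},
    {"late_payments": 1, "utilization": 0.4, "credit_limit": 60000},
    {"late_payments": 2, "utilization": 0.6, "credit_limit": 50000},
]
baseline = {f: mean(r[f] for r in training_rows) for f in FEATURES}

applicant = {"late_payments": 4, "utilization": 0.9, "credit_limit": 30000}
contributions = loco_contributions(predict, applicant, baseline)
for line in reason_codes(contributions):
    print(line)
```

Like most post-hoc explanations, these contributions are approximate: they ignore interactions between the variable being replaced and the rest of the row.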

Example Jupyter notebook

Explaining your predictive models to business stakeholders with local interpretable model-agnostic explanations (LIME) using Python and H2O:

  • Exploring a straightforward method of creating local samples for LIME that can be more appropriate for real-time scoring of new data in production applications
  • Using LIME to understand local trends in the complex model’s predictions and calculate the local contribution of each input variable toward each model prediction
  • Sorting these contributions to create reason codes (i.e., regulator-mandated, plain English explanations of every model prediction)
  • Validating LIME results to enhance trust in generated explanations using the local model’s R² statistic and a ranked predictions plot
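
The core LIME idea in the list above (fit a simple weighted linear model to the complex model's predictions near one row, then check its fit) can be sketched with the standard library alone. Everything here is an assumption made for illustration: the `predict` function is a hypothetical nonlinear scorer, local samples are drawn by Gaussian perturbation rather than by the sampling method used in the tutorial, and a contribution is taken to be the local coefficient times the row's value. It does not use the `lime` package or H2O.

```python
import math
import random


def predict(row):
    """Hypothetical nonlinear scorer standing in for the complex model."""
    late, util = row
    return 1.0 / (1.0 + math.exp(-(-2.0 + 0.6 * late + 1.5 * util * util)))


def solve(a, b):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def lime_explain(predict_fn, row, scales, n_samples=500, width=1.0, seed=0):
    """Return ([intercept, coef_1, ...], weighted R^2) of a local linear fit."""
    rng = random.Random(seed)
    samples = [[v + rng.gauss(0.0, s) for v, s in zip(row, scales)]
               for _ in range(n_samples)]
    y = [predict_fn(s) for s in samples]
    # Proximity weights: samples closer to the row count more in the fit.
    w = [math.exp(-sum(((a - b) / s) ** 2
                       for a, b, s in zip(smp, row, scales)) / width ** 2)
         for smp in samples]
    x = [[1.0] + s for s in samples]  # prepend an intercept column
    k = len(row) + 1
    xtwx = [[sum(wi * xi[p] * xi[q] for wi, xi in zip(w, x))
             for q in range(k)] for p in range(k)]
    xtwy = [sum(wi * xi[p] * yi for wi, xi, yi in zip(w, x, y))
            for p in range(k)]
    coefs = solve(xtwx, xtwy)  # weighted least squares via normal equations
    fitted = [sum(c * v for c, v in zip(coefs, xi)) for xi in x]
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    ss_res = sum(wi * (yi - fi) ** 2 for wi, yi, fi in zip(w, y, fitted))
    ss_tot = sum(wi * (yi - y_bar) ** 2 for wi, yi in zip(w, y))
    return coefs, 1.0 - ss_res / ss_tot


names = ["late_payments", "utilization"]
row = [3.0, 0.8]
coefs, r2 = lime_explain(predict, row, scales=[1.0, 0.2])
contributions = {n: c * v for n, c, v in zip(names, coefs[1:], row)}
print("Local R^2:", round(r2, 3))
print("Ranked contributions:",
      sorted(contributions.items(), key=lambda kv: kv[1], reverse=True))
```

A low local R² would suggest that the linear explanation does not describe the complex model well near this row, so its reason codes should not be trusted.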

Example Jupyter notebook

Debugging machine learning models for accuracy, trustworthiness, and stability with Python and H2O:

  • Exploring sensitivity analysis, perhaps the most important validation technique for increasing trust in machine learning model predictions, because those predictions can vary drastically for small changes in input variable values, especially outside of training input domains
  • Debugging a trained GBM credit risk model using residual analysis to find problems arising from overfitting and outliers
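
Both debugging ideas above can be sketched in a few lines of standard-library Python. The sketch sweeps one input past the range seen in training to see how predictions respond, then ranks rows by their log loss residual to surface the cases the model gets most wrong. The `predict` function and the labeled rows are hypothetical; the tutorial notebooks work with a trained GBM and the UCI credit card data instead.

```python
import math


def predict(row):
    """Hypothetical scorer standing in for the trained GBM."""
    z = -2.0 + 0.6 * row["late_payments"] - 0.00001 * row["credit_limit"]
    return 1.0 / (1.0 + math.exp(-z))


def sensitivity_sweep(predict_fn, row, feature, values):
    """Rescore one row while a single feature is swept over the given values."""
    return [(v, predict_fn({**row, feature: v})) for v in values]


def logloss_residual(y_true, p, eps=1e-15):
    """Per-row log loss; large values mark confident wrong predictions."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


# Made-up labeled rows: (inputs, observed default indicator).
labeled_rows = [
    ({"late_payments": 0, "credit_limit": 200000}, 0),
    ({"late_payments": 1, "credit_limit": 80000}, 0),
    ({"late_payments": 6, "credit_limit": 20000}, 1),
    ({"late_payments": 0, "credit_limit": 150000}, 1),  # default, clean history
]

# Sensitivity analysis: include values beyond the largest one seen in the data.
train_max = max(r["late_payments"] for r, _ in labeled_rows)
sweep = sensitivity_sweep(predict, labeled_rows[1][0], "late_payments",
                          range(0, 13, 2))
for value, p in sweep:
    note = " (outside training range)" if value > train_max else ""
    print(f"late_payments={value:2d} -> p={p:.3f}{note}")

# Residual analysis: rank rows from largest to smallest residual.
residuals = sorted(
    ((logloss_residual(y, predict(r)), r, y) for r, y in labeled_rows),
    key=lambda item: item[0],
    reverse=True,
)
worst_residual, worst_row, worst_label = residuals[0]
print("Largest residual:", round(worst_residual, 3), worst_row, worst_label)
```

Rows with the largest residuals are natural starting points for checking whether a problem comes from outliers, data errors, or overfitting.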

Example Jupyter notebook

Patrick Hall

H2O.ai | George Washington University

Patrick Hall is a senior director for data science products at H2O.ai, where he focuses mainly on model interpretability and model management. Patrick is also currently an adjunct professor in the Department of Decision Sciences at George Washington University, where he teaches graduate classes in data mining and machine learning. Previously, Patrick held global customer-facing and R&D research roles at SAS Institute. He holds multiple patents in automated market segmentation using clustering and deep neural networks. Patrick is the 11th person worldwide to become a Cloudera Certified Data Scientist. He studied computational chemistry at the University of Illinois before graduating from the Institute for Advanced Analytics at North Carolina State University.

Comments

Patrick Hall | SENIOR DIRECTOR | ADJUNCT PROFESSOR
04/01/2019 12:19am PDT

I didn’t really use any one deck of slides, but here are the resources I shared during the tutorial.

Getting Started

• Tutorial URL: https://aquarium.h2o.ai
• Create new account
• Check email
• Use the temporary password to log in to Aquarium
• Browse labs
• View Detail under Patrick Hall’s MLI Tutorial
• Start Lab (This can take several minutes)
• Click on the Jupyter URL when it becomes available
• Enter the token h2o
• Browse/run Jupyter notebooks
• Please End Lab when you are finished

Criticism

• Cynthia Rudin: “Please Stop Explaining Black Box Models for High Stakes Decisions”
• Cassie Kozyrkov: “Explainable AI won’t deliver. Here’s why.”
• Yann LeCun, Peter Norvig, etc.

Other Resources by the Instructor

• All of the resources for this lab are freely available here: https://github.com/jphall663/interpretable_machine_learning_with_python
• The 2018 JSM presentation related to the post-hoc explanation approaches herein: https://github.com/jphall663/jsm_2018_slides
• The 2018 JSM proceedings paper related to the monotonic GBM and post-hoc explanation approaches herein: https://github.com/jphall663/jsm_2018_paper
• The 2019 H2O World presentation which puts forward an interpretable machine learning workflow: https://github.com/jphall663/h2oworld_sf_2019
• The awesome-machine-learning-interpretability metalist that includes many debugging, explanation, fairness, interpretability, privacy, and security resources: https://github.com/jphall663/awesome-machine-learning-interpretability
• A recent article on the security risks of ML models: https://www.oreilly.com/ideas/proposals-for-model-vulnerability-and-security
• Interpretable Machine Learning “Good, Bad, and Ugly” slides: https://github.com/h2oai/h2o-meetups/blob/master/2018_04_30_NYC_MLI_good_bad_ugly/MLI_good_bad_ugly.pdf

Alexander Smith | SENIOR DATA SCIENTIST
04/01/2019 12:09am PDT

Hi Patrick, Thanks for the very enlightening tutorial. I will certainly use this in my work at Field Nation. Would you be willing to share your slides from your presentation? Thanks, Alex