Presented By O’Reilly and Intel AI
Put AI to Work
April 29-30, 2018: Training
April 30-May 2, 2018: Tutorials & Conference
New York, NY

Model evaluation in the land of deep learning

Pramit Choudhary
4:00pm–4:40pm Wednesday, May 2, 2018
Implementing AI, Interacting with AI
Location: Nassau East/West
Average rating: 4.67 (3 ratings)

Who is this presentation for?

  • Data scientists, machine learning practitioners, and product managers involved with analytical or predictive modeling workflows

Prerequisite knowledge

  • A basic understanding of machine learning concepts and deep neural networks

What you'll learn

  • Understand why evaluating models using model metrics like RMSE or the confusion matrix is not enough
  • Learn tricks and algorithms to enable interpretability in image classification problems


Model evaluation metrics are typically tied to the predictive learning task. There are different metrics for classification (ROC-AUC, confusion matrix), regression (RMSE, R2 score), ranking (precision-recall, F1 score), and so on. These metrics, coupled with cross-validation or hold-out validation techniques, can help analysts and data scientists select a performant model. However, model performance decays over time because of variability in the data. At that point, point-estimate metrics are no longer enough, and a better understanding of the why, what, and how of the categorization process is needed.
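As a minimal sketch of the metrics mentioned above (assuming scikit-learn and a toy logistic-regression model, not any model from the talk), the contrast between a single hold-out point estimate and a cross-validated score distribution looks like this:

```python
# Sketch: hold-out point estimates vs. cross-validated scores (scikit-learn).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.metrics import roc_auc_score, confusion_matrix

# Synthetic binary classification data standing in for a real dataset
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Point estimates on a single hold-out split
probs = model.predict_proba(X_test)[:, 1]
print("hold-out ROC-AUC:", roc_auc_score(y_test, probs))
print("confusion matrix:\n", confusion_matrix(y_test, model.predict(X_test)))

# Cross-validation yields a distribution of scores, not a single number
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=5, scoring="roc_auc")
print("5-fold ROC-AUC: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```

The spread of the cross-validated scores is a first hint of how stable a point estimate really is; neither view, however, explains *why* an individual prediction was made.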

Evaluating model decisions may still be easy for linear models but gets difficult in the world of deep neural networks (DNNs). The complexity can increase multifold for use cases in computer vision (image classification, image captioning, or visual question answering (VQA)) as well as text classification, sentiment analysis, or topic modeling. ResNet, a recently published state-of-the-art DNN architecture, can have over 200 layers; interpreting input features and output categorization across that many layers is challenging. The lack of decomposability and intuitiveness associated with DNNs prevents their widespread adoption despite their superior performance compared to more classical machine learning approaches. Faithful interpretation of DNNs will not only provide insight into failure modes (false positives and false negatives) but also enable humans in the loop to evaluate a model's robustness against noise. This brings trust and transparency to the predictive algorithm.

Pramit Choudhary shares tricks for enabling class-discriminative visualizations in computer vision problems that use convolutional neural networks (CNNs), along with approaches for making CNNs more transparent by capturing metrics during the validation step and highlighting the salient features in an image that drive a prediction.
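One simple way to highlight salient image regions is occlusion sensitivity: occlude one patch at a time and measure how much the class score drops. The sketch below (a NumPy-only illustration of the general idea, not the speaker's specific method; `score_fn` and `toy_score` are stand-ins for a trained CNN's class-probability function) shows the mechanics:

```python
import numpy as np

def occlusion_saliency(image, score_fn, patch=4):
    """Occlusion sensitivity: slide a patch of the image's mean value over
    the image and record how much the class score drops at each location.
    `score_fn` stands in for any trained classifier's scoring function."""
    base = score_fn(image)
    h, w = image.shape
    heatmap = np.zeros((h // patch, w // patch))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = image.mean()  # gray out region
            heatmap[i // patch, j // patch] = base - score_fn(occluded)
    return heatmap  # nonzero entries mark regions the score depends on

# Toy "model" (an assumption for illustration): scores an image by the
# brightness of its top-left 8x8 quadrant.
def toy_score(img):
    return img[:8, :8].mean()

img = np.random.default_rng(0).random((16, 16))
sal = occlusion_saliency(img, toy_score)
# Occluding patches outside the top-left quadrant leaves the score
# unchanged, so only the top-left entries of `sal` are nonzero.
```

With a real CNN, the same loop is run per class, yielding the kind of class-discriminative heatmap the talk describes; the trade-off versus gradient-based methods is one forward pass per occluded position.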


Pramit Choudhary

Pramit Choudhary is a lead data scientist/ML scientist focused on optimizing and applying classical machine learning and Bayesian design strategies to solve large-scale real-world problems. He currently leads initiatives to find better ways of turning a predictive model's learned decision policies into meaningful insights for supervised and unsupervised problems.