Presented By O’Reilly and Intel AI
Put AI to Work
April 29-30, 2018: Training
April 30-May 2, 2018: Tutorials & Conference
New York, NY

Using NLP, neural networks, and reporting metrics in production for continuous improvement in text classifications

Megan Yetman (Capital One)
4:00pm–4:40pm Tuesday, May 1, 2018
Models and Methods
Location: Nassau East/West
Average rating: 5.00 (3 ratings)

Who is this presentation for?

  • Modelers who want to learn more about NLP, production use cases, and model interpretation

Prerequisite knowledge

  • A working knowledge of NLP and neural networks

What you'll learn

  • Explore Pensieve, a natural language processing (NLP) project that classifies reviews
  • Learn ways to improve model reporting and to enable continuous model learning and improvement


Pensieve is a natural language processing (NLP) project that classifies reviews by sentiment, reason for sentiment, high-level content, and low-level content. It is used in production to handle thousands of reviews daily across multiple domains. Megan Yetman offers an overview of Pensieve, along with ways to improve model reporting and to enable continuous model learning and improvement.

Raw text is tokenized against a custom vocabulary set. The resulting token sequence is passed through an embedding layer, a convolutional neural network (CNN), and a bidirectional long short-term memory network (bi-LSTM) to produce softmax outputs over the classification options. Monte Carlo simulations are then run, generating multiple softmax outputs per classification per review. Nonparametric tests on these outputs determine which classifications to report. This lets accuracy be optimized by balancing it against model coverage.
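The abstract does not say how the Monte Carlo outputs are generated or which nonparametric test is applied. The following minimal sketch assumes the per-review softmax samples already exist (for example, from repeated stochastic forward passes such as Monte Carlo dropout, which is an assumption here) and uses an exact one-sided sign test, chosen only for illustration, to check whether the top class beats the runner-up consistently across samples. The names `sign_test_p_value` and `decide` and the `alpha` threshold are hypothetical.

```python
from math import comb
from statistics import mean
from typing import List, Optional, Tuple

# Hypothetical sketch: the sign test stands in for whatever nonparametric test Pensieve uses.


def sign_test_p_value(wins: int, n: int) -> float:
    """One-sided exact sign test: P(X >= wins) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n


def decide(samples: List[List[float]], alpha: float = 0.05) -> Tuple[Optional[int], float]:
    """Return (label, p_value); label is None when the classification should not be reported.

    samples: one softmax vector per Monte Carlo pass for a single review and classification.
    """
    n_classes = len(samples[0])
    avg = [mean(s[c] for s in samples) for c in range(n_classes)]
    ranked = sorted(range(n_classes), key=lambda c: avg[c], reverse=True)
    top, runner_up = ranked[0], ranked[1]

    # Paired comparison of top vs. runner-up in each pass; ties are dropped, as usual for a sign test.
    diffs = [s[top] - s[runner_up] for s in samples]
    nonzero = [d for d in diffs if d != 0]
    if not nonzero:
        return None, 1.0
    wins = sum(1 for d in nonzero if d > 0)
    p = sign_test_p_value(wins, len(nonzero))

    # Report only when the top class wins significantly often; otherwise abstain.
    return (top if p < alpha else None), p


if __name__ == "__main__":
    confident = [[0.7, 0.2, 0.1]] * 10
    uncertain = [[0.5, 0.4, 0.1], [0.4, 0.5, 0.1]] * 5
    print(decide(confident))
    print(decide(uncertain))
```

Lowering `alpha` makes the test stricter, so fewer classifications are reported (lower coverage), while those that are reported are more consistent across the Monte Carlo samples. This is one way to realize the coverage/accuracy balance described above.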

Additionally, Pensieve has self-training capabilities. Review classifications that a human has validated are used to further train the model. If the new model weights pass an additional layer of tests, the model is updated, increasing the scope and accuracy of the classifications. Failure handling is also in place for poor-quality data and for cases where the model stops performing as expected.
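The abstract describes the self-training loop only at a high level. One way to picture the gating logic: human-validated examples are first checked for basic data quality, a candidate model is trained on them, and the candidate replaces the current model only if it clears an accuracy floor and does not regress on a held-out set. Everything in this sketch is an assumption for illustration. `Model`, `self_training_step`, the `train_fn` callback, and the thresholds `min_size`, `min_accuracy`, and `tolerance` are hypothetical, not Pensieve's actual tests.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Set, Tuple

# Hypothetical sketch: Pensieve's actual validation tests are not described in the abstract.
Example = Tuple[str, str]  # (review text, human-validated label)


@dataclass
class Model:
    """Stand-in for a trained classifier; `predict` maps review text to a label."""
    predict: Callable[[str], str]


def accuracy(model: Model, holdout: Sequence[Example]) -> float:
    """Fraction of held-out examples the model labels correctly."""
    return sum(model.predict(text) == label for text, label in holdout) / len(holdout)


def batch_is_usable(batch: Sequence[Example], allowed: Set[str], min_size: int) -> bool:
    """Poor-data check: enough examples, and every label is a known class."""
    return len(batch) >= min_size and all(label in allowed for _, label in batch)


def self_training_step(
    current: Model,
    batch: Sequence[Example],
    holdout: Sequence[Example],
    allowed: Set[str],
    train_fn: Callable[[Model, Sequence[Example]], Model],
    min_size: int = 50,
    min_accuracy: float = 0.8,
    tolerance: float = 0.0,
) -> Tuple[Model, str]:
    """Return the model to serve next and a status string explaining the decision."""
    if not batch_is_usable(batch, allowed, min_size):
        return current, "rejected: poor data"  # keep serving the current model
    candidate = train_fn(current, batch)  # e.g. fine-tune on the validated reviews
    current_acc = accuracy(current, holdout)
    candidate_acc = accuracy(candidate, holdout)
    if candidate_acc < min_accuracy:
        return current, "rejected: below accuracy floor"
    if candidate_acc + tolerance < current_acc:
        return current, "rejected: regression vs. current model"
    return candidate, "promoted"
```

Because every rejection path returns the current model, a bad batch or a weaker candidate leaves production unchanged. Detecting a model that degrades *after* promotion would need separate live monitoring, which this sketch omits.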


Megan Yetman

Capital One

Megan Yetman is a machine learning engineer at the Center for Machine Learning at Capital One. Megan has production experience with natural language processing and neural networks as well as data migration and data science. She holds a BA and MS in statistics from the University of Virginia.
