September 26-27, 2016
New York, NY

Growing up: Continuous integration for machine-learning models

Zachary Hanif (Capital One)
1:30pm–2:10pm Monday, 09/26/2016
Implementing AI
Location: 3D10

What you'll learn

  • Explore how Capital One adapted CI tools and practices to solve model governance and accuracy tracking concerns in a complex environment with adversarial and temporal data complications
Description

    Machine learning is increasingly critical to many fields. From energy to finance to security, the ability to enrich, clarify, and associate large amounts of data accurately has provided measurable productivity increases and a level of insight previously absent. However, these gains come with increased developmental and operational difficulties, mostly surrounding the deployment of trained models, the automated repeatability of model generation, and the historical tracking of model accuracy across windowed datasets, feature vectors, and model architectures. Right now, the engineering and tooling around machine learning in production environments is not widely discussed. Software engineers, however, have been dealing with sibling concerns for years and have mature tooling for them.
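
    As a rough illustration of what tracking a model across windowed datasets, feature vectors, and architectures could involve, the sketch below records one training run as a small, serializable object. It is a minimal sketch under stated assumptions, not a description of Capital One's tooling: `fingerprint`, `RunRecord`, and every field name are hypothetical and chosen only for this example.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


def fingerprint(rows):
    """Return a SHA-256 hex digest of an iterable of JSON-serializable rows.

    Keys are sorted before hashing, so the digest does not depend on
    dictionary key order, only on row content and row order.
    """
    digest = hashlib.sha256()
    for row in rows:
        digest.update(json.dumps(row, sort_keys=True).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical record of one training run (all field names are illustrative)."""
    window_start: str        # first day of the training window, ISO format
    window_end: str          # last day of the training window, ISO format
    dataset_sha256: str      # fingerprint of the exact training rows
    feature_names: tuple     # ordered names of the feature vector
    architecture: str        # free-text label for the model architecture
    hyperparameters: dict    # settings needed to retrain the model
    accuracy: float          # accuracy measured on the evaluation set

    def to_json(self):
        """Serialize the record so it can be committed alongside the code."""
        return json.dumps(asdict(self), sort_keys=True)
```

    A record like this could be stored next to each build so that a past result can be traced back to the data window, features, and settings that produced it.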

    Zachary Hanif explores how to take advantage of that tooling and the associated lessons learned to improve the governance of models over and across time and the natural evolution of the understanding of the problem space. Zachary explains how to deal with three major concerns unique to the data science field through the use of modified CI tools: temporal drift (we need to measure, easily, how our models drift in accuracy over time), context loss (model and feature evolution over time must be captured in a format that is resilient to information loss), and model reproducibility (historical sets of results, datasets, and trials must be retained and reproducible). The Cyber Security Machine Learning group at Capital One has directly encountered these concerns throughout its work. Complicating the matter is the fact that cybersecurity data is inherently adversarial and naturally drifts over time due to the evolution of network participants and usage. Zachary offers an overview of Model Monitor, a system developed by the Cyber Security Machine Learning group to provide automated model governance capabilities, describes the system's architecture and the lessons learned while working with it, and gives a live demo.
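
    The temporal-drift concern can be illustrated with a minimal sketch of the kind of check a CI job might run: score a model's predictions per time window and flag windows whose accuracy falls too far below a baseline. This is not Model Monitor's implementation, which the abstract does not describe; the function names, the seven-day window, and the 0.05 tolerance are assumptions made for the example.

```python
from collections import defaultdict
from datetime import date


def accuracy_by_window(examples, window_days=7):
    """Group (day, y_true, y_pred) triples into fixed-width windows.

    Windows are counted from the earliest day in the data. Returns a dict
    mapping window index to the fraction of correct predictions in it.
    """
    examples = list(examples)
    origin = min(day for day, _, _ in examples)
    hits = defaultdict(int)
    totals = defaultdict(int)
    for day, y_true, y_pred in examples:
        idx = (day - origin).days // window_days
        totals[idx] += 1
        hits[idx] += int(y_true == y_pred)
    return {idx: hits[idx] / totals[idx] for idx in sorted(totals)}


def drifted_windows(window_accuracy, baseline, tolerance=0.05):
    """Return the window indices whose accuracy is more than `tolerance` below `baseline`."""
    return [idx for idx, acc in window_accuracy.items() if baseline - acc > tolerance]


# Hypothetical labeled predictions: (day, true label, predicted label).
examples = [
    (date(2016, 9, 1), 1, 1),
    (date(2016, 9, 2), 0, 0),
    (date(2016, 9, 8), 1, 0),
    (date(2016, 9, 9), 0, 0),
]
per_window = accuracy_by_window(examples)
flagged = drifted_windows(per_window, baseline=per_window[0])
```

    In a CI setting, a non-empty `flagged` list could fail the build or trigger retraining; which response is appropriate depends on the deployment.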

    This session is sponsored by Capital One.

    Zachary Hanif

    Capital One

    Zachary Hanif is a director in Capital One’s Center for Machine Learning, where he leads teams focused on applying machine learning to cybersecurity and financial crime. His research interests include applications of machine learning and graph mining within the realm of massive security data and the automation of model validation and governance. Zachary graduated from the Georgia Institute of Technology.