Production ML outside the black box: From repeatable inputs to explainable outputs
Who is this presentation for?
Data engineers, data architects, developers
As the machine learning ecosystem has evolved, the libraries and tools for training, evaluating, and optimizing complex models have become more prevalent and easier to use. However, better tooling alone isn’t sufficient for deploying machine learning in critical production applications that affect hundreds of thousands of businesses. Kelley Rivoire dissects the challenges Stripe faced in developing reliable, accurate, and performant machine learning applications to support products like Stripe Radar and Stripe Capital, as well as the tools and technology the company built to help solve them.
One area of challenge is engineering the model inputs used as features. Problems range from bugs caused by features that aren’t defined the same way (or even in the same language) in training and in production scoring, to mistakes in recording when a feature or label actually became available, which can produce unrealistically predictive models. You’ll explore the system Stripe built on event-based data that lets model developers quickly define (and test) complex, highly predictive features in a single place in code and make them available for both training and real-time scoring, eliminating some of these common classes of feature-generation errors.
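To make the "define once, use for training and scoring" idea concrete, here is a minimal sketch of a point-in-time feature computed from timestamped events. It is not Stripe's system: the `Event` record, the `dispute_count_last_30d` feature, and the `training_rows` and `score_features` helpers are all hypothetical names chosen for illustration. The key assumption is that a feature value is only allowed to see events that happened strictly before the moment being scored, which guards against the "recorded too late" leakage described above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Event:
    """Hypothetical event record: which merchant it concerns, what happened, and when."""
    merchant_id: str
    kind: str  # illustrative values such as "charge" or "dispute"
    timestamp: datetime


def dispute_count_last_30d(events: Iterable[Event], merchant_id: str, as_of: datetime) -> int:
    """Hypothetical feature, defined once: disputes in the 30 days before `as_of`.

    Only events strictly earlier than `as_of` are counted, so the value matches
    what was actually knowable at that moment (no future information leaks in).
    """
    window_start = as_of - timedelta(days=30)
    return sum(
        1
        for e in events
        if e.merchant_id == merchant_id
        and e.kind == "dispute"
        and window_start <= e.timestamp < as_of
    )


def training_rows(
    events: List[Event], labeled_examples: Iterable[Tuple[str, datetime, int]]
) -> List[Tuple[int, int]]:
    """Offline path: evaluate the feature at each example's decision time."""
    return [
        (dispute_count_last_30d(events, merchant_id, decided_at), label)
        for merchant_id, decided_at, label in labeled_examples
    ]


def score_features(events: List[Event], merchant_id: str, now: datetime) -> Dict[str, int]:
    """Online path: the same feature function, evaluated at the current time."""
    return {"dispute_count_last_30d": dispute_count_last_30d(events, merchant_id, now)}


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1)
    events = [
        Event("m1", "dispute", t0 + timedelta(days=1)),
        Event("m1", "charge", t0 + timedelta(days=2)),
        Event("m1", "dispute", t0 + timedelta(days=10)),
    ]
    # Decision made on day 5: the day-10 dispute had not happened yet, so it is excluded.
    print(training_rows(events, [("m1", t0 + timedelta(days=5), 0)]))  # [(1, 0)]
```

Because both paths call the same function with an explicit `as_of` time, a change to the feature definition cannot silently diverge between training and serving. A real system would also need indexed storage rather than a linear scan over events.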
Another challenge in developing modeling applications is helping humans understand the output of a black-box model. This is necessary both for model developers, who need to debug problems like bad inputs, and for end users of the model, who need to understand why a score or decision was made. Kelley walks you through a few strategies Stripe developed and implemented for sharing context about a model’s inputs and for explaining its decisions, along with the trade-offs of each.
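As one generic illustration of explaining a black-box score (not a description of the strategies Stripe built), the sketch below uses single-feature occlusion: reset each input to a baseline value, rescore, and rank features by how much the score drops. The function name `occlusion_explanation`, the toy scoring function, and the feature names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]


def occlusion_explanation(
    score: Callable[[Features], float],
    features: Features,
    baseline: Features,
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank features by how much the score drops when each is reset to its baseline.

    `score` is treated as a black box: only its outputs are used, never its internals.
    """
    original = score(features)
    deltas = []
    for name in features:
        perturbed = dict(features)
        perturbed[name] = baseline[name]  # occlude one feature at a time
        deltas.append((name, original - score(perturbed)))
    # Largest positive delta = the feature that pushed the score up the most.
    deltas.sort(key=lambda item: item[1], reverse=True)
    return deltas[:top_k]


if __name__ == "__main__":
    # Hypothetical toy model standing in for a real black-box scorer.
    def toy_score(f: Features) -> float:
        return 0.6 * f["dispute_count_last_30d"] + 0.01 * f["amount_usd"]

    print(
        occlusion_explanation(
            toy_score,
            {"dispute_count_last_30d": 2.0, "amount_usd": 50.0},
            {"dispute_count_last_30d": 0.0, "amount_usd": 20.0},
        )
    )
```

The trade-offs are typical of simple explanation methods: this approach costs one extra model call per feature, ignores interactions between features, and its answers depend heavily on the choice of baseline.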
Prerequisite knowledge
- Experience with machine learning and deploying machine learning applications
What you'll learn
- Discover how the needs of training a model for research or analysis differ from those of developing reliable, accurate, and performant production machine learning applications
- Identify common gotchas in designing and implementing features to serve as inputs for machine learning models
- Learn how to think about getting insight from, and explaining the decisions of, a complex black-box model for both model developers and end users, including a few strategies and the pluses and minuses of each
Kelley Rivoire is the head of data infrastructure at Stripe, where she leads the Data Infrastructure Group. As an engineer, she built Stripe’s first real-time machine learning evaluation of user risk. Previously, she worked on nanophotonics and 3D imaging as a researcher at HP Labs. She holds a PhD from Stanford.