Automating ML model training and deployments via metadata-driven data, infrastructure, feature engineering, and model management
Who is this presentation for?
- Machine learning engineers, architects, tech leads, developers, DevOps engineers, and C-level executives interested in exploring how to manage data and automate model deployment at scale
Level
Description
Comcast developed a framework for operationalizing ML models. It covers the full ML lifecycle, from data ingestion, feature engineering, model training, and model deployment to model evaluation at runtime. It processes roughly 3 billion predictions per day. The system supports both proactive inference (triggered by combinations of events on a stream) and reactive inference (performed on demand).
Mumin Ransom explores how Comcast solved the “feature store” problem: managing a historical feature store for model training alongside an online feature store that holds current features for model inference, in either proactive (on event arrival) or reactive (on REST endpoint invocation) mode.
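The split between a historical store and an online store can be illustrated with a minimal in-memory sketch. This is not Comcast's implementation: `FeatureStore`, `toy_model`, `on_event`, and `on_request` are hypothetical names, the model is a trivial threshold rule, and writes are assumed to arrive in timestamp order.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]
Model = Callable[[Features], float]


@dataclass
class FeatureStore:
    """Hypothetical in-memory stand-in for an online plus historical feature store."""
    online: Dict[str, Features] = field(default_factory=dict)
    history: Dict[str, List[Tuple[int, Features]]] = field(
        default_factory=lambda: defaultdict(list)
    )

    def write(self, entity_id: str, ts: int, features: Features) -> None:
        # Merge new values into the current view, then append a snapshot
        # so training can later reconstruct the features as of any time.
        merged = {**self.online.get(entity_id, {}), **features}
        self.online[entity_id] = merged
        self.history[entity_id].append((ts, dict(merged)))

    def current(self, entity_id: str) -> Features:
        # Online path: latest features, used at inference time.
        return dict(self.online.get(entity_id, {}))

    def as_of(self, entity_id: str, ts: int) -> Features:
        # Historical path: last snapshot written at or before ts.
        snapshot: Features = {}
        for written_at, feats in self.history.get(entity_id, []):
            if written_at <= ts:
                snapshot = feats
        return dict(snapshot)


def toy_model(features: Features) -> float:
    # Assumed placeholder model: flag an entity with three or more errors.
    return 1.0 if features.get("error_count", 0.0) >= 3 else 0.0


def on_event(store: FeatureStore, model: Model, entity_id: str,
             ts: int, features: Features) -> float:
    # Proactive mode: an arriving event updates features and triggers inference.
    store.write(entity_id, ts, features)
    return model(store.current(entity_id))


def on_request(store: FeatureStore, model: Model, entity_id: str) -> float:
    # Reactive mode: a caller (for example, a REST handler) asks for a prediction.
    return model(store.current(entity_id))
```

Both modes read the same online view, so a prediction requested on demand sees the same features as one triggered by the most recent event, while `as_of` gives training code a point-in-time view.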
Automating and horizontally scaling a platform that trains and operationalizes ML models to produce billions of predictions per day is complex. The challenges Comcast faced included bottlenecks and technology limits in data management, feature engineering, and rapid model (re)training and (re)deployment.
The metadata-driven data, infrastructure, feature engineering, model training, and inference pipelines support processes such as consistent feature engineering on streaming data for model inference and on data at rest for training and validation. Mumin details how Comcast handles feature-rich raw data that may contain sensitive information, including PII. Such a solution must let ML model developers access this information for feature engineering while still protecting customer privacy. Mumin also outlines how the framework does this using a combination of methods, such as encryption, removal, anonymization, and aggregation, to protect privacy without compromising model efficacy.
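As a rough illustration of how such methods might be combined, the sketch below applies a field-level policy to raw records before feature engineering. Everything in it is an assumption made for illustration: the `POLICY` mapping, the field names, and the helper functions are hypothetical. Keyed hashing (HMAC-SHA256) stands in for the encryption step, and truncating a ZIP code stands in for anonymization, so this shows the general idea rather than the framework's actual mechanisms.

```python
import hashlib
import hmac
from collections import defaultdict
from statistics import mean
from typing import Any, Dict, List

# Hypothetical field-level policy; fields not listed are kept unchanged.
POLICY = {
    "account_id": "pseudonymize",  # keyed hash: stable join key, not reversible without the key
    "email": "remove",             # dropped entirely
    "zip_code": "generalize",      # coarsened to a 3-character prefix
}


def pseudonymize(value: str, key: bytes) -> str:
    # HMAC-SHA256 keeps the mapping consistent across records while
    # hiding the raw identifier from model developers.
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def protect(record: Dict[str, Any], key: bytes) -> Dict[str, Any]:
    """Return a copy of the record with the policy applied to each field."""
    out: Dict[str, Any] = {}
    for name, value in record.items():
        action = POLICY.get(name, "keep")
        if action == "remove":
            continue
        if action == "pseudonymize":
            out[name] = pseudonymize(str(value), key)
        elif action == "generalize":
            out[name] = str(value)[:3]
        else:
            out[name] = value
    return out


def aggregate(records: List[Dict[str, Any]], group_key: str,
              value_key: str, min_group_size: int = 2) -> Dict[Any, float]:
    # Aggregate a numeric field per group and suppress groups that are
    # too small to hide an individual customer.
    groups: Dict[Any, List[float]] = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record[value_key])
    return {g: mean(v) for g, v in groups.items() if len(v) >= min_group_size}
```

Because the keyed hash is deterministic for a given key, the same customer still maps to the same pseudonym, so features can be joined per customer without exposing the raw identifier.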
Prerequisite knowledge
- A basic understanding of machine learning, distributed systems, containerization, DevOps, data management, and systems security
What you'll learn
- Learn how to use DevOps principles and methodologies to integrate machine learning models into a data processing pipeline at scale
- Explore the challenges involved in doing consistent feature engineering at scale across model training and model inferencing
- Discover how to use sensitive data for model training while ensuring customer privacy
Mumin Ransom
Comcast
Mumin Ransom joined Comcast in 2005. Since then, he has worked across the HSD, Voice and Video Production lines. Mumin currently leads a machine learning platform development team. His system handles billions of events daily. It is designed to improve customer experience by predicting service issues and meeting customers digitally with simple resolutions. This results in less downtime for customers, making them happier, and saves millions in operations costs.
Mumin is also a cofounder of BENgineers, a black technology professionals organization at Comcast. Its goal is to enhance the tech pipeline and create advocacy and representation for black tech professionals at Comcast. The BENgineers have participated in coding events with local children, hosted discussions about blacks in technology, participated in Comcast Lab weeks, and received a Commerce Impact Award for entrepreneurship and strategy for the creation of the organization.
Nick Pinckernell
Comcast
Nick Pinckernell is a senior research engineer for the applied AI research team at Comcast, where he works on ML platforms for model serving and feature pipelining. He has focused on software development, big data, distributed computing, and research in telecommunications for many years. He’s pursuing his MS in computer science at the University of Illinois at Urbana-Champaign, and in his free time, he enjoys IoT.
Comments
Sure! The deck is available at https://conferences.oreilly.com/strata/strata-ny/user/proposal/presentation_download/77284?pres_id=9221
Thanks for the great presentation.
Can you please post the presentation deck here?