Sep 23–26, 2019

Automating ML model training and deployments via metadata-driven data, infrastructure, feature engineering, and model management

Mumin Ransom (Comcast), Nick Pinckernell (Comcast)
2:05pm–2:45pm Thursday, September 26, 2019
Location: 1A 12/14
Average rating: 4.33 (6 ratings)

Who is this presentation for?

  • Machine learning engineers, architects, tech leads, developers, DevOps engineers, and C-level executives interested in exploring how to execute data management and automate model deployment at scale

Level

Intermediate

Description

Comcast developed a framework for operationalizing ML models. It covers the full ML lifecycle, from data ingestion, feature engineering, and model training to model deployment and evaluation at runtime, and it processes roughly 3 billion predictions per day. The system supports proactive inference (triggered by event combinations on a stream) as well as reactive inference (on demand).
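
For illustration only, here is a minimal sketch of the two inference modes; the event schema, trigger condition, and toy scoring function are assumptions, not Comcast's implementation:

    # Hypothetical sketch of proactive (stream-driven) vs. reactive (on-demand)
    # inference; the event fields and scoring logic are illustrative only.
    from dataclasses import dataclass
    from typing import Dict, Iterable, Iterator

    @dataclass
    class Prediction:
        account_id: str
        score: float

    def score(features: Dict[str, float]) -> float:
        # Placeholder for a trained model's predict() call.
        return 0.8 * features.get("error_rate", 0.0) + 0.2 * features.get("retries", 0.0)

    def proactive(events: Iterable[Dict]) -> Iterator[Prediction]:
        """Run inference when a qualifying event combination arrives on a stream."""
        for event in events:
            if event.get("type") == "service_alarm":  # example trigger condition
                yield Prediction(event["account_id"], score(event["features"]))

    def reactive(account_id: str, online_features: Dict[str, float]) -> Prediction:
        """Run inference on demand, e.g., behind a REST endpoint."""
        return Prediction(account_id, score(online_features))

    stream = [{"type": "service_alarm", "account_id": "a1",
               "features": {"error_rate": 0.6, "retries": 2.0}}]
    print(list(proactive(stream)))
    print(reactive("a2", {"error_rate": 0.1, "retries": 0.0}))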

Mumin Ransom explores how Comcast solved the “feature store” problem: managing a historical feature store for model training and an online feature store of current features to support model inference in proactive (on event arrival) or reactive (on REST endpoint invocation) mode.
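
A minimal sketch of the dual feature store idea, with a historical log for training and an online view of current features for inference; the class, methods, and fields below are hypothetical, not the system described in the talk:

    # Hypothetical sketch of a dual feature store: a historical log for training
    # and an online view of the latest feature values for inference.
    from collections import defaultdict
    from typing import Dict, List, Tuple

    class FeatureStore:
        def __init__(self) -> None:
            self._history: Dict[str, List[Tuple[int, Dict[str, float]]]] = defaultdict(list)
            self._online: Dict[str, Dict[str, float]] = {}

        def write(self, entity_id: str, ts: int, features: Dict[str, float]) -> None:
            # Every write lands in the historical log and refreshes the online view.
            self._history[entity_id].append((ts, features))
            self._online[entity_id] = features

        def training_rows(self, entity_id: str, as_of: int) -> List[Dict[str, float]]:
            # Point-in-time rows for model training and validation.
            return [f for ts, f in self._history[entity_id] if ts <= as_of]

        def online_features(self, entity_id: str) -> Dict[str, float]:
            # Current features for proactive or reactive inference.
            return self._online.get(entity_id, {})

    store = FeatureStore()
    store.write("a1", ts=100, features={"error_rate": 0.2})
    store.write("a1", ts=200, features={"error_rate": 0.6})
    assert store.online_features("a1")["error_rate"] == 0.6
    assert len(store.training_rows("a1", as_of=150)) == 1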

Automating and horizontally scaling the platform to train and operationalize ML models that produce billions of predictions per day is complex. Among the challenges Comcast faced were bottlenecks and technology limits in data management, feature engineering, and rapid model (re)training and (re)deployment.

The metadata-driven data, infrastructure, feature engineering, model training, and inference pipelines support processes such as consistent feature engineering on streams for model inference and on data at rest for training and validation. Mumin details how Comcast handles feature-rich raw data, which can contain sensitive information, including PII. Such a solution must let ML model developers access this information for feature engineering while still ensuring that customer privacy is protected. Mumin also outlines how the framework manages this with a combination of methods, such as encryption, removal, anonymization, and aggregation, to protect privacy without compromising model efficacy.
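
A minimal sketch of sharing one feature-engineering function across the batch (training) and streaming (inference) paths, with simple privacy treatments (removal, pseudonymization via hashing, aggregation) applied first; the field names and transformations are illustrative assumptions, not Comcast's code:

    # Hypothetical sketch: one feature-engineering function shared by the batch
    # (training) and streaming (inference) paths, with simple privacy treatments
    # applied before features leave the raw data zone.
    import hashlib
    from typing import Dict, Iterable, List

    def protect(record: Dict) -> Dict:
        """Drop or anonymize sensitive fields; field names are illustrative."""
        out = dict(record)
        out.pop("ssn", None)  # removal of a sensitive field
        if "account_id" in out:  # pseudonymization via hashing
            out["account_id"] = hashlib.sha256(out["account_id"].encode()).hexdigest()[:12]
        return out

    def engineer(record: Dict) -> Dict[str, float]:
        """Identical transformation for data at rest and data on a stream."""
        return {
            "error_rate": record["errors"] / max(record["requests"], 1),  # aggregation
            "is_peak": 1.0 if 18 <= record["hour"] < 23 else 0.0,
        }

    def batch_features(rows: Iterable[Dict]) -> List[Dict[str, float]]:
        return [engineer(protect(r)) for r in rows]

    def stream_features(event: Dict) -> Dict[str, float]:
        return engineer(protect(event))

    raw = {"account_id": "cust-42", "ssn": "000-00-0000",
           "errors": 3, "requests": 10, "hour": 20}
    assert batch_features([raw])[0] == stream_features(raw)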

Prerequisite knowledge

  • A basic understanding of machine learning, distributed systems, containerization, DevOps, data management, and systems security

What you'll learn

  • Learn how to use DevOps principles and methodologies to integrate machine learning models into a data processing pipeline at scale
  • Explore the challenges involved in doing consistent feature engineering at scale across model training and model inferencing
  • Discover how to use sensitive data for model training while ensuring customer privacy

Mumin Ransom

Comcast

Mumin Ransom joined Comcast in 2005 and has since worked across the HSD, Voice, and Video production lines. He currently leads a machine learning platform development team whose system handles billions of events daily. The system is designed to improve customer experience by predicting service issues and meeting customers digitally with simple resolutions, resulting in less downtime for customers, making them happier, and saving millions in operations costs.

Mumin is also a cofounder of BENgineers, a Black technology professionals organization at Comcast, whose goal is to enhance the tech pipeline and create advocacy and representation for Black tech professionals at Comcast. The BENgineers have participated in coding events with local children, hosted discussions about Black professionals in technology, taken part in Comcast Lab weeks, and received a Commerce Impact Award for entrepreneurship and strategy for creating the organization.


Nick Pinckernell

Comcast

Nick Pinckernell is a senior research engineer on the applied AI research team at Comcast, where he works on ML platforms for model serving and feature pipelining. He has focused on software development, big data, distributed computing, and research in telecommunications for many years. He’s pursuing an MS in computer science at the University of Illinois at Urbana-Champaign, and in his free time he enjoys IoT.


Comments

Nick Pinckernell | Senior Research Engineer
09/30/2019 6:42pm EDT

Sure! The deck is available at https://conferences.oreilly.com/strata/strata-ny/user/proposal/presentation_download/77284?pres_id=9221

Prasad Paravatha | Principal Data Engineer
09/26/2019 3:55pm EDT

Thanks for the great presentation.
Can you please post the presentation deck here?

