Automating ML model training and deployments via metadata-driven data, infrastructure, feature engineering, and model management
Who is this presentation for?
Machine learning engineers, architects, tech leads, developers, DevOps engineers, and C-level executives interested in exploring how to manage data and automate model deployment at scale.
At Comcast, we developed a framework for operationalizing ML models. It covers the full ML lifecycle: data ingestion, feature engineering, model training, model deployment, and model evaluation at runtime. We process roughly 3 billion predictions per day using this platform. The system supports both proactive inference (triggered by combinations of events on a stream) and reactive inference (on demand).
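The two inference modes can be illustrated with a minimal Python sketch. It is not the Comcast implementation: the `InferenceService` class, the `score` function, the event names, and the trigger rule are all hypothetical, chosen only to show how one set of current features could serve both a stream-triggered prediction and an on-demand one.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Features = Dict[str, float]


def score(features: Features) -> float:
    """Hypothetical stand-in for a trained model."""
    return 2.0 * features.get("error_count", 0.0) + features.get("reboot_count", 0.0)


class InferenceService:
    """Keeps current features per entity and serves both inference modes."""

    def __init__(self, model: Callable[[Features], float], trigger_events: Iterable[str]):
        self.model = model
        self.trigger_events: Set[str] = set(trigger_events)
        self.features: Dict[str, Features] = {}
        self.seen: Dict[str, Set[str]] = {}
        self.emitted: List[Tuple[str, float]] = []  # proactive predictions

    def on_event(self, entity_id: str, event_type: str) -> None:
        """Proactive mode: update features, then predict once the
        configured combination of events has been seen for this entity."""
        feats = self.features.setdefault(entity_id, {})
        key = f"{event_type}_count"
        feats[key] = feats.get(key, 0.0) + 1.0
        seen = self.seen.setdefault(entity_id, set())
        seen.add(event_type)
        if self.trigger_events <= seen:
            self.emitted.append((entity_id, self.model(feats)))
            seen.clear()  # wait for the next full combination

    def predict(self, entity_id: str) -> float:
        """Reactive mode: predict on demand from the current features."""
        return self.model(self.features.get(entity_id, {}))


service = InferenceService(score, trigger_events=["error", "reboot"])
service.on_event("device-1", "error")    # combination incomplete, no prediction
service.on_event("device-1", "reboot")   # combination complete, prediction emitted
on_demand = service.predict("device-1")  # reactive call uses the same features
```

In a real deployment the reactive path would sit behind a service endpoint and the proactive path behind a stream processor; the sketch only shows that both can read the same feature state.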
Automating and horizontally scaling a platform that trains and operationalizes ML models and produces billions of predictions per day is complex. We faced several challenges, including:
• Bottlenecks and technology limits in the domain of data management
• Feature engineering
• Rapid model (re)training and (re)deployment
Our metadata-driven data, infrastructure, feature engineering, model training, and inference pipelines support processes such as consistent feature engineering on streaming data for model inference and on data at rest for training and validation. We will describe how we allow the use of feature-rich raw data that potentially contains sensitive information, including PII. A solution such as this must let ML model developers access this information for feature engineering while still ensuring that customer privacy is protected. We will describe how our framework manages this using a combination of methods such as encryption, removal, anonymization, and aggregation to protect privacy without compromising model efficacy.
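As a rough illustration of metadata-driven privacy handling, the Python sketch below applies a per-field policy to raw records before they reach feature engineering. Everything in it is an assumption made for illustration: the `FIELD_POLICY` table, the field names, the demo key, and the choice of a keyed hash (HMAC-SHA256) as the anonymization step. It covers removal, anonymization, and aggregation; encryption is omitted.

```python
import hashlib
import hmac
from typing import Any, Dict, Iterable

# Hypothetical field-level metadata: each raw field declares its handling.
FIELD_POLICY: Dict[str, str] = {
    "account_id": "pseudonymize",  # keyed hash, so records remain joinable
    "email": "remove",             # dropped before feature engineering
    "error_count": "keep",         # non-sensitive feature input
}

# Demo value only; a real key would come from a secrets manager.
SECRET_KEY = b"demo-key-not-for-production"


def pseudonymize(value: str) -> str:
    """Replace an identifier with a deterministic keyed hash."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


def protect(record: Dict[str, Any], policy: Dict[str, str] = FIELD_POLICY) -> Dict[str, Any]:
    """Apply the policy to one record. Fields missing from the policy
    are dropped, so new sensitive fields are not exposed by default."""
    out: Dict[str, Any] = {}
    for name, value in record.items():
        action = policy.get(name, "remove")
        if action == "keep":
            out[name] = value
        elif action == "pseudonymize":
            out[name] = pseudonymize(str(value))
        # "remove": the field is simply not copied
    return out


def aggregate(records: Iterable[Dict[str, Any]], key: str, value: str) -> Dict[Any, float]:
    """Aggregation: sum a numeric field per (already protected) key."""
    totals: Dict[Any, float] = {}
    for rec in records:
        totals[rec[key]] = totals.get(rec[key], 0) + rec[value]
    return totals


raw = [
    {"account_id": "A1", "email": "a@example.com", "error_count": 2},
    {"account_id": "A1", "email": "a@example.com", "error_count": 3},
]
protected = [protect(r) for r in raw]
totals = aggregate(protected, key="account_id", value="error_count")
```

Because the same `protect` function can be called on a single streaming event or mapped over a batch at rest, the policy stays in one place, which is the point of driving it from metadata.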
We will describe how we solved the “Feature Store” problem: managing a historical feature store for model training and an online feature store of current features to support model inference in either proactive (on event arrival) or reactive (on REST endpoint invocation) mode.
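The split between a historical and an online feature store can be sketched in a few lines of Python. This is an in-memory toy under stated assumptions, not the system described in the talk: the `FeatureStore` class, its method names, and the integer timestamps are hypothetical. The "as of" lookup is a common way to build training sets that use only the features known at a given time, and is included here as an assumption about how the historical store would be queried.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

Features = Dict[str, float]


class FeatureStore:
    """Toy store with a historical view (all versions, ordered by time)
    and an online view (latest version per entity)."""

    def __init__(self) -> None:
        self._history: Dict[str, List[Tuple[int, Features]]] = {}
        self._online: Dict[str, Features] = {}

    def write(self, entity_id: str, ts: int, features: Features) -> None:
        """Record a feature version; tolerates out-of-order timestamps."""
        rows = self._history.setdefault(entity_id, [])
        idx = bisect_right([t for t, _ in rows], ts)
        rows.insert(idx, (ts, dict(features)))
        # The online view always reflects the newest timestamp seen.
        self._online[entity_id] = rows[-1][1]

    def get_online(self, entity_id: str) -> Optional[Features]:
        """Current features, for model inference."""
        return self._online.get(entity_id)

    def get_as_of(self, entity_id: str, ts: int) -> Optional[Features]:
        """Features as they were at time ts, for building training data."""
        rows = self._history.get(entity_id, [])
        idx = bisect_right([t for t, _ in rows], ts)
        return rows[idx - 1][1] if idx > 0 else None


store = FeatureStore()
store.write("device-1", 10, {"error_count": 1.0})
store.write("device-1", 20, {"error_count": 4.0})
current = store.get_online("device-1")        # latest features for inference
training_row = store.get_as_of("device-1", 15)  # features known at time 15
```

A production system would typically back the two views with different storage, for example a low-latency key-value store for the online view and a data lake for history; the sketch keeps both in memory to show only the access patterns.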
Prerequisite knowledge
A conceptual understanding of machine learning, distributed systems, containerization, DevOps, data management, and systems security.
What you'll learn
Drew Leamon started his career at Microsoft while studying Computer Science at Princeton University. In his studies, he delved into Computer Graphics, Artificial Intelligence and Computational Neurobiology. At Microsoft, he collaborated with Microsoft Research on one of the first commercial implementations of collaborative filtering for e-commerce. This was released as Microsoft Site Server: Commerce Edition.
Graduating into the dot-com boom, Drew caught the entrepreneurial spirit of the time and went on to sell cars on the internet through CarOrder.com, a Trilogy Software spin-off. While there, he created new ways to sell content online through innovative configuration solutions. Next, Drew became one of the charter members of AirClic, where he helped create a platform to support workforce automation using wireless technologies. Drew’s work and IP in this space became core to the company’s business value. At Traffic.com / Navteq / Nokia, Drew pioneered the visualization of traffic data collected from highway sensors and digital probe devices.
Moving on to Comcast, Drew has taken his diverse experience and background and now leads part of the Engineering Analysis organization. His team is developing advanced data visualizations for network data. They are building elastically scaling Big Data infrastructure to support Analytic workloads. Simulations of Comcast’s CDNs and platforms, developed by Drew’s team, are leveraging this platform and guiding the business and engineering teams. His team is identifying high ROI opportunities. They apply machine learning to datasets and are currently operationalizing the resulting predictive models to help improve customer experience.
Sameer Wadkar is a senior principal architect for machine learning at Comcast NBCUniversal, where he works on operationalizing machine learning models to enable rapid turnaround times from model development to model deployment and oversees data ingestion from data lakes, streaming data transformations, and model deployment in hybrid environments ranging from on-premises deployments to cloud and edge devices. Previously, he developed big data systems capable of handling billions of financial transactions per day arriving out of order for market reconstruction to conduct surveillance of trading activity across multiple markets and implemented natural language processing (NLP) and computer vision-based systems for various public and private sector clients. He is the author of Pro Apache Hadoop and blogs about data architectures and big data.