Building and managing training datasets for ML with Snorkel

Who is this presentation for?
- ML developers, data scientists, and research scientists
Level
Intermediate
Description
One of the key bottlenecks in building ML systems is creating and managing the massive training datasets that today’s models learn from.
Alex Ratner outlines work on Snorkel, an open source framework for building and managing training datasets, and details three key operators that let users build and manipulate training datasets: labeling functions for labeling unlabeled data, transformation functions for expressing data augmentation strategies, and slicing functions for partitioning and structuring training datasets. These operators allow domain expert users to specify ML models via noisy operators over training data, so applications can be built in hours or days rather than months or years. Alex explores recent work on modeling the noise and imprecision inherent in these operators and on using these approaches to train ML models that solve real-world problems, including a recent state-of-the-art result on the SuperGLUE natural language processing benchmark.
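To make the three operators concrete, here is a minimal plain-Python sketch on a hypothetical spam-vs-ham text task. It does not use Snorkel itself, and every function name, keyword, and example text below is an illustrative assumption. Labeling functions vote a label or abstain, transformation functions produce label-preserving augmented copies, and slicing functions mark subsets of the data. Votes are combined here by simple majority vote, a deliberately simpler substitute for the learned model of labeling-function noise that the talk describes.

```python
from collections import Counter

# Hypothetical toy task: label short texts as SPAM (1) or HAM (0).
ABSTAIN, HAM, SPAM = -1, 0, 1

# --- Labeling functions: noisy heuristics that vote a label or abstain ---
def lf_contains_link(text):
    return SPAM if "http" in text.lower() else ABSTAIN

def lf_says_check_out(text):
    return SPAM if "check out" in text.lower() else ABSTAIN

def lf_short_reply(text):
    return HAM if len(text.split()) < 5 else ABSTAIN

LFS = [lf_contains_link, lf_says_check_out, lf_short_reply]

def apply_lfs(texts, lfs):
    """Build a label matrix: one row per example, one column per LF."""
    return [[lf(t) for lf in lfs] for t in texts]

def majority_vote(row):
    """Combine LF votes; all-abstain rows and ties yield ABSTAIN.
    Simplified stand-in for a model that learns each LF's accuracy."""
    votes = [v for v in row if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    (top, n), *rest = Counter(votes).most_common()
    if rest and rest[0][1] == n:
        return ABSTAIN  # conflicting votes with equal support
    return top

# --- Transformation function: label-preserving data augmentation ---
SYNONYMS = {"great": "excellent", "video": "clip"}  # hypothetical word list

def tf_replace_synonyms(text):
    """Return an augmented copy of the text with words swapped for synonyms."""
    return " ".join(SYNONYMS.get(w, w) for w in text.split())

# --- Slicing function: marks a subset of examples to monitor or model ---
def sf_short_text(text):
    return len(text.split()) < 5

# --- Usage ---
texts = [
    "check out my channel http://example.com",
    "great video",
    "check out this great video",
    "I learned a lot from this explanation of the topic",
    "check out lol",
]
L = apply_lfs(texts, LFS)
labels = [majority_vote(row) for row in L]
print(labels)                                   # [1, 0, 1, -1, -1]
print(tf_replace_synonyms("great video"))       # excellent clip
print([sf_short_text(t) for t in texts])        # [False, True, False, False, True]
```

The last example shows why combining votes matters: one labeling function says SPAM and another says HAM, so majority vote abstains. A learned label model can instead weight each function by its estimated accuracy.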
Prerequisite knowledge
- A basic understanding of machine learning
What you'll learn
- Learn techniques for building, managing, and iterating on training datasets and modeling pipelines, both for ML in general and with the Snorkel framework

Alex Ratner
Snorkel
Alex Ratner is the project lead of Snorkel, a system for programmatically building and managing training datasets for machine learning, and (starting in 2020) an assistant professor of computer science at the University of Washington. Previously, he completed his PhD in CS advised by Christopher Ré at Stanford, where his research focused on applying data management and statistical learning techniques to emerging machine learning workflows, such as creating and managing training data, and applying this to real-world problems in medicine, knowledge base construction, and more. At Stanford, he started and led the Snorkel project, which has been deployed at large technology companies like Google, academic labs, and government agencies and was recognized in VLDB 2018 (“Best Of”).
Contact us
confreg@oreilly.com
For conference registration information and customer service
partners@oreilly.com
For more information on community discounts and trade opportunities with O’Reilly conferences
Become a sponsor
For information on exhibiting or sponsoring a conference
pr@oreilly.com
For media/analyst press inquiries