Building and managing training datasets for ML with Snorkel
Who is this presentation for?
- ML developers, data scientists, and research scientists
One of the key bottlenecks in building ML systems is creating and managing the massive training datasets that today’s models learn from.
Alex Ratner outlines work on Snorkel, an open source framework for building and managing training datasets, and details three key operators that let users build and manipulate those datasets: labeling functions for labeling unlabeled data, transformation functions for expressing data augmentation strategies, and slicing functions for partitioning and structuring training datasets. These operators allow domain experts to specify ML models through noisy operators over training data, so applications can be built in hours or days rather than months or years. Alex explores recent work on modeling the noise and imprecision inherent in these operators and on using these approaches to train ML models that solve real-world problems, including a recent state-of-the-art result on the SuperGLUE natural language processing benchmark.
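To make the three operators concrete, here is a minimal sketch in plain Python. It does not use Snorkel's API: every function name, the toy spam/ham task, and the sample messages are hypothetical and chosen only for illustration. A simple majority vote stands in for the noise modeling the session describes, so treat it as a rough approximation of the idea rather than the framework's method.

```python
from collections import Counter

# Label values for a hypothetical spam/ham task (assumed for illustration).
ABSTAIN, HAM, SPAM = -1, 0, 1


# Labeling functions: noisy heuristics that vote for a label or abstain.
def lf_contains_link(text: str) -> int:
    return SPAM if "http" in text.lower() else ABSTAIN


def lf_short_message(text: str) -> int:
    return HAM if len(text.split()) < 5 else ABSTAIN


def lf_mentions_prize(text: str) -> int:
    return SPAM if "prize" in text.lower() else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_link, lf_short_message, lf_mentions_prize]


def majority_vote(text: str) -> int:
    """Combine labeling-function votes; abstain when there are none or a tie."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


# Transformation function: a label-preserving edit used for data augmentation.
def tf_swap_greeting(text: str) -> str:
    return text.replace("Hello", "Hi")


# Slicing function: flags a subset of the data to track or model separately.
def sf_has_link(text: str) -> bool:
    return "http" in text.lower()


unlabeled = [
    "Claim your prize at http://example.com now",
    "Hello there",
    "Hello, the quarterly report is attached for your review",
]

labels = [majority_vote(t) for t in unlabeled]
augmented = [tf_swap_greeting(t) for t in unlabeled]
link_slice = [t for t in unlabeled if sf_has_link(t)]

print(labels)      # one weak label per message, -1 where no function fired
print(augmented)   # augmented copies that keep the same label
print(link_slice)  # the subset selected by the slicing function
```

Each operator is an ordinary function over a data point, which is what lets a domain expert write one quickly. The third message receives no label because every labeling function abstains on it, a small example of the imprecision that a noise-aware approach has to account for.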
Prerequisite knowledge
- A basic understanding of machine learning
What you'll learn
- Learn techniques for building, managing, and iterating on training datasets and modeling pipelines for ML, both in general and with the Snorkel framework
Alex Ratner is the project lead of Snorkel, a system for programmatically building and managing training datasets for machine learning, and (starting in 2020) an assistant professor of computer science at the University of Washington. Previously, he completed his PhD in CS, advised by Christopher Ré, at Stanford, where his research focused on applying data management and statistical learning techniques to emerging machine learning workflows, such as creating and managing training data, and on applying these techniques to real-world problems in medicine, knowledge base construction, and more. At Stanford, he started and led the Snorkel project, which has been deployed at large technology companies like Google, academic labs, and government agencies and was recognized in VLDB 2018 (“Best Of”).