Mar 15–18, 2020

Programmatically building and managing training datasets with Snorkel

Paroma Varma (Snorkel)
9:00am–12:30pm Monday, March 16, 2020
Location: 210 E



Paroma Varma teaches you how to build and manage training datasets programmatically with Snorkel, an open source framework developed at the Stanford AI Lab, and demonstrates how this approach makes building and managing ML models more efficient in a range of practical settings.

You’ll learn how to use Snorkel to programmatically label, transform (or augment), and slice training datasets in order to train a downstream ML model, working through an extended version of the open source Snorkel tutorial on training a spam classifier over YouTube comments. Paroma also offers a high-level overview of other Snorkel features, including applications to other data modalities and ML problems, shares an outlook on next steps in Snorkel’s development, and leads a brainstorm of Snorkel use cases.
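To make "programmatically label" concrete, here is a minimal, dependency-free sketch of the idea behind labeling functions for the YouTube spam task. It does not use Snorkel's API: the label constants, the keyword heuristics, and the helper names (`lf_check_out`, `lf_has_link`, `lf_short_comment`, `apply_lfs`, `majority_vote`) are hypothetical. Snorkel combines labeling-function outputs with a learned label model; this sketch swaps in a plain majority vote for illustration.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical label constants; a labeling function returns ABSTAIN when it has no opinion.
ABSTAIN, HAM, SPAM = -1, 0, 1


def lf_check_out(text: str) -> int:
    # Heuristic (assumption): comments urging readers to "check out" something are often spam.
    return SPAM if "check out" in text.lower() else ABSTAIN


def lf_has_link(text: str) -> int:
    # Heuristic (assumption): comments containing a URL are often spam.
    return SPAM if "http" in text.lower() else ABSTAIN


def lf_short_comment(text: str) -> int:
    # Heuristic (assumption): very short comments tend to be genuine reactions.
    return HAM if len(text.split()) < 5 else ABSTAIN


LFS: List[Callable[[str], int]] = [lf_check_out, lf_has_link, lf_short_comment]


def apply_lfs(texts: List[str], lfs: List[Callable[[str], int]] = LFS) -> List[List[int]]:
    """Build a label matrix: one row per comment, one column per labeling function."""
    return [[lf(t) for lf in lfs] for t in texts]


def majority_vote(row: List[int]) -> int:
    """Combine one row of votes; all-abstain rows and ties yield ABSTAIN."""
    votes = Counter(v for v in row if v != ABSTAIN)
    if not votes:
        return ABSTAIN
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


comments = [
    "Check out my channel http://example.com",
    "love this song",
    "This video brings back so many memories of summer",
]
label_matrix = apply_lfs(comments)
print(label_matrix)  # [[1, 1, -1], [-1, -1, 0], [-1, -1, -1]]
print([majority_vote(row) for row in label_matrix])  # [1, 0, -1]
```

Comments that every function abstains on stay unlabeled; in a real pipeline these would typically be filtered out before training the downstream classifier.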

Prerequisite knowledge

  • A basic knowledge of supervised ML

Materials or downloads needed in advance

  • A WiFi-enabled laptop

What you'll learn

  • Learn how to use Snorkel to programmatically build and manage training datasets for ML
  • Understand modern ML topics that Snorkel supports or uses, including weak supervision, data augmentation, and data slicing
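Data augmentation and data slicing can likewise be expressed as small functions over individual examples. The following is a minimal stdlib sketch of those two ideas, not Snorkel's API: `tf_swap_synonym` is a hypothetical transformation function that creates an augmented copy of a comment by swapping in a word from a tiny hand-written synonym table, and `sf_short_comment` is a hypothetical slicing function that marks a subset of the data whose model performance you might want to monitor separately. The synonym table, the five-word threshold, and all helper names are assumptions.

```python
from typing import Dict, List, Optional

# Hypothetical synonym table, used only for illustration.
SYNONYMS: Dict[str, str] = {"great": "awesome", "song": "track", "video": "clip"}


def tf_swap_synonym(text: str) -> Optional[str]:
    """Return a copy with the first word found in SYNONYMS replaced, or None if no word matches."""
    words = text.split()
    for i, word in enumerate(words):
        key = word.lower()
        if key in SYNONYMS:
            return " ".join(words[:i] + [SYNONYMS[key]] + words[i + 1:])
    return None


def sf_short_comment(text: str) -> bool:
    """Slice membership (assumed threshold): comments with fewer than 5 words."""
    return len(text.split()) < 5


def augment(texts: List[str]) -> List[str]:
    """Keep the originals and append one transformed copy per text where the transform applies."""
    out = list(texts)
    for text in texts:
        new_text = tf_swap_synonym(text)
        if new_text is not None:
            out.append(new_text)
    return out


def slice_indices(texts: List[str]) -> List[int]:
    """Indices of the examples that fall into the short-comment slice."""
    return [i for i, text in enumerate(texts) if sf_short_comment(text)]


comments = ["great song", "Subscribe to my channel for free gift cards today"]
augmented = augment(comments)
print(augmented)  # ['great song', 'Subscribe to my channel for free gift cards today', 'awesome song']
print(slice_indices(augmented))  # [0, 2]
```

Keeping the original examples alongside their transformed copies enlarges the training set without discarding real data, and tracking a slice's indices lets you report metrics for that subset separately from the overall score.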

Paroma Varma


Paroma Varma is a cofounder at Snorkel and completed a PhD at Stanford, advised by Professor Christopher Ré and affiliated with the DAWN, SAIL, and StatML groups, where she was supported by the Stanford Graduate Fellowship and the National Science Foundation Graduate Research Fellowship. Her research interests revolve around weak supervision, or using high-level knowledge in the form of noisy labeling sources to efficiently label the massive datasets required to train machine learning models.

Leave a Comment or Question



Sophia DeMartini | Senior Speaker Manager
11/21/2019 1:10am PST

Hi Suracha – sorry about that – it should now be showing as an option in registration.

If you’ve already completed your registration, please contact and they can assist you.

Suracha Arayachatsakul | Marketing
11/20/2019 4:35pm PST

I can see “Programmatically Building & Managing Training Datasets with Snorkel” session on the schedule page, but when I register (Silver Pass) I can’t see “Programmatically Building & Managing Training Datasets with Snorkel” session.

What should I do?
