Mar 15–18, 2020

Weak supervision for stronger models: Increasing classification strength using noisy data

Sumeet Vij (Booz Allen Hamilton)
11:50am–12:30pm Tuesday, March 17, 2020
Location: 210 C/G

Who is this presentation for?

  • Data scientists and machine learning engineers




The real-world impact of deep learning has grown by leaps and bounds; however, these models require massive labeled training sets. Large labeled training datasets remain a key bottleneck in supervised machine learning and are mostly unavailable within an enterprise for domain-specific tasks. For most enterprises, creating these datasets can be expensive, slow, or even impractical. Even if domain expertise is available and applied for hand labeling, enterprises have to deal with the rapid depreciation of training sets as applications shift and evolve.

Weak supervision uses noisy, limited, or imprecise sources to provide supervision signals for labeling large amounts of training data in a supervised learning setting. Using the data programming paradigm, labeling functions programmatically label data, and supervised learning is used to assess the accuracy of the labeling functions, converting low-quality inputs into high-quality models.

Sumeet Vij showcases an innovative application of weak supervision using the Snorkel framework to leverage existing organizational knowledge, unstructured data, and conversation logs. Using denoising labeling functions, a generative model, and AI-powered search, it’s possible to quickly generate large training sets for conversational assistants, helping improve user input classification. The code-as-supervision paradigm is important for operationalizing AI and machine learning in an enterprise.
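
The labeling-function idea described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the speaker's implementation and not the Snorkel API: the intent labels, keyword heuristics, function names, and sample utterances are all hypothetical, and a simple majority vote stands in for the generative model that denoises labeling function output.

```python
from collections import Counter

# Hypothetical intent labels for a conversational assistant.
ABSTAIN = -1
BILLING = 0
TECH_SUPPORT = 1


# Each labeling function encodes one noisy heuristic and may abstain.
def lf_mentions_invoice(text):
    lowered = text.lower()
    return BILLING if "invoice" in lowered or "refund" in lowered else ABSTAIN


def lf_mentions_error(text):
    lowered = text.lower()
    return TECH_SUPPORT if "error" in lowered or "crash" in lowered else ABSTAIN


def lf_mentions_charge(text):
    return BILLING if "charge" in text.lower() else ABSTAIN


LABELING_FUNCTIONS = [lf_mentions_invoice, lf_mentions_error, lf_mentions_charge]


def apply_lfs(texts, lfs=LABELING_FUNCTIONS):
    """Build a label matrix: one row per text, one column per labeling function."""
    return [[lf(text) for lf in lfs] for text in texts]


def majority_vote(votes):
    """Combine one row of votes; abstain when there are no votes or a tie."""
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


# Hypothetical unlabeled user inputs, standing in for conversation logs.
utterances = [
    "I was charged twice, please send a corrected invoice",
    "The app shows an error and then crashes",
    "Why is there an error on my invoice?",
    "What are your opening hours?",
]

label_matrix = apply_lfs(utterances)
weak_labels = [majority_vote(row) for row in label_matrix]

for text, label in zip(utterances, weak_labels):
    print(label, text)
```

Utterances on which the heuristics conflict or stay silent end up with the ABSTAIN label and would be left out of the training set. A learned label model would instead weight each labeling function by its estimated accuracy rather than counting every vote equally.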

Prerequisite knowledge

  • A basic understanding of machine learning and building classifiers

What you'll learn

  • See a practical example of applying weak supervision to leverage unlabeled enterprise knowledge at scale to build robust classifiers
  • Discover how the Snorkel framework, programmatic labeling, and natural language processing (NLP) can enable diverse resources and heuristics to generate large datasets that effectively train conversational assistants

Sumeet Vij

Booz Allen Hamilton

Sumeet Vij is a director in the Strategic Innovation Group (SIG) at Booz Allen Hamilton, where he leads multiple client engagements, research, and strategic partnerships in the field of AI, digital personalization, recommendation systems, chatbots, digital assistants, and conversational commerce. Sumeet is also the practice lead for next-generation digital experiences powered by AI and data science, helping with the large-scale analysis of data and its use to quickly provide deeper insights, create new capabilities, and drive down costs.
