Sep 23–26, 2019

Learning with limited labeled data

Shioulin Sam (Cloudera Fast Forward Labs)
1:15pm–1:55pm Wednesday, September 25, 2019
Location: 1A 12/14
Secondary topics:  Deep Learning
Average rating: 4.29 (7 ratings)

Who is this presentation for?

  • Data scientists, machine learning engineers, and product managers




Being able to teach machines with examples is a powerful capability, but it hinges on the availability of vast amounts of data. The data not only needs to exist but also has to be in a form that allows relationships between input features and outputs to be uncovered. Creating a label for each input instance fulfills this requirement, but it is an expensive undertaking.

Classical approaches to this problem rely on human and machine collaboration. In these approaches, engineered heuristics are used to smartly select “best” instances of data to label in order to reduce cost. A human steps in to provide the label; the model then learns from this smaller labeled dataset. Recent advancements have made these approaches amenable to deep learning, enabling models to be built with limited labeled data.
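As a rough illustration of one such engineered heuristic, the sketch below ranks unlabeled instances by the entropy of a model's predicted class probabilities and picks the most uncertain ones to send to a human labeler. This is commonly called uncertainty sampling; the abstract does not say which heuristics the talk covers, so treat it as an assumed example. The function names and the probability values are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_most_uncertain(predicted_probs, k):
    """Return indices of the k unlabeled instances with the highest entropy."""
    ranked = sorted(
        range(len(predicted_probs)),
        key=lambda i: entropy(predicted_probs[i]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical model outputs for four unlabeled instances (two classes).
pool_probs = [
    [0.98, 0.02],  # model is confident
    [0.55, 0.45],  # model is unsure
    [0.80, 0.20],
    [0.50, 0.50],  # model is maximally unsure
]

# Indices of the two instances a human would be asked to label next.
to_label = select_most_uncertain(pool_probs, k=2)
print(to_label)  # [3, 1]
```

In a full loop, the newly labeled instances would be added to the training set, the model retrained, and the selection repeated until the labeling budget runs out.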

Shioulin Sam explores algorithmic approaches that drive this capability and provides practical guidance for translating this capability into production. You’ll view a live demonstration to understand how and why these algorithms work.

Prerequisite knowledge

  • A basic understanding of math, classifiers, and neural networks

What you'll learn

  • Classical active learning strategies (engineered heuristics) for choosing the "best" data to label
  • Active learning algorithms tailored for deep learning
  • An under-the-hood understanding of active learning
  • When to use active learning and what to look out for

Shioulin Sam

Cloudera Fast Forward Labs

Shioulin Sam is a research engineer at Cloudera Fast Forward Labs, where she bridges academic research in machine learning with industrial applications. Previously, she managed a portfolio of early-stage ventures focusing on women-led startups and public market investments, and worked in the investment management industry designing quantitative trading strategies. She holds a PhD in electrical engineering and computer science from the Massachusetts Institute of Technology.

Comments on this page are now closed.


Shioulin Sam | Research Engineer
10/15/2019 5:10pm EDT

Link to presentation is here

Bo Rin Jung | Lead Consultant
10/14/2019 12:55am EDT

Hi Shioulin

Is there any way to get your awesome presentation?

Thanks, Bo

Gilad Barkan | Data Scientist
09/30/2019 1:32pm EDT

Is there a link to your great presentation?

Thanks, Gilad

aaron nematnejad | Data Scientist
09/27/2019 10:59am EDT

Hi Shioulin

Do you have a link to your presentation?

