Presented By
O’Reilly + Cloudera
Make Data Work
29 April–2 May 2019
London, UK

Learning with limited labeled data

Shioulin Sam (Cloudera Fast Forward Labs)
14:55–15:35 Thursday, 2 May 2019
Data Science, Machine Learning & AI
Location: Capital Suite 14
Average rating: 4.45 (11 ratings)

Who is this presentation for?

  • Data scientists, machine learning engineers, and product people



Prerequisite knowledge

  • A basic understanding of neural nets

What you'll learn

  • Explore classical active learning strategies (engineered heuristics) to choose the "best" data to label
  • Discover active learning algorithms for batch learning, specifically for deep learning
  • Learn how to build models that can learn a general representation of data from many tasks (each with little labeled data) and quickly classify (or bind) new examples


Being able to teach machines with examples is a powerful capability, but it hinges on the availability of vast amounts of data. The data not only needs to exist but also has to be in a form that allows relationships between input features and output to be uncovered. Labeling every input example fulfills this requirement but is an expensive undertaking.

Classical approaches to this problem rely on collaboration between humans and machines. In these approaches, engineered heuristics select the “best” instances of data to label in order to reduce cost. A human steps in to provide the labels; the model then learns from this smaller labeled dataset. Recent advancements have made these approaches amenable to deep learning, enabling models to be built with limited labeled data.
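
As a rough illustration of the kind of selection heuristic described above, the sketch below uses entropy-based uncertainty sampling, one common classical active learning strategy. It is chosen here as an assumption for illustration and is not necessarily the method covered in the session. The function names and the example probabilities are hypothetical; in practice the probabilities would come from a model trained on the data labeled so far.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labeling(predicted_probs, budget):
    """Return indices of the `budget` most uncertain unlabeled examples.

    predicted_probs: list of per-example class probability lists.
    budget: how many examples a human annotator can label this round.
    """
    ranked = sorted(
        range(len(predicted_probs)),
        key=lambda i: entropy(predicted_probs[i]),
        reverse=True,  # highest entropy (most uncertain) first
    )
    return ranked[:budget]


# Hypothetical model outputs for four unlabeled examples (binary classifier).
pool_probs = [
    [0.99, 0.01],  # model is confident
    [0.55, 0.45],  # model is unsure
    [0.80, 0.20],
    [0.50, 0.50],  # model is maximally unsure
]

to_label = select_for_labeling(pool_probs, budget=2)
print(to_label)  # [3, 1]
```

The two examples the model is least sure about are sent to the human annotator first; the model is then retrained on the enlarged labeled set and the selection step repeats.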

Shioulin Sam shares algorithmic approaches that drive this capability and provides practical guidance for translating this capability into production. Join in to discover how and why these algorithms work through a live demo.


Shioulin Sam

Cloudera Fast Forward Labs

Shioulin Sam is a research engineer at Cloudera Fast Forward Labs, where she bridges academic research in machine learning with industrial applications. Previously, she managed a portfolio of early-stage ventures focusing on women-led startups and public market investments and worked in the investment management industry designing quantitative trading strategies. She holds a PhD in electrical engineering and computer science from the Massachusetts Institute of Technology.