Presented By O'Reilly and Cloudera
Make Data Work
Dec 4–5, 2017: Training
Dec 5–7, 2017: Tutorials & Conference

Human-in-the-loop: a design pattern for managing teams that leverage ML

Paco Nathan
11:15am–11:55am Wednesday, December 6, 2017
Average rating: *****
(5.00, 1 rating)

Who is this presentation for?

  • overall, targeted toward data science managers and executives
  • some technical portions are targeted at data scientists and engineers

Prerequisite knowledge

  • experience working on or managing data teams
  • some familiarity with Big Data applications
  • basic understanding of Machine Learning and Data Science use cases in industry


Human-in-the-loop (HITL) is an approach that has been used for simulation, training, UX mockups, etc. More recently, HITL has been emerging as a design pattern for managing teams that work with machine learning (ML). Active learning, a variant of semi-supervised learning, allows for mostly automated processes based on ML, with exceptions referred to human experts. Those human judgments in turn help improve new iterations of the ML models.
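The active learning loop described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the pipeline discussed in the talk: the word-count classifier, the `ask_expert` callback, the 0.75 confidence threshold, and the sample texts are all hypothetical stand-ins chosen to keep the example self-contained.

```python
from collections import Counter, defaultdict


def train(examples):
    """Toy model (an assumption): count how often each word appears under each label."""
    counts = defaultdict(Counter)
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts


def predict(model, text):
    """Return (label, confidence); confidence is the winning label's share of word votes."""
    words = text.lower().split()
    scores = {label: sum(c[w] for w in words) for label, c in model.items()}
    total = sum(scores.values())
    if total == 0:
        return None, 0.0  # no known words: the model has no opinion
    best = max(scores, key=scores.get)
    return best, scores[best] / total


def hitl_round(examples, unlabeled, ask_expert, threshold=0.75):
    """One iteration: automate confident cases, refer the rest to a human, retrain."""
    model = train(examples)
    accepted, referred = [], []
    for text in unlabeled:
        label, confidence = predict(model, text)
        if confidence >= threshold:
            accepted.append((text, label))             # handled automatically
        else:
            referred.append((text, ask_expert(text)))  # exception goes to an expert
    # Human judgments become training data for the next model iteration.
    return accepted, referred, train(examples + referred)


# Hypothetical seed examples and a stand-in "expert" for demonstration only.
seed = [("spark cluster streaming", "data"), ("neural network training", "ml")]
incoming = ["spark streaming job", "network cluster"]
accepted, referred, new_model = hitl_round(seed, incoming, ask_expert=lambda text: "ml")
```

In this run the first incoming item is confidently categorized by the model, while the second is ambiguous and gets routed to the expert; after retraining on that judgment, the new model categorizes the ambiguous item on its own. In practice the threshold is a management decision as much as a technical one, since it sets how much work reaches the human experts.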

This talk reviews key case studies of active learning, plus other human-in-the-loop approaches that are emerging among AI applications. We’ll consider some of the technical aspects, including available open source projects, as well as management perspectives on how to apply HITL:

  • When is HITL indicated vs. when isn’t it applicable?
  • How do HITL approaches compare/contrast with more “typical” use of Big Data?
  • What’s the relationship between use of HITL and preparing an organization to leverage Deep Learning?
  • Experiences training and managing a team which uses HITL at scale
  • Caveats to know ahead of time
  • In what ways do the humans involved learn from the machines?

In particular, we’ll examine use cases at O’Reilly Media where ML pipelines for categorizing content are trained by subject matter experts providing examples, based on HITL and leveraging open source Project Jupyter for implementation.


Paco Nathan

Paco Nathan is known as a “player/coach” with core expertise in data science, natural language processing, machine learning, and cloud computing. He has 35+ years of experience in the tech industry, at companies ranging from Bell Labs to early-stage startups. His recent roles include director of the Learning Group at O’Reilly and director of community evangelism at Databricks and Apache Spark. Paco is the cochair of the Rev conference and an advisor for Amplify Partners, Deep Learning Analytics, Recognai, and Primer. He was named one of the “top 30 people in big data and analytics” in 2015 by Innovation Enterprise.