Presented By O’Reilly and Cloudera
Make Data Work
March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

Human in the loop: A design pattern for managing teams working with machine learning

Paco Nathan
11:50am–12:30pm Thursday, March 8, 2018
Average rating: 4.25 (4 ratings)

Who is this presentation for?

  • Data science managers and executives

Prerequisite knowledge

  • Management experience with data teams
  • A basic understanding of big data applications and machine learning and data science use cases in industry

What you'll learn

  • Explore using human in the loop for managing teams where people and machines collaborate


Although it has long been used for use cases like simulation, training, and UX mockups, human in the loop (HITL) has emerged as a key design pattern for managing teams where people and machines collaborate. One approach, active learning (a special case of semisupervised learning), employs mostly automated processes based on machine learning models, but exceptions are referred to human experts, whose decisions help improve new iterations of the models.
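
The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the approach presented in the session: the toy one-dimensional nearest-centroid classifier stands in for any machine learning model, and the names CentroidModel, hitl_round, and ask_expert, as well as the confidence threshold of 0.5, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Example = Tuple[float, str]  # (feature value, label)


@dataclass
class CentroidModel:
    """Toy 1-D nearest-centroid classifier (a stand-in for a real model)."""
    centroids: Dict[str, float] = field(default_factory=dict)

    def fit(self, examples: List[Example]) -> None:
        # Average the feature values seen for each label.
        grouped: Dict[str, List[float]] = {}
        for x, label in examples:
            grouped.setdefault(label, []).append(x)
        self.centroids = {lab: sum(xs) / len(xs) for lab, xs in grouped.items()}

    def predict(self, x: float) -> Tuple[str, float]:
        # Confidence is the relative margin between the two nearest
        # centroids, in [0, 1]. Assumes at least two labels were seen.
        ranked = sorted(self.centroids.items(), key=lambda kv: abs(x - kv[1]))
        (best, c1), (_, c2) = ranked[0], ranked[1]
        d1, d2 = abs(x - c1), abs(x - c2)
        confidence = (d2 - d1) / (d1 + d2) if d1 + d2 > 0 else 0.0
        return best, confidence


def hitl_round(
    model: CentroidModel,
    labeled: List[Example],
    unlabeled: List[float],
    ask_expert: Callable[[float], str],
    threshold: float = 0.5,  # hypothetical cutoff for "exceptions"
) -> Tuple[List[Example], List[Example]]:
    """One iteration: accept confident predictions, refer the rest to a human."""
    model.fit(labeled)
    auto: List[Example] = []
    referred: List[Example] = []
    for x in unlabeled:
        label, confidence = model.predict(x)
        if confidence >= threshold:
            auto.append((x, label))              # handled automatically
        else:
            referred.append((x, ask_expert(x)))  # exception goes to an expert
    # The expert's decisions feed the next iteration of the model.
    model.fit(labeled + referred)
    return auto, referred


# Hypothetical data and a scripted "expert" used only for illustration.
seed = [(1.0, "a"), (2.0, "a"), (8.0, "b"), (9.0, "b")]
model = CentroidModel()
auto, referred = hitl_round(
    model, seed, [0.5, 4.8, 5.4, 9.5],
    ask_expert=lambda x: "a" if x < 6 else "b",
)
```

In this run the two values near the class boundary (4.8 and 5.4) fall below the threshold and are referred to the scripted expert, while the other two are labeled automatically. Retraining on the expert's answers shifts the boundary, so the next iteration classifies similar cases without human help.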

Paco Nathan offers an overview of HITL and active learning, covering technical aspects, including available open source projects, and management perspectives on how to apply HITL. In particular, Paco examines a use case at O’Reilly Media in which ML pipelines for categorizing content are trained solely by subject-matter experts providing examples, based on HITL and leveraging Project Jupyter for implementation.

Topics include:

  • Training and managing a team that uses HITL at scale
  • When to use HITL (and when not to)
  • The relationship between HITL and leveraging deep learning
  • How HITL approaches compare with more typical big data approaches
  • How the humans involved learn better process from the machines

Paco Nathan

Paco Nathan is known as a “player/coach” with core expertise in data science, natural language processing, machine learning, and cloud computing. He has 35+ years of experience in the tech industry, at companies ranging from Bell Labs to early-stage startups. His recent roles include director of the Learning Group at O’Reilly and director of community evangelism at Databricks and Apache Spark. Paco is the cochair of the Rev conference and an advisor for Amplify Partners, Deep Learning Analytics, Recognai, and Primer. He was named one of the “top 30 people in big data and analytics” in 2015 by Innovation Enterprise.