Presented By O’Reilly and Cloudera
Make Data Work
21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

Human in the loop: A design pattern for managing teams working with machine learning

Paco Nathan
14:05–14:45 Thursday, 24 May 2018
Data science and machine learning
Location: Capital Suite 10/11 Level: Beginner
Average rating: 4.50 (2 ratings)

Who is this presentation for?

  • Data science managers and execs, data scientists, and engineers

Prerequisite knowledge

  • Experience working on or managing data teams
  • Familiarity with big data applications
  • A basic understanding of machine learning and data science use cases in industry

What you'll learn

  • Explore human in the loop (HITL) use cases and real-world examples
  • Understand management perspectives, open source projects, and commercial products for leveraging HITL


Although it has long been used for use cases like simulation, training, and UX mockups, human in the loop (HITL) has emerged as a key design pattern for managing teams where people and machines collaborate. One approach, active learning (a special case of semisupervised learning), employs mostly automated processes based on machine learning models but refers edge cases (typically the areas that are most uncertain or highest risk) to human experts, whose decisions help improve new iterations of the models. Meanwhile, an HITL practice can help organizations prepare datasets for use in deep learning.
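
The routing step at the heart of active learning can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation discussed in the session: the `Prediction` type, the `route` and `collect_expert_labels` helpers, the 0.8 confidence threshold, and the sample items are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical cutoff: predictions less confident than this go to a person.
CONFIDENCE_THRESHOLD = 0.8


@dataclass
class Prediction:
    item: str          # identifier of the thing being classified
    label: str         # the model's top label
    confidence: float  # the model's probability for that label, 0.0 to 1.0


def route(
    predictions: List[Prediction],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> Tuple[List[Prediction], List[Prediction]]:
    """Split predictions into auto-accepted ones and ones needing human review."""
    accepted = [p for p in predictions if p.confidence >= threshold]
    review = [p for p in predictions if p.confidence < threshold]
    # Least confident first, so expert time goes where the model is weakest.
    review.sort(key=lambda p: p.confidence)
    return accepted, review


def collect_expert_labels(
    review: List[Prediction],
    ask_expert: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Ask a human about each uncertain item; return (item, label) examples."""
    return [(p.item, ask_expert(p.item)) for p in review]


# Example run with made-up predictions and a stand-in for the human expert.
predictions = [
    Prediction("doc-1", "python", 0.97),
    Prediction("doc-2", "java", 0.62),
    Prediction("doc-3", "python", 0.55),
]
accepted, review = route(predictions)
new_examples = collect_expert_labels(review, ask_expert=lambda item: "scala")
```

In a full loop, the `new_examples` would be added to the training set and the model retrained, so that the next iteration refers fewer cases to people. The threshold is a tuning choice: a higher value sends more work to experts, while a lower value accepts more automated decisions.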

Paco Nathan reviews case studies and management perspectives for leveraging HITL, along with related open source projects and commercial products. In particular, Paco examines a use case at O’Reilly Media in which ML pipelines for categorizing content are trained solely by subject-matter experts providing examples, based on HITL and leveraging Project Jupyter, Apache Spark, and scikit-learn for implementation.

Topics include:

  • Training and managing a team that uses HITL at scale
  • When to use HITL (and when not to)
  • The relationship between HITL and leveraging deep learning
  • How HITL approaches compare with more typical big data approaches
  • How the humans involved learn better process from the machines

Paco Nathan

Paco Nathan is known as a “player/coach” with core expertise in data science, natural language processing, machine learning, and cloud computing. He has 35+ years of experience in the tech industry, at companies ranging from Bell Labs to early-stage startups. His recent roles include director of the Learning Group at O’Reilly and director of community evangelism at Databricks and Apache Spark. Paco is cochair of the Rev conference and an advisor for Amplify Partners, Deep Learning Analytics, Recognai, and Primer. He was named one of the “top 30 people in big data and analytics” in 2015 by Innovation Enterprise.