Humans, Machines, and the Dimensions of Microwork

Daniel Tunkelang (Various), Claire Hunsaker (Samasource)
Data Science, Mission City B1

The advent of crowdsourcing has wildly expanded the ways we think of incorporating human judgments into computational workflows. Computer scientists, economists, and sociologists have explored how to effectively and efficiently distribute microwork tasks to crowds and use their work as inputs to create or improve data products. Simultaneously, crowdsourcing providers are exploring the bounds of mechanical QA flows, worker interfaces, and workforce management systems.

But what tasks should be performed by humans rather than algorithms? And what makes a set of human judgments robust? Quantity? Consensus? Quality or trustworthiness of the workers? Moreover, the robustness of judgments depends not only on the workers but also on the task design. Effective crowdsourcing is a cooperative endeavor.
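To make the interplay of quantity, consensus, and worker trustworthiness concrete, here is a minimal sketch of one common way to combine redundant judgments: a trust-weighted vote. It is an illustration only, not a description of any particular provider's pipeline; the function name `aggregate`, the worker IDs, the labels, and the trust scores are all hypothetical.

```python
from collections import defaultdict


def aggregate(judgments, trust=None):
    """Trust-weighted vote over (worker_id, label) pairs.

    Returns (winning_label, winner's share of the total weight).
    `trust` maps worker_id -> weight; it is an assumed input, e.g.
    accuracy on gold-standard questions. Missing workers weigh 1.0.
    """
    trust = trust or {}
    weights = defaultdict(float)
    for worker, label in judgments:
        weights[label] += trust.get(worker, 1.0)
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no judgments with positive weight")
    # Ties go to the label that was seen first.
    winner = max(weights, key=weights.get)
    return winner, weights[winner] / total


# Hypothetical example: three workers label the same item.
judgments = [("w1", "spam"), ("w2", "spam"), ("w3", "ham")]

# Plain majority: quantity and consensus decide.
unweighted = aggregate(judgments)

# Same judgments, but w3 is assumed to be far more reliable.
weighted = aggregate(judgments, trust={"w1": 0.2, "w2": 0.2, "w3": 0.9})

print(unweighted)
print(weighted)
```

With equal weights the majority label wins, while with the assumed trust scores a single reliable worker outweighs two unreliable ones. The share returned alongside the label is a rough consensus signal: a low share can flag an item for more judgments, or a task whose design needs rework.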

In this talk, we will analyze various dimensions of microwork that characterize applications, tasks, and crowds. Drawing on our experience at companies that have pioneered the use of microwork (Samasource) and data science (LinkedIn), we will offer practical advice to help you design crowdsourcing workflows to meet your data product needs.

Daniel Tunkelang


Daniel Tunkelang oversees the data science team at LinkedIn, which analyzes terabytes of data to produce products and insights that serve LinkedIn’s members. Prior to LinkedIn, Daniel led a local search quality team at Google. Daniel was a founding employee and Chief Scientist of Endeca, a leader in enterprise search and business intelligence that pioneered the use of guided navigation in search applications. He has authored eight patents, written a textbook on faceted search, created the annual workshop on human-computer interaction and information retrieval (HCIR), and participated in the premier research conferences on information retrieval, knowledge management, databases, and data mining (SIGIR, CIKM, SIGMOD, SIAM Data Mining). Daniel holds a PhD in Computer Science from CMU, as well as BS and MS degrees from MIT.

Claire Hunsaker


Claire works at Samasource, a San Francisco-based social enterprise that connects people living in poverty with internet-based work through a proprietary platform. At Samasource, her hats have included leading product, strategy and field expansion, but these days she helps clients connect with Samasource data and content solutions as the head of Sales and Marketing. Her prior gigs have included LiveOps, social enterprise in rural Vietnam, and management consulting with Katzenbach Partners, where she led client teams at large technology companies and helped several non-profits with large-scale operational growth.

Claire holds a BA from Columbia, an MA from the University of London, and an MBA from Stanford.

In her spare time, she knits, plays with Drupal, and sets small fires in her kitchen.

