
Crowdsourcing at Locu: How I Learned to Stop Worrying and Love the Crowd

Adam Marcus (B12)
Data Science
Ballroom AB

Crowdsourcing marketplaces like oDesk or Amazon’s Mechanical Turk give us access to people all over the world who can take on a variety of roles, such as virtual personal assistants, image labelers, or cleaners of gnarly datasets. Humans can solve tasks that artificial intelligence cannot yet solve, or needs help solving, without us having to resort to complex machine learning or statistics. But humans are quirky: give them bad instructions, let them get bored, or assign them an overly repetitive task, and they will start making mistakes. In this talk, I’ll explain how to make effective use of crowd workers to solve your most challenging tasks, using examples from the wild and from our work at Locu.

Machine learning and crowdsourcing are at the core of most of the problems we solve at Locu. When possible, we automate tasks with the help of trained regression models and classifiers. However, it’s not always possible to build machine-only decision-making tools, and we often need to marry machines and crowds. In this talk, I’ll highlight:

  • How we used tens of thousands of crowd-generated data points to train a classifier that makes better judgments than the crowd that trained it.
  • How to apply lessons from other fields like user interface design and cognitive science to your crowd-powered workflows.
  • How we built a hierarchical crowd of several hundred workers to be self-regulating and self-training, and to offer our workers a chance at upward mobility.

Adam Marcus

Cofounder and CTO, B12

Adam is Locu’s Director of Data. He recently completed his Ph.D. in Computer Science at MIT. His dissertation is on database systems and human computation. He is a recipient of the NSF and NDSEG fellowships, and has previously worked at ITA, Google, IBM, and FactSet. In his free time, he builds course content to get people excited about data and programming.