Organizing Big Data with the Crowd

Lukas Biewald (Weights & Biases)
Data Science
Ballroom AB

Using big data effectively almost always involves large amounts of cleaning and processing. Proper categorization and attribute labels are essential. In many cases, some of the steps can only be done manually, making crowdsourcing a crucial tool for data scientists.

This talk will describe microtasking, where it fits in the crowdsourcing landscape, and how data scientists and developers can most effectively tap into the crowd to collect and process their data sets. Several real-world cases will be used to illustrate the possibilities, including tweet analysis, social profile mining, and pre-processing satellite imagery for big data queries. In this talk I will also take a stab at predicting where the state of the art will be a year from now.

  • Collecting large data sets using the crowd
  • Augmenting, labeling and categorizing using microtasking
  • Conducting big data experiments using the crowd
  • Training machine learning models using results from the crowd
  • Real-world examples of tweet and sentiment analysis, satellite imagery, and social profile mining

Lukas Biewald

Founder and Chief Data Scientist, Weights & Biases

Lukas Biewald is the founder and CEO of CrowdFlower. Founded in 2007, CrowdFlower provides Labor-on-Demand to help companies outsource high-volume, repetitive tasks to a massively distributed global workforce.

Before founding CrowdFlower, Lukas was a senior scientist and manager within the Ranking and Management Team at Powerset, Inc., acquired by Microsoft in 2008. He led the Search Relevance Team for Yahoo! Japan after graduating from Stanford University with a B.S. in Mathematics and an M.S. in Computer Science. Recently, Lukas won the Netexplorateur Award for GiveWork, a collaboration with Samasource that brings digital work to refugees worldwide. Lukas is also an expert-level Go player.