Crowdsourcing and the Democratization of Data

Disruption & Opportunity
Location: Mission City B1

Crowdsourcing democratizes the data-collection process, cutting researchers’ reliance on stagnant, overused datasets. Now anyone can gather data overnight rather than waiting years. Some of the data collection may be sloppy, but it’s possible to build robust quality-control mechanisms that standardize the results coming back from the crowd. The important point is that crowdsourcing gives researchers, businesses, and even armchair social scientists a channel for gathering their own data.
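
The abstract doesn't say which quality-control mechanisms CrowdFlower uses. One common approach, sketched here as an illustration under stated assumptions, mixes "gold" questions with known answers into the task. Each worker is scored on those gold questions, and the remaining items are decided by majority vote among the workers who pass. The judgment format, the function names `worker_accuracy` and `aggregate`, and the 0.7 accuracy threshold are all hypothetical.

```python
from collections import Counter, defaultdict

def worker_accuracy(judgments, gold):
    """Share of gold (known-answer) items each worker labeled correctly."""
    correct = defaultdict(int)
    seen = defaultdict(int)
    for worker, item, label in judgments:
        if item in gold:
            seen[worker] += 1
            correct[worker] += int(label == gold[item])
    return {w: correct[w] / seen[w] for w in seen}

def aggregate(judgments, gold, min_accuracy=0.7):
    """Majority label per non-gold item, counting only workers who pass the gold check.

    Hypothetical threshold: workers below min_accuracy on gold items (or who
    never saw a gold item) are ignored entirely.
    """
    accuracy = worker_accuracy(judgments, gold)
    trusted = {w for w, acc in accuracy.items() if acc >= min_accuracy}
    votes = defaultdict(Counter)
    for worker, item, label in judgments:
        if item not in gold and worker in trusted:
            votes[item][label] += 1
    return {item: counts.most_common(1)[0][0] for item, counts in votes.items()}

# Hypothetical example: g1/g2 are gold items; x1/x2 are the real tasks.
gold = {"g1": "cat", "g2": "dog"}
judgments = [
    ("w1", "g1", "cat"), ("w1", "g2", "dog"), ("w1", "x1", "cat"), ("w1", "x2", "dog"),
    ("w2", "g1", "cat"), ("w2", "g2", "dog"), ("w2", "x1", "cat"), ("w2", "x2", "dog"),
    ("w3", "g1", "dog"), ("w3", "g2", "cat"), ("w3", "x1", "dog"), ("w3", "x2", "cat"),
    ("w4", "g1", "dog"), ("w4", "g2", "cat"), ("w4", "x1", "dog"), ("w4", "x2", "cat"),
]
print(aggregate(judgments, gold))  # → {'x1': 'cat', 'x2': 'dog'}
```

In this toy data an unfiltered vote would tie 2–2 on both items. Dropping workers w3 and w4, who missed every gold question, breaks the tie in favor of the reliable workers. Real systems often weight votes by estimated accuracy instead of applying a hard cutoff.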

In any discipline that works with quantitative or technical data, research topics have always depended on whatever datasets were available at the time. For example, the Brown Corpus, compiled in the 1960s, has served as the basis for thousands of linguistics studies. Graduate students would build entire research plans around the availability of previously collected data, and as a result, generations of papers on word-sense disambiguation were tailored to the constraints of old data. By contrast, almost every grad student in the Stanford linguistics department now uses crowdsourcing platforms for their studies.

Using crowdsourcing tools, it’s possible to conduct experiments that replicate the World Color Survey, test Benford’s Law (a.k.a. the “first-digit law”), explore age and gender stereotypes, or even pose philosophical problems, all in a fraction of the time such experiments used to take and while yielding large volumes of data.
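
Benford’s Law predicts that in many real-world datasets the leading digit d appears with probability P(d) = log10(1 + 1/d). Under that law, about 30.1% of values start with 1 and only about 4.6% start with 9. The following sketch shows how crowd-collected numbers could be checked against that distribution using Pearson’s chi-square statistic. How the numbers are gathered from the crowd is left open, and the function names are hypothetical. With 9 digits there are 8 degrees of freedom, so a statistic above roughly 15.51 suggests a mismatch at the 5% significance level.

```python
import math
from collections import Counter

def benford_expected():
    """Benford's Law: P(first digit = d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit(x):
    """Leading nonzero digit of a finite, nonzero number, e.g. 0.0372 -> 3."""
    for ch in repr(abs(x)):
        if ch in "123456789":
            return int(ch)
    raise ValueError("first_digit() needs a finite, nonzero number")

def benford_chi_square(values):
    """Pearson chi-square statistic: observed first digits vs. Benford's Law."""
    digits = [first_digit(v) for v in values if v != 0]
    if not digits:
        raise ValueError("no nonzero values to test")
    n = len(digits)
    counts = Counter(digits)
    return sum((counts[d] - n * p) ** 2 / (n * p) for d, p in benford_expected().items())

# Chi-square critical value for 8 degrees of freedom at alpha = 0.05.
CRITICAL_8DF_5PCT = 15.51

# Powers of two are a classic Benford-conforming sequence; flat first digits are not.
print(benford_chi_square([2 ** k for k in range(1000)]) < CRITICAL_8DF_5PCT)  # → True
print(benford_chi_square(list(range(1, 10)) * 100) < CRITICAL_8DF_5PCT)       # → False
```

`first_digit` reads the digit from `repr()` instead of computing `log10`. This avoids floating-point edge cases around exact powers of ten.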

Lukas Biewald

Weights & Biases

Lukas Biewald is the founder and CEO of CrowdFlower. Founded in 2007, CrowdFlower provides Labor-on-Demand to help companies outsource high-volume, repetitive tasks to a massively distributed global workforce.

Before founding CrowdFlower, Lukas was a senior scientist and manager on the Ranking and Management Team at Powerset, Inc., which Microsoft acquired in 2008. He led the Search Relevance Team for Yahoo! Japan after graduating from Stanford University with a B.S. in Mathematics and an M.S. in Computer Science. Recently, Lukas won the Netexplorateur Award for GiveWork, a collaboration with Samasource that brings digital work to refugees worldwide. Lukas is also an expert-level Go player.