Topics for any discipline that focuses on quantitative or technical data have always depended on the datasets that were available at the time. Crowdsourcing has changed that — democratizing the data-collection process and cutting researchers’ reliance on stagnant, overused datasets. Tools like Amazon Mechanical Turk allow anyone to gather data overnight, rather than waiting years.
Learn how to leverage data exhaust, the digital byproduct of our online activities, to solve problems and discover insights about the world around you. We will walk through a real world example which combines several datasets and statistical techniques to discover insights and make predictions about attendees at O'Reilly Strata.
Data modeling competitions allow companies and researchers to post a problem and have it scrutinised by the world's best data scientists. By exposing a problem to a wide audience, competitions are a great way to get the most out of a dataset. In just a few months, Kaggle's competitions have helped to progress the state of the art in chess ratings and HIV research.