
Collecting Massive Data via Crowdsourcing

Mission City

To harness data, one must first have data. Collecting massive amounts of data effectively can be a challenge, and crowdsourcing may well be an efficient solution in some situations. The International Barcode of Life and Technical University Munich (TUM) ProteomicsDB projects are two great examples of gathering data via crowdsourcing.

The former will crowdsource the collection of DNA samples around the world by enabling citizen scientists to provide insect and plant specimens in return for identification of, and detailed information about, their organism. The consortium, made up of institutions across 25 nations including Canada, the United States, Germany and China, aims to create a database containing a DNA-based barcode for every species in the world, with a goal of 500,000 species by the end of 2015. Crowdsourcing the data via a consumer-style application is seen as key to achieving this.

The TUM ProteomicsDB project focuses on crowdsourcing data from within a given scientific community, but nonetheless relies on crowdsourcing to fill its burgeoning data store. The project stores protein and peptide identifications from mass spectrometry-based experiments, and the assembled data identifies proteins mapping to over 18,000 human genes, representing 90% coverage of the human proteome. It currently contains more than 11,000 datasets from human cancer cell lines, tissues and body fluids, enables real-time analysis of this high-dimensional data, and creates instant value by allowing analytical hypotheses to be tested. The crowdsourced data stored and analyzed within ProteomicsDB can be used in basic and biomedical research for discovering therapeutic targets and developing new drugs as well as enhanced diagnostic methods. SAP is proud to be involved with driving the success of both these projects.

This keynote is sponsored by SAP


Metro John Schitka

Solution Marketing Manager, SAP

John Schitka is a Solution Marketing Manager on the SAP Big Data Solution Marketing team. His focus in the SAP Big Data arena is largely on Hadoop and SAP HANA smart data access capabilities. A graduate of McMaster University, he holds an MBA from the University of Windsor. He has worked in product marketing and product management in the high-tech arena for a number of years, has taught at a private college, and has co-authored a number of published textbooks. He has a true love of technology and all that it has to offer the world.