Big Data and Bibliometrics: Crowdsourcing the World's Largest Database of Research

Jan Reichelt (Mendeley Ltd.), William Gunn (Mendeley Research Networks)
Data Science, Mission City B1

Come learn how the Mendeley team built the largest crowdsourced database of research literature, scaled it to handle 120M uploaded documents, and now uses technologies such as Hadoop, Apache Mahout, and Thrift to generate daily statistics and recommendations on over 7 TB of academic research data. Jan Reichelt, Mendeley co-founder, will talk about the lessons learned in building the service and how it is shaking up the stodgy old field of academic publishing.

In addition to the technical story, Jan will also show how Mendeley’s real-time data on content usage provides never-before-seen insight into how academics collect, read, share, and annotate academic research. Why should you care about academic publishing? It’s a fascinating story: while you’re using GitHub and Google+ to share information, the best that all the world’s big brains can come up with is swapping PDFs!

Academic publishing faces many of the same stressors as other kinds of publishing as content moves online. But because its revenue has typically come from institutional purchases rather than individual ones, and ad sales contribute less to that revenue, its business models have diverged to the point where academic publishing has had very little end-user focus until now. Academic content is also read more intensively, curated more carefully by end users, and managed with specialized tools. That gives us a unique opportunity to look at content usage at a level of detail not possible in any other industry and to distill insights that are relevant across all of publishing.

Jan Reichelt

Mendeley Ltd.

Jan Reichelt is the co-founder and president of Mendeley, the world’s largest research collaboration platform. Mendeley helps people to organize and collaborate on research projects, making scientific research more accessible and transparent.

William Gunn

Mendeley Research Networks

Stem cell biologist by training. I work for Mendeley to disrupt scholarly communication and change how research is done, and I live in San Diego with my wife and dog.

