The BigData Top100 List

Hadoop in Practice Great America Ballroom K

We will describe the BigData Top100 List initiative, a new, open, community-based effort for benchmarking big data systems. The BigData Top100 List will rank big data systems according to a well-defined, audited performance metric, and the benchmark also provides an accompanying efficiency metric. With “big data” becoming a major force of innovation across enterprises of all sizes, new platforms for managing big data sets are announced almost weekly, with ever more features. Yet there is currently no means of comparing such platforms. While the performance of traditional database systems is well understood and measured by long-established institutions such as the Transaction Processing Performance Council, there is neither a clear definition of the performance of big data systems nor a generally agreed-upon metric for comparing them. This session unveils a community-based effort to define an end-to-end, application-layer benchmark for big data applications, with a specification that can be easily adapted to evolving challenges in the big data space. We actively seek community input into this process.

Milind Bhandarkar

Greenplum, A Division of EMC

Milind Bhandarkar was the founding member of the team at Yahoo! that took Apache Hadoop from a 20-node prototype to a datacenter-scale production system, and he has been contributing to and working with Hadoop since version 0.1.0. He started the Yahoo! Grid solutions team, which focused on training, consulting, and supporting hundreds of new migrants to Hadoop. Parallel programming languages and paradigms have been his area of focus for over 20 years. He has worked at the Center for Development of Advanced Computing (C-DAC), the National Center for Supercomputing Applications (NCSA), the Center for Simulation of Advanced Rockets, Siebel Systems, PathScale Inc. (acquired by QLogic), Yahoo!, and LinkedIn. Currently, he is the Chief Scientist at Greenplum, a division of EMC.

Chaitan Baru

SDSC/UC San Diego

Chaitan Baru is Distinguished Scientist and Associate Director, Data Initiatives, at the San Diego Supercomputer Center, University of California San Diego, where he also directs the Center for Large-scale Data Systems Research (CLDS). Baru’s interests are in research and development in the areas of parallel database systems, scientific data management, data analytics, and the challenges of data-driven science and data-driven enterprises. He has played a leadership role in a number of national-scale cyberinfrastructure R&D efforts across a wide range of science disciplines, from earth sciences to ecology, biomedical informatics, and healthcare. Prior to joining SDSC in 1996, Baru led one of the development teams at IBM for an early UNIX-based shared-nothing database system (DB2 Parallel Edition) and also led a team that produced the first result for an industry-standard decision support benchmark (TPC-D). Over the past year, Baru has led the effort to create a Big Data Benchmarking community, leading to the proposal to create a BigData Top100 List, borrowing benchmarking ideas from the high-performance computing, transaction processing, and database communities.

