Big Data: Beyond Bare-Metal?

Mike Wendt (NVIDIA)
Ballroom F

In this session, we will share the results of our study, a price-performance comparison of a bare-metal Hadoop cluster and cloud-based Hadoop clusters. Using the TCO model we developed, we created eight cloud-based Hadoop clusters (four virtual machine instance types, each with two data-flow models) to compare against our bare-metal Hadoop cluster. The Accenture Data Platform Benchmark provided us with three real-world Hadoop applications for comparing the execution-time performance of these clusters.

Results of this study reinforce our original findings. First, cloud-based Hadoop deployments (Hadoop on the cloud and Hadoop-as-a-Service) offer better price-performance ratios than bare-metal clusters. Second, the benefit of performance tuning is large enough that the overhead of the cloud's virtualization layer is a worthwhile trade-off, because virtualization expands performance-tuning opportunities. Third, despite the sizable benefit, the performance-tuning process is complex and time-consuming and thus calls for automated tuning tools.

In addition to our original findings, we were able to observe the performance impact of data locality and remote storage within the cloud. Although the result is counterintuitive, our experiments show that using remote storage to make data highly available outperforms local-disk HDFS that relies on data locality.

Choosing a cloud-based Hadoop deployment depends on the needs of the organization: Hadoop on the cloud offers more control of Hadoop clusters, while Hadoop-as-a-Service offers simplified operation. Once a deployment model has been selected, organizations should consider four key areas when selecting a cloud provider: workload utilization and demands, pricing structure, cloud architecture, and operator usability. Careful consideration of these areas helps businesses succeed and get the most performance out of the cloud.

This session is sponsored by Accenture

Mike Wendt

Manager, Applied Solutions Engineering, NVIDIA

Michael Wendt is an R&D Associate Manager at Accenture Technology Labs in San Jose, CA. Since joining Accenture Technology Labs, Michael has worked with Hadoop, Cassandra, Storm, and other Big Data technologies. His research work includes benchmarking bare-metal and cloud-based Hadoop clusters and comparing their price-performance ratios. In addition to his research work on Hadoop, he has advised and helped clients deploy Hadoop systems and contributed to the design and development of a real-time stream processing platform consisting of Storm and Cassandra. Michael has a BS in Computer Engineering from the University of Maryland, College Park.