For too long, the conversation about big data deployment has started and ended with bare metal. Some people start on AWS, but they all move off when they scale because it’s not economical, right?
The truth is, there are several things you should think about when choosing your infrastructure and infrastructure services for big data. Data processing and access patterns vary dramatically across different analytical workloads. The mix of memory, processing and storage needed to address these workloads varies considerably as well. A good “cloud” environment allows the (virtual) architecture to be optimized for each job, whereas bare metal is a one-time optimization made when the hardware is procured.
Do the benefits of optimizing for the job outweigh the performance hit of virtualization? Can you make up for that hit with a little more hardware? Does it make sense to use bare metal for well-understood and predictable workloads? The answers to these questions are changing rapidly as cloud prices drop, and as architectures and best practices are adapted to running big data systems in virtual environments.
Consider the analogy to managing variable electricity demand over the course of a day. Bare metal is like what is known as a “base load” generation source (e.g. nuclear): it provides power efficiently but cannot be throttled very well, so it is used to meet the consistent electricity demand that persists throughout the day. Cloud big data deployments, by contrast, resemble gas power plants, which can be brought online quickly to meet peak demand events, like when a large portion of England switches on their kettles at the same time after a popular evening television program.
A modern CIO seeking to rationalize their company’s data architecture needs to consider a mix of loads and deployment options, just as a utility executive has to invest in a good generation mix. In this presentation, we will articulate a framework for applying the levers available to architects as they plot a course forward in this era of big data technologies. We will offer some important considerations, born of experience building the world’s largest enterprise big data deployments. We will draw on our specific experiences working with platforms ranging from Hadoop to Cassandra to Greenplum and Postgres, deployed across massive in-house data centers, the world’s largest clouds, and large enterprise private virtual infrastructure.
John Akred likes to help organizations become more data-driven. Mr. Akred has over 15 years of experience in advanced analytical applications and analytical system architecture. He is a recognized expert in the areas of applied business analytics, machine learning, predictive analytics, and operational data mining. He has deep expertise in applying architectural approaches such as distributed non-relational data stores (NoSQL), stream processing, in-database analytics, event-driven architectures and specialized appliances to real-time scoring, real-time optimization, and similar applications of analytics at scale.
John received a BA in Economics from the University of New Hampshire and an MS in Computer Science, focused on distributed systems, from DePaul University.
A leading expert on big data architecture and Hadoop, Stephen O’Sullivan has 20 years of experience creating scalable, high-availability data and applications solutions. A veteran of WalmartLabs, Sun, and Yahoo, Stephen leads data architecture and infrastructure at Silicon Valley Data Science.