Although Hadoop is designed to be resilient to the loss of hard disks or individual servers, the failure of core services can make the cluster temporarily unavailable, while other failures in a datacentre may lead to the permanent loss of data.
This talk looks at the risks, from the hardware to the entire software stack, using real data from customer sites to estimate their likelihood.
It introduces the best practices for availability, failure recovery and disaster recovery for Hadoop clusters.
Finally, it covers ongoing work for High Availability in the Hadoop platform, including filesystem snapshots and disaster recovery.
How does Hadoop fail?
Is this a real threat?
Hard data from Yahoo! and Hortonworks customers, together with published research from Google and Microsoft, shows which risks matter the most.
What can be done about this?
How can Hadoop get better?
What recent changes to Hadoop mitigate some of the risks? These include improved failover and recovery of the core services, filesystem snapshots and other new features.
Real-world data shows that there are small yet measurable risks to the availability of a Hadoop cluster, and to the actual data within it. Recent changes to the Hadoop platform will reduce these risks, but an understanding of the risks, and strategies to mitigate them, remain essential.
Steve Loughran is a member of technical staff at Hortonworks, where he works on leading-edge issues with the Hadoop ecosystem, including service failure modes and availability.
Prior to joining Hortonworks he worked at HP Laboratories on large-scale distributed systems, including cloud computing infrastructures. He is the author of Ant in Action, and is one of the very few UK-based Hadoop committers.