In the beginning, Hadoop had one overriding single point of failure: the NameNode. Lose the NameNode, and you lose your cluster.
Today there are multiple points of failure, and which ones you face depends on the vendor solution you choose to implement.
Hive, HCatalog, and Ranger, among others, have become critical components. The failure of any one of them can cripple a cluster and cause significant downtime, data loss, or both.
By identifying where failures can occur, one can either mitigate the risk or consider alternative designs.
Michael Segel has been working with Hadoop since 2009 at various companies as a solution architect, solving the tough challenges. He is currently globe-trotting as a principal architect with Segel & Associates, looking for the next challenging problem to solve. Michael spends his free time thinking about solutions as he walks his dogs around the River North neighborhood in Chicago. He is the founder of CHUG (Chicago area Hadoop User Group) and is also in the process of starting a Big Data Anonymous work group for those recovering big data-holics.
©2015, O'Reilly Media, Inc. All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse, or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.