Distributed storage systems scale along several dimensions: more data, more stored objects, more nodes, more load, additional data centers, and so on. This presentation addresses the geographic scalability of HDFS. It describes techniques implemented at WANdisco that allow HDFS to scale over multiple geographically distributed data centers for continuous availability. The distinguishing principle of our approach is that metadata is replicated synchronously between data centers using a coordination engine, while the data is copied over the WAN asynchronously. This provides strict consistency of the namespace on the one hand and fast, LAN-speed data ingestion on the other. In this approach, geographically separated parts of the system operate as a single HDFS cluster, where data can be actively accessed and updated from any data center. The presentation will reveal details of the design, explain the main use cases, compare it with existing approaches, and evaluate the system's performance. It will also cover advanced features such as selective data replication and dynamic membership reconfiguration.
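The split between synchronous metadata replication and asynchronous data copying can be illustrated with a toy, in-memory model. This is only a sketch under stated assumptions, not WANdisco's implementation: the class names (`DataCenter`, `CoordinationEngine`), the methods `create_file` and `drain_wan_queue`, and the block-naming scheme are all hypothetical. A real coordination engine would reach agreement among sites over the network; here a plain loop stands in for that step.

```python
from collections import deque


class DataCenter:
    """One site: a namespace replica plus locally stored block data (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.namespace = {}  # path -> block id (metadata)
        self.blocks = {}     # block id -> bytes (data)


class CoordinationEngine:
    """Toy stand-in that applies each metadata operation to every site in one order."""

    def __init__(self, sites):
        self.sites = sites
        self.next_seq = 0
        self.pending_copies = deque()  # (block id, source site) awaiting WAN copy

    def create_file(self, origin, path, data):
        seq = self.next_seq
        self.next_seq += 1
        block_id = f"blk_{seq}"
        # Synchronous step: every namespace replica records the new file
        # before this call returns, so all sites see the same namespace.
        for site in self.sites:
            site.namespace[path] = block_id
        # The data itself is written only at the originating site (LAN speed)...
        origin.blocks[block_id] = data
        # ...and is queued to be copied to the other sites later.
        self.pending_copies.append((block_id, origin))
        return block_id

    def drain_wan_queue(self):
        """Asynchronous step, run on demand here: copy queued blocks to other sites."""
        while self.pending_copies:
            block_id, source = self.pending_copies.popleft()
            for site in self.sites:
                if site is not source:
                    site.blocks[block_id] = source.blocks[block_id]
```

In this model, a file created at one site is visible in every site's namespace as soon as `create_file` returns, while its bytes reach the remote sites only once `drain_wan_queue` runs.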
This session is sponsored by WANdisco
Konstantin Shvachko, Chief Architect of WANdisco, is a veteran Hadoop developer and well-respected industry author and speaker. A technical expert specializing in efficient data structures and algorithms for large-scale distributed storage systems, Konstantin joined WANdisco through the acquisition of AltoStor, a Hadoop-as-a-Service platform company. Prior to AltoStor, he was founder and Chief Scientist at AltoScale, a Hadoop and HBase-as-a-Platform company acquired by VertiCloud. Before AltoScale, Konstantin played a lead architectural role at eBay, building two generations of the organization’s Hadoop platform. Prior to eBay, as Principal Hadoop SE at Yahoo!, he worked on the Hadoop Distributed File System (HDFS). He has dozens of publications and presentations to his credit, including those in the fields of Big Data Storage, Distributed Computing, Algorithms, Computational Complexity, and more. He is currently a member of the Apache Hadoop PMC. Konstantin has a Ph.D. in Computer Science and an M.S. in Mathematics from Moscow State University, Russia.
©2015, O’Reilly UK Ltd • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • firstname.lastname@example.org
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse, or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.