HDFS for Geographically Distributed File System

Konstantin Shvachko (WANdisco)
Sponsored
Location: 118-119
Slides: 1 PDF

There are different dimensions of scalability for a distributed storage system: more data, more stored objects, more nodes, more load, additional data centers, etc. This presentation addresses the geographic scalability of HDFS. It describes unique techniques implemented at WANdisco that allow HDFS to scale across multiple geographically distributed data centers for continuous availability. The distinguishing principle of our approach is that metadata is replicated synchronously between data centers using a coordination engine, while the data is copied over the WAN asynchronously. This provides strict consistency of the namespace on the one hand and fast, LAN-speed data ingestion on the other. In this approach, geographically separated parts of the system operate as a single HDFS cluster, where data can be actively accessed and updated from any data center. The presentation will reveal details of the design, explain the main use cases, compare it with existing approaches, and evaluate system performance. It will also cover advanced features such as selective data replication and dynamic membership reconfiguration.
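The core split described in the abstract, synchronous agreement on metadata and asynchronous copying of data, can be illustrated with a toy, single-process model. This is a minimal sketch under stated assumptions, not WANdisco's implementation: the names `CoordinationEngine`, `DataCenter`, `Proposal`, `write_file`, and `drain_wan_queue`, and the global sequence number (GSN), are hypothetical, and the agreement step is modeled by a single counter rather than a real consensus protocol running across data centers.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Proposal:
    op: str    # hypothetical namespace operation, e.g. "mkdir" or "create"
    path: str


class CoordinationEngine:
    """Toy coordination engine: gives each namespace proposal a global
    sequence number (GSN) and delivers it to every registered replica.
    Assumption: a real engine would agree on this order with a consensus
    protocol across data centers; here one counter stands in for it."""

    def __init__(self):
        self.next_gsn = 0
        self.replicas = []

    def register(self, replica):
        self.replicas.append(replica)

    def submit(self, proposal):
        gsn = self.next_gsn
        self.next_gsn += 1
        # Synchronous metadata replication: every data center applies the
        # agreed operation, in GSN order, before submit() returns.
        for replica in self.replicas:
            replica.apply(gsn, proposal)
        return gsn


class DataCenter:
    """One site holding a full namespace replica plus locally stored data."""

    def __init__(self, name, engine):
        self.name = name
        self.engine = engine
        self.namespace = set()   # metadata replica (paths only, for brevity)
        self.blocks = {}         # path -> file bytes stored at this site
        self.last_applied = -1   # GSN of the last applied agreement
        self.outbound = deque()  # pending asynchronous WAN copies
        engine.register(self)

    def apply(self, gsn, proposal):
        # Every replica must apply agreements strictly in GSN order.
        if gsn != self.last_applied + 1:
            raise RuntimeError(f"{self.name}: agreement {gsn} out of order")
        self.last_applied = gsn
        if proposal.op in ("mkdir", "create"):
            self.namespace.add(proposal.path)

    def write_file(self, path, data, peers):
        self.engine.submit(Proposal("create", path))  # metadata: synchronous
        self.blocks[path] = data                      # data: local, LAN speed
        for peer in peers:
            self.outbound.append((peer, path, data))  # data: WAN copy deferred

    def drain_wan_queue(self):
        # Asynchronous step: ship queued data to the remote sites.
        while self.outbound:
            peer, path, data = self.outbound.popleft()
            peer.blocks[path] = data


if __name__ == "__main__":
    engine = CoordinationEngine()
    east = DataCenter("east", engine)
    west = DataCenter("west", engine)
    east.write_file("/logs/a", b"hello", peers=[west])
    # The namespace entry is already visible in "west"; the bytes are not.
    print(west.namespace, "/logs/a" in west.blocks)
    east.drain_wan_queue()
    print(west.blocks["/logs/a"])
```

In this sketch the write returns as soon as the metadata is agreed and the data is stored locally, so ingestion is not held back by WAN latency, while every site sees an identical namespace. Passing only some sites as `peers` loosely suggests how selective data replication might look, but that mapping is likewise an assumption.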

This session is sponsored by WANdisco

Konstantin Shvachko, Chief Architect of WANdisco, is a veteran Hadoop developer and a well-respected industry author and speaker. A technical expert specializing in efficient data structures and algorithms for large-scale distributed storage systems, Konstantin joined WANdisco through its acquisition of AltoStor, a Hadoop-as-a-Service platform company. Prior to AltoStor, he was founder and Chief Scientist at AltoScale, a Hadoop and HBase-as-a-Platform company acquired by VertiCloud. Before AltoScale, Konstantin played a lead architectural role at eBay, building two generations of the organization’s Hadoop platform. Prior to eBay, as Principal Hadoop SE at Yahoo!, he worked on the Hadoop Distributed File System (HDFS). He has dozens of publications and presentations to his credit, including those in the fields of big data storage, distributed computing, algorithms, computational complexity, and more. He is currently a member of the Apache Hadoop PMC. Konstantin holds a Ph.D. in Computer Science and an M.S. in Mathematics from Moscow State University, Russia.