There are at least two meanings of the phrase “virtualized HDFS.” One is the creation of an HDFS file system within a cluster of virtual machines; the second is the abstraction of the HDFS protocol in order to implement a “virtual” HDFS file system and permit any storage device to provide data to Hadoop applications.
This session will investigate both of these meanings of virtualized HDFS. It will draw on experience with multiple projects (including Tachyon, MemHDFS, the Ceph object store, and others) to describe existing implementations of the first and propose a high-speed implementation of the second.
Thomas Phelan is cofounder and chief architect of BlueData. Previously, Tom was an early employee at VMware; as senior staff engineer, he was a key member of the ESX storage architecture team. During his 10-year stint at VMware, he designed and developed the ESX storage I/O load-balancing subsystem and modular pluggable storage architecture. He went on to lead teams working on many key storage initiatives, such as the cloud storage gateway and vFlash. Earlier, Tom was a member of the original team at Silicon Graphics that designed and implemented XFS, the first commercially available 64-bit filesystem.