Presented by O'Reilly and Cloudera
Make Data Work
Sept 29–Oct 1, 2015 • New York, NY

What does it mean to virtualize the Hadoop distributed file system?

Thomas Phelan (BlueData)
1:15pm–1:55pm Thursday, 10/01/2015
Hadoop Internals & Development
Location: 1 E16 / 1 E17
Level: Advanced
Average rating: 4.25 (12 ratings)
Slides: 1-PPT

There are at least two meanings of the phrase “virtualized HDFS.” The first is the creation of an HDFS file system within a cluster of virtual machines; the second is the abstraction of the HDFS protocol in order to implement a “virtual” HDFS file system, permitting any storage device to provide data to Hadoop applications.
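
To make the second meaning concrete: Hadoop applications do not talk to HDFS directly but program against the org.apache.hadoop.fs.FileSystem abstract class, so any storage device can present a "virtual" HDFS by implementing that contract. Below is a minimal, hypothetical sketch of such a plug-in; the class name MyVirtualFileSystem and its backing store are illustrative assumptions, not something described by the talk.

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.util.Progressable;

// Hypothetical sketch: a "virtual" HDFS that serves Hadoop applications
// from an arbitrary storage device by implementing the FileSystem contract.
// Each stub below would translate the Hadoop call into an operation on
// the backing store.
public class MyVirtualFileSystem extends FileSystem {
  private URI uri;
  private Path workingDir = new Path("/");

  @Override
  public void initialize(URI name, Configuration conf) throws IOException {
    super.initialize(name, conf);
    uri = name;
    // Connect to the underlying storage device here.
  }

  @Override
  public URI getUri() {
    return uri;
  }

  @Override
  public FSDataInputStream open(Path f, int bufferSize) throws IOException {
    // Map a Hadoop read onto the backing store's read path.
    throw new UnsupportedOperationException("sketch only");
  }

  @Override
  public FSDataOutputStream create(Path f, FsPermission permission,
      boolean overwrite, int bufferSize, short replication, long blockSize,
      Progressable progress) throws IOException {
    throw new UnsupportedOperationException("sketch only");
  }

  @Override
  public FSDataOutputStream append(Path f, int bufferSize,
      Progressable progress) throws IOException {
    throw new UnsupportedOperationException("sketch only");
  }

  @Override
  public boolean rename(Path src, Path dst) throws IOException {
    throw new UnsupportedOperationException("sketch only");
  }

  @Override
  public boolean delete(Path f, boolean recursive) throws IOException {
    throw new UnsupportedOperationException("sketch only");
  }

  @Override
  public FileStatus[] listStatus(Path f) throws IOException {
    throw new UnsupportedOperationException("sketch only");
  }

  @Override
  public void setWorkingDirectory(Path dir) {
    workingDir = dir;
  }

  @Override
  public Path getWorkingDirectory() {
    return workingDir;
  }

  @Override
  public boolean mkdirs(Path f, FsPermission permission) throws IOException {
    throw new UnsupportedOperationException("sketch only");
  }

  @Override
  public FileStatus getFileStatus(Path f) throws IOException {
    throw new UnsupportedOperationException("sketch only");
  }
}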

This session will investigate both of these meanings of virtualized HDFS. It will draw on experience with multiple projects (including Apache Tachyon, MemHDFS, the Ceph object store, and others) to describe existing implementations of the first and to propose a high-speed implementation of the second.
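
Once such a class exists, pointing Hadoop at it is purely a configuration matter: Hadoop resolves a path's URI scheme to an implementation class through the fs.<scheme>.impl property. The myfs:// scheme below continues the hypothetical example above; application code is unchanged, which is what lets backends such as Tachyon or a Ceph gateway slot in behind unmodified Hadoop jobs.

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class VirtualFsDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Map the hypothetical myfs:// scheme to its FileSystem implementation.
    conf.set("fs.myfs.impl", "MyVirtualFileSystem");
    // From here on the code is identical to ordinary HDFS access; only the
    // URI scheme selects the backend.
    FileSystem fs = FileSystem.get(URI.create("myfs://storage-host/"), conf);
    for (FileStatus status : fs.listStatus(new Path("/data"))) {
      System.out.println(status.getPath());
    }
  }
}

(The stubbed methods in the sketch above would of course need real implementations before this demo runs end to end.)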

Thomas Phelan

BlueData

Thomas Phelan is cofounder and chief architect of BlueData. Prior to BlueData, Tom was an early employee at VMware, where, as a senior staff engineer, he was a key member of the ESX storage architecture team. During his 10 years at VMware, he designed and developed the ESX storage I/O load-balancing subsystem and the modular “pluggable storage architecture.” He went on to lead teams working on many key storage initiatives, such as the cloud storage gateway and vFlash. Earlier, Tom was a member of the original team at Silicon Graphics that designed and implemented XFS, the first commercially available 64-bit file system.