Presented By O'Reilly and Cloudera
Make Data Work
Sept 29–Oct 1, 2015 • New York, NY

What does it mean to virtualize the Hadoop distributed file system?

Thomas Phelan (HPE BlueData)
1:15pm–1:55pm Thursday, 10/01/2015
Hadoop Internals & Development
Location: 1 E16 / 1 E17 Level: Advanced
Average rating: 4.25 (12 ratings)
Slides: 1-PPT

The phrase “virtualized HDFS” has at least two meanings. The first is the creation of an HDFS file system within a cluster of virtual machines; the second is the abstraction of the HDFS protocol in order to implement a “virtual” HDFS file system, permitting any storage device to serve data to Hadoop applications.

This session will investigate both of these meanings of virtualized HDFS. It will draw on experience with multiple projects (including Apache Tachyon, MemHDFS, the Ceph object store, and others) to describe existing implementations of the first and to propose a high-speed implementation of the second.
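The second meaning above — abstracting the HDFS protocol so any storage device can back it — can be sketched in miniature. The following is an illustrative toy, not Hadoop's actual `FileSystem` API: a storage-neutral interface with two interchangeable backends (an in-memory store, in the spirit of Tachyon/MemHDFS, and a local-disk store), so that client code written against the protocol never knows which device holds the bytes. All class and function names here are hypothetical.

```python
import os
from abc import ABC, abstractmethod

class VirtualFileSystem(ABC):
    """Minimal stand-in for an HDFS-like protocol (illustrative only)."""
    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...
    @abstractmethod
    def read(self, path: str) -> bytes: ...
    @abstractmethod
    def listdir(self, prefix: str) -> list: ...

class InMemoryFS(VirtualFileSystem):
    """Backend 1: RAM-backed store."""
    def __init__(self):
        self._blobs = {}
    def write(self, path, data):
        self._blobs[path] = data
    def read(self, path):
        return self._blobs[path]
    def listdir(self, prefix):
        return sorted(p for p in self._blobs if p.startswith(prefix))

class LocalDiskFS(VirtualFileSystem):
    """Backend 2: the same protocol over an ordinary local directory."""
    def __init__(self, root):
        self._root = root
        os.makedirs(root, exist_ok=True)
    def _full(self, path):
        return os.path.join(self._root, path.lstrip("/"))
    def write(self, path, data):
        with open(self._full(path), "wb") as f:
            f.write(data)
    def read(self, path):
        with open(self._full(path), "rb") as f:
            return f.read()
    def listdir(self, prefix):
        return sorted("/" + n for n in os.listdir(self._root)
                      if ("/" + n).startswith(prefix))

def word_count(fs: VirtualFileSystem, path: str) -> int:
    """A toy 'Hadoop application': it sees only the protocol, never the device."""
    return len(fs.read(path).split())
```

The same `word_count` runs unchanged over either backend — which is the essence of the proposal: decouple Hadoop applications from the storage device by virtualizing the protocol, not the machines.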


Thomas Phelan

HPE BlueData

Thomas Phelan is cofounder and chief architect of BlueData. Previously, he was a member of the original team at Silicon Graphics that designed and implemented XFS, the first commercially available 64-bit file system, and an early employee at VMware, where, as a senior staff engineer, he was a key member of the ESX storage architecture team. There he designed and developed the ESX storage I/O load-balancing subsystem and its modular pluggable storage architecture, and led teams working on key storage initiatives such as the cloud storage gateway and vFlash.