Presented By O'Reilly and Cloudera
Make Data Work
March 28–29, 2016: Training
March 29–31, 2016: Conference
San Jose, CA

Unified namespace and tiered storage in Alluxio

Calvin Jia (Alluxio), Jiri Simsa (Alluxio)
2:40pm–3:20pm Wednesday, 03/30/2016
Data Innovations

Location: 210 D/H
Tags: real-time

Prerequisite knowledge

Attendees should have a basic understanding of big data workloads.

Description

We know that storage resources are not all equal, but getting distributed big data applications to take advantage of this difference, or even simply to understand it, is very difficult. Alluxio, Inc. has developed the Alluxio unified namespace and tiered storage to address this problem, combining two simple yet highly effective ideas:

  1. Beyond memory, Alluxio manages other storage resources such as SSDs and migrates data among the different storage types, giving computation frameworks a much larger capacity at close-to-optimal throughput.
  2. Alluxio provides a unified namespace, which makes it possible to store, access, and manage data from heterogeneous data sources through a single namespace.
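
The first idea can be illustrated with a small sketch. This is not Alluxio's implementation: the class names `Tier` and `TieredStore`, the block-count capacities, and the least-recently-used eviction policy are all assumptions chosen for illustration. The sketch shows new data landing in the fastest tier, colder data cascading to slower tiers when a tier fills up, reads promoting data back to the top, and pinned blocks being skipped during eviction.

```python
from collections import OrderedDict


class Tier:
    """One storage tier (hypothetical model; capacity is a block count)."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        # block_id -> data, ordered from least to most recently used
        self.blocks = OrderedDict()


class TieredStore:
    """Toy tiered store: tiers are listed fastest first (e.g. MEM, SSD, HDD)."""

    def __init__(self, tiers):
        self.tiers = tiers
        self.pinned = set()

    def pin(self, block_id):
        # A pinned block is never chosen as an eviction victim.
        self.pinned.add(block_id)

    def put(self, block_id, data):
        self._remove(block_id)
        self._insert(0, block_id, data)

    def get(self, block_id):
        data = self._remove(block_id)
        if data is not None:
            # Promote the block to the fastest tier on access.
            self._insert(0, block_id, data)
        return data

    def location(self, block_id):
        for tier in self.tiers:
            if block_id in tier.blocks:
                return tier.name
        return None

    def _remove(self, block_id):
        for tier in self.tiers:
            if block_id in tier.blocks:
                return tier.blocks.pop(block_id)
        return None

    def _insert(self, level, block_id, data):
        if level >= len(self.tiers):
            # Fell off the slowest tier. In this toy model the block is
            # simply dropped from the cache.
            return
        tier = self.tiers[level]
        while len(tier.blocks) >= tier.capacity:
            # Least recently used block that is not pinned, if any.
            victim = next((b for b in tier.blocks if b not in self.pinned), None)
            if victim is None:
                # Every block here is pinned: place the new block lower.
                self._insert(level + 1, block_id, data)
                return
            self._insert(level + 1, victim, tier.blocks.pop(victim))
        tier.blocks[block_id] = data


store = TieredStore([Tier("MEM", 2), Tier("SSD", 2), Tier("HDD", 2)])
for block in ("a", "b", "c"):
    store.put(block, block.upper())
print(store.location("a"))  # "a" was the coldest block, so it moved to SSD
```

The caller never names a tier when reading or writing, which is the point of the CPU-cache analogy: placement is a policy decision made inside the store.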

Calvin Jia and Jiri Simsa explain how the current Alluxio tiered storage can be easily configured to use memory, SSDs, and hard drives as separate tiers. Alluxio users and administrators do not have to migrate data manually, because Alluxio moves data transparently among all the configured tiers, much as a CPU manages its L1, L2, and lower-level caches. Alluxio also gives users fine-grained control over data placement: they can plug in their own data-management strategies, pin files to a specific storage tier, or set a TTL on files.

Calvin and Jiri also describe the interface for mounting heterogeneous data sources into the Alluxio namespace, which takes advantage of Alluxio’s ability to interoperate with different underlying storage systems such as HDFS, S3, GlusterFS, and Swift. Once a data source is mounted, operations such as creating, deleting, or renaming objects in the Alluxio namespace are transparently mapped onto the corresponding objects in the namespace of the underlying storage system. Furthermore, information about mounted data sources is managed centrally by the Alluxio master service, which facilitates reconfiguration.
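
As a rough illustration of how a mounted namespace can map paths onto underlying storage, the following sketch resolves a path in the unified namespace to a location in whichever storage system is mounted at the longest matching prefix. It is a minimal model under stated assumptions, not Alluxio's API: the `MountTable` class, its method names, and the example URIs (`hdfs://namenode:9000/alluxio`, `s3://example-bucket/logs`) are hypothetical.

```python
import posixpath


class MountTable:
    """Toy central mount table: namespace path -> underlying storage URI."""

    def __init__(self):
        self._mounts = {}

    @staticmethod
    def _normalize(path):
        # Collapse duplicate and trailing slashes; always return an absolute path.
        return posixpath.normpath("/" + path.strip("/"))

    def mount(self, namespace_path, ufs_uri):
        point = self._normalize(namespace_path)
        if point in self._mounts:
            raise ValueError(f"{point} is already a mount point")
        self._mounts[point] = ufs_uri.rstrip("/")

    def unmount(self, namespace_path):
        del self._mounts[self._normalize(namespace_path)]

    def resolve(self, namespace_path):
        """Return the underlying URI for a path, using the longest matching mount."""
        path = self._normalize(namespace_path)
        best = None
        for point in self._mounts:
            prefix = point.rstrip("/") + "/"
            if path == point or path.startswith(prefix):
                if best is None or len(point) > len(best):
                    best = point
        if best is None:
            raise KeyError(f"no mount point covers {path}")
        relative = path[len(best):].lstrip("/")
        base = self._mounts[best]
        return f"{base}/{relative}" if relative else base


table = MountTable()
table.mount("/", "hdfs://namenode:9000/alluxio")
table.mount("/logs", "s3://example-bucket/logs")
print(table.resolve("/logs/2016/03/app.log"))
print(table.resolve("/users/alice/data.csv"))
```

Because the table lives in one place, remounting `/logs` onto a different store changes where every path under it resolves without touching the applications that use those paths.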

Calvin Jia

Alluxio

Calvin Jia is a software engineer at Alluxio, Inc. (formerly Tachyon Nexus) and a top contributor to Alluxio (formerly Tachyon).

Jiri Simsa

Alluxio

Jiri Simsa is a software engineer at Alluxio, Inc., where he is one of the maintainers and top contributors of the Alluxio open source project. Before joining Alluxio, Inc., Jiri was a software engineer at Google, working on yet another distributed applications framework. He earned his PhD in computer science from Carnegie Mellon University for his work on systematic and scalable testing of concurrent systems.