We know that storage resources are not all equal, but making distributed big data applications take advantage of, or even simply understand, this difference is very difficult. To address this problem, Alluxio, Inc. has developed the Alluxio unified namespace and tiered storage, which combine two simple yet highly effective ideas.
Calvin Jia and Jiri Simsa explain how Alluxio's current tiered storage can be easily configured to use memory, SSDs, and hard drives as different tiers. Alluxio users and administrators do not have to migrate data manually, because Alluxio manages data transparently across all the configured tiers, much as a CPU manages its L1, L2, and lower-level caches. At the same time, Alluxio gives users fine-grained control over their data so that they can plug in their own data-management strategies. Users can also pin files to a specific storage tier in Alluxio or set a time to live (TTL) on files.
Calvin and Jiri also describe the interface for mounting heterogeneous data sources into the Alluxio namespace, which takes advantage of Alluxio's ability to interoperate with different underlying storage systems such as HDFS, S3, GlusterFS, and Swift. Once a data source is mounted, operations such as creating, deleting, or renaming objects in the Alluxio namespace are transparently mapped onto the corresponding objects in the namespace of the underlying storage system. Furthermore, information about mounted data sources is managed centrally by the Alluxio master service, which makes reconfiguration easier.
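To make the tiered-storage idea concrete, here is a small toy model. It is a sketch under stated assumptions, not Alluxio's implementation or API: the `TieredStore` class, its method names, capacities counted in files, and the LRU demotion policy are all hypothetical choices made for illustration. New data lands in the fastest tier, the least recently used unpinned file is demoted when a tier fills up, data read from a slower tier is promoted back (as in a CPU cache), pinned files stay in their tier, and files expire after a TTL.

```python
import time
from collections import OrderedDict


class TieredStore:
    """Toy tiered store (hypothetical; not Alluxio's API).

    Tiers are ordered fastest first, e.g. MEM -> SSD -> HDD. Each tier keeps
    its files in LRU order; capacity is counted in files for simplicity.
    """

    def __init__(self, capacities, clock=time.monotonic):
        # capacities: list of (tier_name, max_files), fastest tier first.
        self.tiers = [(name, cap, OrderedDict()) for name, cap in capacities]
        self.pinned = set()  # pinned files are never demoted out of their tier
        self.expiry = {}     # path -> absolute expiry time (TTL)
        self.clock = clock

    def _evict_into(self, level):
        """Free a slot in tier `level` by demoting its LRU unpinned file."""
        name, cap, files = self.tiers[level]
        while len(files) >= cap:
            victim = next((p for p in files if p not in self.pinned), None)
            if victim is None:
                raise RuntimeError(f"tier {name} is full of pinned files")
            data = files.pop(victim)
            if level + 1 < len(self.tiers):
                self._evict_into(level + 1)
                self.tiers[level + 1][2][victim] = data
            # Otherwise the file drops out of the cache entirely; in a real
            # system it would still be available from the under storage.

    def _remove(self, path):
        for _, _, files in self.tiers:
            files.pop(path, None)
        self.expiry.pop(path, None)

    def write(self, path, data, ttl=None):
        """New data always lands in the fastest tier."""
        self._remove(path)
        self._evict_into(0)
        self.tiers[0][2][path] = data
        if ttl is not None:
            self.expiry[path] = self.clock() + ttl

    def read(self, path):
        """Return the data and promote it to the fastest tier, or None."""
        # TTL is checked lazily here; a real system would also sweep periodically.
        if path in self.expiry and self.clock() >= self.expiry[path]:
            self._remove(path)
            return None
        for _, _, files in self.tiers:
            if path in files:
                data = files.pop(path)
                self._evict_into(0)
                self.tiers[0][2][path] = data
                return data
        return None

    def pin(self, path):
        self.pinned.add(path)

    def tier_of(self, path):
        for name, _, files in self.tiers:
            if path in files:
                return name
        return None


now = [0.0]  # fake clock so the TTL demo is deterministic
store = TieredStore([("MEM", 2), ("SSD", 2)], clock=lambda: now[0])
store.write("/a", b"A")
store.pin("/a")
store.write("/b", b"B")
store.write("/c", b"C")  # MEM is full: unpinned LRU file /b moves to SSD
print(store.tier_of("/a"), store.tier_of("/b"), store.tier_of("/c"))  # MEM SSD MEM
store.read("/b")         # /b is promoted back to MEM; /c is demoted to SSD
store.write("/t", b"T", ttl=10)  # /b is demoted again; pinned /a stays put
now[0] = 11.0
print(store.read("/t"))  # None: the TTL has expired
```

Because data moves between tiers without any action from the caller, application code only ever calls `write` and `read`. The replacement policy is isolated in `_evict_into`, which is where a user-supplied data-management strategy would plug in.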
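The unified-namespace mapping can likewise be sketched as a mount table that resolves an Alluxio path to an under-storage URI by longest-prefix match on path components. This is a hypothetical toy model, not Alluxio's API: the `MountTable` class, its methods, the example URIs (`hdfs://namenode:8020/alluxio`, `s3a://example-bucket/logs`), and the rule that renames may not cross mount points are all assumptions made for illustration.

```python
import posixpath


class MountTable:
    """Toy unified-namespace mount table (hypothetical; not Alluxio's API).

    In Alluxio the mount information is kept centrally by the master;
    here it is simply a dict from Alluxio path to under-storage URI.
    """

    def __init__(self):
        self._mounts = {}

    def mount(self, alluxio_path, ufs_uri):
        path = posixpath.normpath(alluxio_path)
        if path in self._mounts:
            raise ValueError(f"{path} is already a mount point")
        self._mounts[path] = ufs_uri

    def unmount(self, alluxio_path):
        del self._mounts[posixpath.normpath(alluxio_path)]

    def _mount_point(self, path):
        """Longest mount point covering `path`, matched on whole components."""
        best = None
        for mp in self._mounts:
            if path == mp or path.startswith(mp.rstrip("/") + "/"):
                if best is None or len(mp) > len(best):
                    best = mp
        if best is None:
            raise KeyError(f"no mount point covers {path}")
        return best

    def resolve(self, alluxio_path):
        """Translate an Alluxio path into the matching under-storage URI."""
        path = posixpath.normpath(alluxio_path)
        mp = self._mount_point(path)
        suffix = path[len(mp):].lstrip("/")
        ufs = self._mounts[mp]
        return ufs.rstrip("/") + "/" + suffix if suffix else ufs

    def rename(self, src, dst):
        """Map a rename onto the under storage (same mount point only)."""
        src_mp = self._mount_point(posixpath.normpath(src))
        dst_mp = self._mount_point(posixpath.normpath(dst))
        if src_mp != dst_mp:
            raise ValueError("rename across mount points is not supported here")
        return self.resolve(src), self.resolve(dst)


table = MountTable()
table.mount("/", "hdfs://namenode:8020/alluxio")
table.mount("/s3/logs", "s3a://example-bucket/logs")
print(table.resolve("/s3/logs/2016/01.txt"))  # s3a://example-bucket/logs/2016/01.txt
print(table.resolve("/tmp/x"))                # hdfs://namenode:8020/alluxio/tmp/x
```

Matching on whole path components means a path such as `/s3/logsX` falls through to the root mount instead of being wrongly attributed to `/s3/logs`. Keeping the table in one place is what lets a mount be added or removed without touching clients.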
Calvin Jia is the release manager for Alluxio and is a core maintainer of the project. He is also the top contributor to the Alluxio project and one of its earliest contributors. Calvin holds a BS from the University of California, Berkeley.
Jiri Simsa is a software engineer at Alluxio, Inc., where he is one of the maintainers of and top contributors to the Alluxio open source project. Before joining Alluxio, Inc., Jiri was a software engineer at Google, working on yet another distributed applications framework. He earned his PhD in computer science from Carnegie Mellon University for his work on systematic and scalable testing of concurrent systems.
©2016, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • email@example.com
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.