Presented By O'Reilly and Cloudera
Make Data Work
March 13–14, 2017: Training
March 14–16, 2017: Tutorials & Conference
San Jose, CA

Alluxio (formerly Tachyon): The journey thus far and the road ahead

Haoyuan Li (Alluxio), Gene Pang (Alluxio)
11:50am–12:30pm Thursday, March 16, 2017
Big data and the Cloud
Location: LL21 C/D
Level: Beginner
Secondary topics: Architecture
Average rating: 4.00 (1 rating)

Who is this presentation for?

  • Software engineers, data scientists, and storage admins

Prerequisite knowledge

  • Basic knowledge of distributed storage systems and big data applications

What you'll learn

  • The basics of Alluxio, including its design goal, functions, and common use cases


Alluxio (formerly Tachyon) is an open source memory-speed virtual distributed storage system. The Alluxio community is one of the fastest-growing open source communities in big data history, with more than 300 developers from over 100 organizations around the world. The Alluxio system has been deployed at a number of companies, including Alibaba, Baidu, Barclays, Intel, Huawei, and Qunar. In some of these deployments, Alluxio has been running in production for over a year, managing petabytes of data.

In the past year, the Alluxio project saw significant improvements in performance and scalability and gained key new features, including tiered storage, transparent naming, and a unified namespace. At the same time, the Alluxio ecosystem has expanded to support more under storage systems and computation frameworks. In particular, Alluxio now supports a wide range of under storage systems, including Amazon S3, Google Cloud Storage, Gluster, Ceph, HDFS, NFS, and OpenStack Swift. These integrations make it possible to use Alluxio in many different environments.

Haoyuan Li and Gene Pang explore Alluxio’s goal of making its product accessible to an even wider set of users through a focus on security, new language bindings, and further increased stability. Haoyuan and Gene also cover some new APIs Alluxio is working on to allow applications to access data more efficiently and manage data across different under storage systems.

Haoyuan Li


Haoyuan (H.Y.) Li is the founder, chairman, and CTO of Alluxio. He holds a PhD in computer science from UC Berkeley’s AMPLab, where he created the Alluxio (formerly Tachyon) open source data orchestration system, cocreated Apache Spark Streaming, and became an Apache Spark founding committer. He also holds an MS from Cornell University and a BS from Peking University, both in computer science.

Gene Pang


Gene Pang is a software engineer at Alluxio. Previously, he worked at Google. Gene recently earned his PhD from the AMPLab at UC Berkeley, where he worked on distributed database systems, and holds an MS from Stanford University and a BS from Cornell University.