Presented By O'Reilly and Cloudera
Make Data Work
Dec 4–5, 2017: Training
Dec 5–7, 2017: Tutorials & Conference

Decoupling compute and storage with open source Alluxio

Calvin Jia (Alluxio), Haoyuan Li (Alluxio)
1:45pm–2:25pm Thursday, December 7, 2017

Who is this presentation for?

  • CTOs, CIOs, system architects, and software engineers

Prerequisite knowledge

  • A general understanding of the big data ecosystem

What you'll learn

  • Learn how to decouple compute and storage with Alluxio
  • Explore best practices, considerations, and solutions when building out a multitenant high-performance platform


As Spark, MapReduce, and many other frameworks are widely deployed in enterprise production, an efficient and flexible compute and storage architecture has become a frequent topic of debate among both IT and line-of-business (LOB) practitioners. Although there are good reasons to run compute in a traditional hyperconverged environment as part of a data lake implementation, decoupling storage and computation is becoming increasingly popular, as O’Reilly pointed out in a 2017 trend post. For example, Alluxio, IBM, Huawei, EMC, and Red Hat teams have come together to examine real-world application examples and provide joint solutions.

Calvin Jia and Haoyuan Li explain how to decouple compute and storage with Alluxio. They explore the decision factors and considerations (application workload pattern, data locality, cost of infrastructure, network bandwidth, cloud deployment, etc.) and share production best practices and solutions for making full use of CPUs, memory, and different tiers of disaggregated compute and storage systems to build out a multitenant high-performance platform that addresses real-world business demand.


Calvin Jia


Calvin Jia is the release manager for Alluxio and is a core maintainer of the project. He is also the top contributor to the Alluxio project and one of its earliest contributors. Calvin holds a BS from the University of California, Berkeley.


Haoyuan Li


Haoyuan (H.Y.) Li is the founder, chairman, and CTO of Alluxio. He holds a PhD in computer science from UC Berkeley’s AMPLab, where he created the Alluxio (formerly Tachyon) open source data orchestration system, cocreated Apache Spark Streaming, and became an Apache Spark founding committer. He also holds an MS from Cornell University and a BS from Peking University, both in computer science.