Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

Best practices for using Alluxio with Spark

Cheng Chang (Alluxio), Haoyuan Li (Alluxio)
1:15pm–1:55pm Wednesday, September 27, 2017
Data Engineering & Architecture, Spark & beyond
Location: 1A 15/16/17
Level: Intermediate

Who is this presentation for?

  • Data engineers, data scientists, architects, and anyone who works with Spark to analyze data

Prerequisite knowledge

  • Basic knowledge of Spark

What you'll learn

  • How to use Alluxio effectively with Spark to improve the performance and manageability of data analytics


Alluxio (formerly Tachyon) is a memory-speed virtual distributed storage system that uses memory to store data and to accelerate access to data held in different storage systems. Many organizations run Alluxio with Apache Spark, and some deployments scale to petabytes of data.

While Spark is gaining broad adoption in the big data ecosystem, Alluxio bridges Spark applications and various storage systems, further accelerating data-intensive applications. Alluxio provides a unified namespace over data from different storage systems, which is convenient for application developers. Alluxio also keeps hot data in memory so that applications can access it quickly. And even though Spark has its own in-memory cache, Alluxio’s in-memory storage can further improve Spark applications.

Haoyuan Li and Cheng Chang explain how Alluxio makes Spark more effective in both on-premises and public cloud deployments and share production deployments of Alluxio and Spark working together. Along the way, they discuss best practices for using Alluxio with Spark, including with RDDs and DataFrames.

Cheng Chang


Cheng Chang is a software engineer at Alluxio and the fourth-highest contributor to the Alluxio open source project. Cheng is also the main developer of Alluxio Manager. He has presented talks at Strata Beijing, Spark Summit, and other leading industry events. He holds a degree in computer science from Tsinghua University.

Haoyuan Li


Haoyuan Li is founder and CEO of Alluxio (formerly Tachyon Nexus), a memory-speed virtual distributed storage system. Before founding the company, Haoyuan was working on his PhD at UC Berkeley’s AMPLab, where he cocreated Alluxio. He is also a founding committer of Apache Spark. Previously, he worked at Conviva and Google. Haoyuan holds an MS from Cornell University and a BS from Peking University.

Comments on this page are now closed.


Haoyuan Li | CEO
09/28/2017 5:54pm EDT

Yes. For more questions, please visit the Alluxio User Mailing List: !forum/alluxio-users. Cheers.

09/28/2017 7:04am EDT

Does Alluxio integrate with Hive or AWS Glue?