Alluxio (formerly Tachyon) is a memory-speed virtual distributed storage system that uses memory to store data and accelerate access to data held in different storage systems. Many organizations run Alluxio with Apache Spark, and some deployments scale to petabytes of data.
As Spark gains adoption across the big data ecosystem, Alluxio bridges Spark applications and various storage systems, further accelerating data-intensive applications. Alluxio presents data from different storage systems under a unified namespace, which is convenient for application developers. It also keeps hot data in memory so applications can reach important data quickly. Even though Spark has its own in-memory cache, Alluxio’s in-memory storage can further improve Spark applications.
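To make the unified namespace idea concrete, here is a minimal sketch of a mount table that maps logical paths to locations in different storage systems. It is an illustration only and not Alluxio's API: the class `UnifiedNamespace`, its methods, and the bucket and host names are all hypothetical, and longest-prefix matching is an assumed resolution rule.

```python
from dataclasses import dataclass, field


@dataclass
class UnifiedNamespace:
    """Toy mount table: logical path prefix -> storage URI (hypothetical, not Alluxio's API)."""
    mounts: dict = field(default_factory=dict)

    def mount(self, logical_prefix: str, storage_uri: str) -> None:
        # Normalize trailing slashes so prefixes compare cleanly.
        self.mounts[logical_prefix.rstrip("/")] = storage_uri.rstrip("/")

    def resolve(self, logical_path: str) -> str:
        # Longest matching prefix wins, so nested mounts override their parents.
        candidates = [p for p in self.mounts
                      if logical_path == p or logical_path.startswith(p + "/")]
        if not candidates:
            raise KeyError(f"no mount covers {logical_path}")
        prefix = max(candidates, key=len)
        return self.mounts[prefix] + logical_path[len(prefix):]


# Example mounts; the bucket and namenode names are made up.
ns = UnifiedNamespace()
ns.mount("/logs", "s3://example-bucket/logs")
ns.mount("/warehouse", "hdfs://namenode:8020/warehouse")
resolved = ns.resolve("/warehouse/sales/part-0.parquet")
```

An application written against this sketch would address every dataset by a single logical path such as `/warehouse/sales/part-0.parquet`, without knowing which storage system holds it.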
Haoyuan Li and Cheng Chang explain how Alluxio makes Spark more effective in both on-premises and public cloud deployments and share production deployments of Alluxio and Spark working together. Along the way, they discuss best practices for using Alluxio with Spark, including with RDDs and DataFrames.
Cheng Chang is a software engineer at Alluxio and the fourth-highest contributor to the Alluxio open source project. Cheng is also the main developer of Alluxio Manager. He has spoken at Strata Beijing, Spark Summit, and other leading industry events. He holds a degree in computer science from Tsinghua University.
Haoyuan Li is founder and CEO of Alluxio (formerly Tachyon Nexus), a memory-speed virtual distributed storage system. Before founding the company, Haoyuan was working on his PhD at UC Berkeley’s AMPLab, where he cocreated Alluxio. He is also a founding committer of Apache Spark. Previously, he worked at Conviva and Google. Haoyuan holds an MS from Cornell University and a BS from Peking University.
©2017, O'Reilly Media, Inc. • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.