Memory is the key to fast big data processing. This has been realized by many, and frameworks such as Spark and Shark already leverage memory performance. As data sets continue to grow, storage is increasingly becoming a critical bottleneck in many workloads.
To address this need, we have developed Tachyon, a memory-centric fault-tolerant distributed storage system, which enables reliable file sharing at memory-speed across cluster frameworks such as Spark and MapReduce. The result of over three years of research and development, Tachyon achieves both memory-speed and fault tolerance.
Tachyon is Hadoop compatible. Existing Spark and MapReduce programs can run on top of it without any code changes. Tachyon is the default off-heap option in Spark, which means that RDDs can automatically be stored inside Tachyon to make Spark more resilient and avoid GC overheads. The project is open source and is already deployed at multiple companies. In addition, Tachyon has more than 80 contributors from over 30 institutions, including Yahoo, Tachyon Nexus, Redhat, Nokia, Intel, and Databricks. The project is the storage layer of the Berkeley Data Analytics Stack (BDAS) and also part of the Fedora distribution.
In this talk, we introduce Tachyon. We will present its architecture, performance evaluation, as well as several use cases we have seen in the real world.
Haoyuan Li is founder and CEO of Alluxio (formerly Tachyon Nexus), a memory-speed virtual distributed storage system. Before founding the company, Haoyuan was working on his PhD at UC Berkeley’s AMPLab, where he cocreated Alluxio. He is also a founding committer of Apache Spark. Previously, he worked at Conviva and Google. Haoyuan holds an MS from Cornell University and a BS from Peking University.
©2015, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • firstname.lastname@example.org
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse, or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.