As Spark, MapReduce, and many other frameworks are widely deployed in enterprise production, an efficient and flexible compute and storage architecture often becomes a topic of debate among both IT and line-of-business (LOB) practitioners. Although there are good reasons to run compute in a traditional hyperconverged environment as part of a data lake implementation, decoupling storage and computation is becoming increasingly popular, as O’Reilly pointed out in a 2017 trend post. For example, Alluxio, IBM, Huawei, EMC, and Red Hat teams have come together to examine real-world application examples and provide joint solutions.
Calvin Jia and Haoyuan Li explain how to decouple compute and storage with Alluxio. They explore the decision factors and considerations (application workload pattern, data locality, cost of infrastructure, network bandwidth, cloud deployment, etc.) and share production best practices and solutions for making the best use of CPUs, memory, and different tiers of disaggregated compute and storage systems to build a multitenant, high-performance platform that addresses real-world business demand.
Calvin Jia is the release manager for Alluxio and is a core maintainer of the project. He is also the top contributor to the Alluxio project and one of its earliest contributors. Calvin holds a BS from the University of California, Berkeley.
Haoyuan Li is founder and CEO of Alluxio (formerly Tachyon Nexus), a memory-speed virtual distributed storage system. Before founding the company, Haoyuan was working on his PhD at UC Berkeley’s AMPLab, where he cocreated Alluxio. He is also a founding committer of Apache Spark. Previously, he worked at Conviva and Google. Haoyuan holds an MS from Cornell University and a BS from Peking University.
©2017, O'Reilly Media, Inc.