As Spark, MapReduce, and many other frameworks are widely deployed in enterprise production, an efficient and flexible compute and storage architecture often becomes a topic of debate among both IT and line-of-business (LOB) practitioners. Although there are good reasons to run compute in a traditional hyperconverged environment as part of a data lake implementation, decoupling storage and computation is becoming increasingly popular, as O’Reilly pointed out in a 2017 trend post. For example, Alluxio, IBM, Huawei, EMC, and Red Hat teams have come together to examine real-world application examples and provide joint solutions.
Calvin Jia and Haoyuan Li explain how to decouple compute and storage with Alluxio. They explore the decision factors and considerations (application workload pattern, data locality, cost of infrastructure, network bandwidth, cloud deployment, etc.) and share production best practices and solutions for making full use of CPUs, memory, and different tiers of disaggregated compute and storage systems to build a multitenant high-performance platform that addresses real-world business demand.
Calvin Jia is the release manager for Alluxio and is a core maintainer of the project. He is also the top contributor to the Alluxio project and one of its earliest contributors. Calvin holds a BS from the University of California, Berkeley.
Haoyuan (H.Y.) Li is the founder, chairman, and CTO of Alluxio. He holds a PhD in computer science from UC Berkeley’s AMPLab, where he created the Alluxio (formerly Tachyon) open source data orchestration system, cocreated Apache Spark Streaming, and became an Apache Spark founding committer. He also holds an MS from Cornell University and a BS from Peking University, both in computer science.
©2017, O'Reilly Media, Inc.