Weld is a new open source project from Stanford that accelerates data-intensive applications by as much as 100x. It optimizes across functions within a single library as well as across different libraries, so developers can write modular code and still get close to bare-metal performance without incurring expensive data-movement costs. Weld uses a common representation to capture the structure of data-parallel workloads such as SQL, machine learning, and graph analytics and then optimizes across them using a cost-based optimizer that takes hardware characteristics into account. Weld provides APIs in Python and C and can be integrated into a variety of widely used analytics frameworks such as Spark SQL, TensorFlow, and pandas.
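To make the data-movement point concrete, here is a minimal sketch in plain Python of what optimizing across function boundaries can mean. It does not use Weld's API; the function names (`scale`, `keep_above`, `total`, `unfused`, `fused`) are hypothetical and chosen only for illustration. The modular version materializes an intermediate list after each call, while the fused version computes the same result in a single pass, which is the kind of code a cross-function optimizer would aim to produce.

```python
# Illustrative only: plain Python, not Weld's actual API.
# All function names here are hypothetical.

def scale(xs, factor):
    """Modular 'library' function: writes a full intermediate list."""
    return [x * factor for x in xs]


def keep_above(xs, threshold):
    """Modular 'library' function: writes a second intermediate list."""
    return [x for x in xs if x > threshold]


def total(xs):
    """Modular 'library' function: reduces the list to one number."""
    return sum(xs)


def unfused(data):
    # Three separate passes over memory and two intermediate lists.
    return total(keep_above(scale(data, 2.0), 10.0))


def fused(data):
    # One pass and no intermediate lists: the three steps are merged
    # into a single loop, as an optimizer working across the function
    # boundaries might do.
    acc = 0.0
    for x in data:
        y = x * 2.0
        if y > 10.0:
            acc += y
    return acc


data = [float(i) for i in range(10)]
print(unfused(data), fused(data))
```

Both versions return the same value; the difference is only in how many times the data is read and written, which is where the cost of composing modular functions tends to come from.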
Shoumik Palkar and Matei Zaharia offer an overview of Weld’s architecture and internals and demonstrate how Weld can be incrementally integrated into these libraries: only the most impactful operators are ported first, without breaking compatibility with other operators in the library and without changing the libraries’ APIs (so users do not need to change their application code). Shoumik and Matei also explain how Weld speeds up existing workloads that use libraries such as pandas, NumPy, and Spark SQL, by up to 300x when workloads combine these libraries, and explore Grizzly, a Weld-integrated version of the pandas framework.
Shoumik Palkar is a second-year PhD student in the Infolab at Stanford University, where he works with Matei Zaharia on high-performance data analytics. He holds a degree in electrical engineering and computer science from UC Berkeley.
Matei Zaharia is an assistant professor in the Computer Science Department at Stanford, where he works on computer systems and big data.
©2017, O'Reilly Media, Inc.