Apache Spark ML and MLlib are hugely popular in the big data ecosystem and have evolved from standard ML libraries into powerful components that support complex workflows and production requirements. Intel has been deeply involved in Spark from a very early stage, working with the community on feature development, bug fixing, and performance optimization.
Peng Meng outlines the methodology behind Intel’s work on Spark ML and MLlib optimization and shares a case study on boosting the performance of Spark MLlib alternating least squares (ALS) by 60x in JD.com’s production environment. The methods include rewriting the recommendForAll code, optimizing CartesianRDD compute, choosing between f2jBLAS and NativeBLAS, and tuning cluster settings and ALS parameters. The solution not only greatly reduced computation time in the JD.com and VipShop production environments but was also merged into Apache Spark.
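The abstract does not show code, so the following is only a rough illustration of the kind of computation recommendForAll performs: scoring every user against every item and keeping the best k items per user. It is a minimal sketch in plain Python rather than Spark, and it is not the implementation merged into Apache Spark. The function names, the `block_size` parameter, and the choice of a bounded heap per user are all assumptions made for illustration; the idea shown is that processing factors block by block and retaining only k candidates per user avoids materializing and sorting the full user-by-item score matrix.

```python
import heapq
from itertools import islice


def blocks(pairs, size):
    """Yield successive lists of at most `size` (id, factor_vector) pairs."""
    it = iter(pairs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def dot(u, v):
    """Dot product of two equal-length factor vectors."""
    return sum(a * b for a, b in zip(u, v))


def recommend_for_all(user_factors, item_factors, k, block_size=2):
    """Return {user_id: [(item_id, score), ...]} with the k best items per user.

    Hypothetical helper: user_factors and item_factors are lists of
    (id, vector) pairs, standing in for the factor matrices ALS produces.
    """
    # One bounded min-heap per user; heap[0] is always the weakest kept item.
    top = {uid: [] for uid, _ in user_factors}
    for user_block in blocks(user_factors, block_size):
        for item_block in blocks(item_factors, block_size):
            for uid, uvec in user_block:
                heap = top[uid]
                for iid, ivec in item_block:
                    entry = (dot(uvec, ivec), iid)
                    if len(heap) < k:
                        heapq.heappush(heap, entry)
                    elif entry > heap[0]:
                        # Replace the weakest kept item with the better one.
                        heapq.heapreplace(heap, entry)
    # Sort each user's kept items from best to worst score.
    return {
        uid: [(iid, score) for score, iid in sorted(heap, reverse=True)]
        for uid, heap in top.items()
    }
```

With this sketch, each user's state never grows beyond k entries regardless of how many items exist, which is the property that makes a blocked formulation attractive when the full cross product is large.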
Peng Meng is a senior software engineer on the big data and cloud team at Intel, where he focuses on Spark and MLlib optimization. Peng is interested in machine learning algorithm optimization and large-scale data processing. He holds a PhD from the University of Science and Technology of China.
©2017, O'Reilly Media, Inc.