Presented By O'Reilly and Cloudera
Make Data Work
Dec 4–5, 2017: Training
Dec 5–7, 2017: Tutorials & Conference

Apache Spark ML and MLlib tuning and optimization: A case study on boosting the performance of ALS by 60x

Peng Meng (Intel)
5:05pm–5:45pm Wednesday, December 6, 2017
Average rating: 1.00 (1 rating)

Who is this presentation for?

  • Software engineers

Prerequisite knowledge

  • A basic understanding of big data and Spark

What you'll learn

  • Explore Spark MLlib tuning and optimization methods and a case study on ALS optimization


Apache Spark ML and MLlib are hugely popular in the big data ecosystem and have evolved from standard ML libraries to powerful components that support complex workflows and production requirements. Intel has been deeply involved in Spark from a very early stage, working with the community in feature development, bug fixing, and performance optimization.

Peng Meng outlines the methodology behind Intel’s work on Spark ML and MLlib optimization and shares a case study on boosting the performance of Spark MLlib alternating least squares (ALS) by 60x in JD’s and VipShop’s production environments. The methods include rewriting recommendForAll, optimizing CartesianRDD computation, choosing between f2jBLAS and NativeBLAS, and selecting the best cluster settings and ALS parameters. This solution not only greatly reduced computation time in the JD and VipShop production environments but was also merged into Apache Spark.
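The core idea behind the recommendForAll rewrite is to replace per-pair (user, item) dot products from a Cartesian join with blocked matrix multiplication, so scoring runs through a few large BLAS level-3 calls followed by a cheap top-k selection. A minimal NumPy sketch of that idea (the function name, block size, and top-k selection here are illustrative, not Spark's actual API):

```python
import numpy as np

def recommend_for_all(user_factors, item_factors, top_k, block_size=4096):
    """For each user, return the indices of the top_k highest-scoring items.

    Instead of scoring every (user, item) pair individually (the Cartesian
    approach), score a whole block of users against all items with one
    matrix multiply, then keep only the top_k per user.
    """
    n_users = user_factors.shape[0]
    recs = np.empty((n_users, top_k), dtype=np.int64)
    for start in range(0, n_users, block_size):
        block = user_factors[start:start + block_size]
        scores = block @ item_factors.T  # one GEMM call per user block
        # argpartition finds the top_k unordered in O(n_items) per user...
        top = np.argpartition(-scores, top_k - 1, axis=1)[:, :top_k]
        # ...then sort just those top_k by descending score
        order = np.argsort(-np.take_along_axis(scores, top, axis=1), axis=1)
        recs[start:start + block_size] = np.take_along_axis(top, order, axis=1)
    return recs
```

Because the heavy lifting is a single matrix product per block, the same code path benefits directly from the f2jBLAS-vs-NativeBLAS choice the talk discusses, and the block size trades memory for fewer, larger BLAS calls.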

Peng Meng


Peng Meng is a senior software engineer on the big data and cloud team at Intel, where he focuses on Spark and MLlib optimization. Peng is interested in machine learning algorithm optimization and large-scale data processing. He holds a PhD from the University of Science and Technology of China.