Presented By O’Reilly and Cloudera

San Jose • London • New York

Make Data Work

March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

Spark ML optimization at Intel: A case study

Weisheng Xie (Orange Financial), Peng Meng (Intel)

1:50pm–2:30pm Wednesday, March 7, 2018

Data science and machine learning
Location: LL20 D


Who is this presentation for?

Data scientists

Prerequisite knowledge

A basic understanding of machine learning and operating system concepts

What you'll learn

Learn how Intel optimizes Spark ML

Description

Apache Spark ML has become hugely popular among data analysts in the big data ecosystem and has attracted a large number of developers around the world who actively contribute to the project. It has evolved from a standard ML library into a powerful Spark component that supports complex workflows and production requirements.

Intel has been deeply involved in Spark from its earliest days, working with the community on feature development, bug fixing, and performance optimization. Vincent Xie and Peng Meng share what Intel has been working on in Spark ML and introduce the three-step methodology behind Intel’s Spark ML optimization work: profile, analyze, and optimize.

At the profiling stage, Intel uses HiBench to benchmark the target Spark ML algorithms on datasets of different scales; HiBench ML workloads make it easier to expose the bottlenecks of the algorithm under test. Intel then uses a set of tools, including Intel VTune, Intel PAT, and VisualVM, to collect and analyze performance data for these workloads across different metrics (CPU, memory, disk, network I/O, etc.). Such detailed performance data usually reveals opportunities to optimize the ML algorithms, either through software engineering or by taking advantage of hardware support. With this methodology, Intel has sped up training by ~1.7x for logistic regression, ~1.4x for random forest and GBT, and ~1.4x for SVM, and has achieved a more than 60x speedup for ALS prediction. Vincent and Peng discuss these results and illustrate Intel’s three-stage working model for Spark optimization.
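As a rough illustration of the profiling stage, and of how speedup figures such as ~1.7x are usually computed (baseline time divided by optimized time), here is a minimal Python sketch. It is not HiBench or any Intel tool: the two sum-of-squares functions are hypothetical stand-ins for a baseline and an optimized implementation, and the input sizes are arbitrary assumptions.

```python
import math
import operator
import time
from typing import Callable, Dict, Iterable, List

Workload = Callable[[List[float]], float]


def sum_squares_loop(xs: List[float]) -> float:
    """Hypothetical baseline: explicit Python loop."""
    total = 0.0
    for x in xs:
        total += x * x
    return total


def sum_squares_map(xs: List[float]) -> float:
    """Hypothetical 'optimized' version: in CPython, map() and sum() iterate in C."""
    return sum(map(operator.mul, xs, xs))


def best_time(fn: Workload, data: List[float], repeats: int) -> float:
    """Fastest wall-clock time in seconds over `repeats` runs."""
    best = math.inf
    for _ in range(repeats):
        start = time.perf_counter()
        fn(data)
        best = min(best, time.perf_counter() - start)
    return best


def speedup(baseline_seconds: float, optimized_seconds: float) -> float:
    """Speedup factor: how many times faster the optimized run is."""
    return baseline_seconds / optimized_seconds


def speedup_by_scale(baseline: Workload, optimized: Workload,
                     sizes: Iterable[int], repeats: int = 3) -> Dict[int, float]:
    """Benchmark both implementations at each input size (the 'different scales')."""
    results: Dict[int, float] = {}
    for n in sizes:
        data = [float(i % 97) for i in range(n)]
        # An optimization only counts if it returns the same answer.
        if not math.isclose(baseline(data), optimized(data)):
            raise ValueError(f"results differ at size {n}")
        t_base = best_time(baseline, data, repeats)
        t_opt = max(best_time(optimized, data, repeats), 1e-12)  # guard against division by zero
        results[n] = speedup(t_base, t_opt)
    return results


if __name__ == "__main__":
    for n, s in speedup_by_scale(sum_squares_loop, sum_squares_map,
                                 [10_000, 100_000, 1_000_000]).items():
        print(f"n={n:>9,}: {s:.2f}x")
```

Taking the best of several runs reduces noise from unrelated system activity, and checking that both versions agree before timing them keeps a "faster but wrong" change from being reported as a speedup. On a real cluster, the timings would come from the benchmark runs themselves rather than from an in-process timer.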

Weisheng Xie

Orange Financial

Vincent Xie (谢巍盛) is the chief data scientist and senior director at Orange Financial. As head of the AI Lab, he built the big data and artificial intelligence team from scratch, established the company’s big data and AI infrastructure, and brought many business applications onto it; the resulting data-driven transformation strategy increased the company’s total revenue many times over. Previously, he spent about eight years at Intel, working mainly on open source machine learning and big data technologies and products.


Peng Meng

Intel

Peng Meng is a senior software engineer on the big data and cloud team at Intel, where he focuses on Spark and MLlib optimization. Peng is interested in machine learning algorithm optimization and large-scale data processing. He holds a PhD from the University of Science and Technology of China.


Sponsorship Opportunities

For exhibition and sponsorship opportunities, email strataconf@oreilly.com

Partner Opportunities

For information on trade opportunities with O'Reilly conferences, email partners@oreilly.com

Contact Us

View a complete list of Strata Data Conference contacts

©2018, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com