
Presented By
O’Reilly + Cloudera

Make Data Work

March 25-28, 2019
San Francisco, CA


Spark adaptive execution: Unleash the power of Spark SQL

Haifeng Chen (Intel)

1:50pm–2:30pm Thursday, March 28, 2019

Data Engineering & Architecture
Location: 2004

Secondary topics: Streaming, realtime analytics, and IoT


Who is this presentation for?

Big data architects and engineers, data scientists, system architects, and project and product managers

Level

Intermediate

Prerequisite knowledge

Familiarity with MapReduce and Spark, including Spark SQL and its core architecture

What you'll learn

Explore an adaptive execution engine that addresses stability, performance, and usability challenges typically faced when applying Spark SQL to highly dynamic business environments

Description

Spark SQL, one of the most widely used components of Apache Spark, processes large-scale structured data in many data centers. However, it still faces stability and performance challenges in highly dynamic environments with ultra-large-scale data.

Haifeng Chen shares a Spark adaptive execution engine built to address these challenges. The engine adjusts task parallelism, converts join strategies, and handles data skew dynamically at runtime, choosing the execution plan from runtime statistics rather than from estimates made before the query runs. It has delivered significant performance improvements on standard SQL benchmarks such as TPC-DS and has been adopted in production by a number of Chinese internet companies.
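To make the three runtime decisions (task parallelism, join conversion, and skew handling) more concrete, here is a minimal Python sketch of the kind of logic an adaptive engine can apply once shuffle statistics are known. It is not the Intel engine's code or Spark's API: the function names (`coalesce_partitions`, `choose_join`, `skewed_partitions`, `split_count`) and all threshold values are hypothetical and chosen only for illustration.

```python
"""Toy model of three adaptive-execution decisions made from runtime shuffle statistics.

All names and thresholds are hypothetical; this is not Spark's or Intel's implementation.
"""
import math
from statistics import median
from typing import List

MB = 1024 * 1024

# Hypothetical tuning knobs (illustrative values only).
TARGET_PARTITION_BYTES = 64 * MB      # desired input size for one post-shuffle task
BROADCAST_THRESHOLD_BYTES = 10 * MB   # a join side at or below this size is broadcast
SKEW_FACTOR = 5                       # skewed if larger than SKEW_FACTOR x median size...
SKEW_MIN_BYTES = 256 * MB             # ...and larger than this absolute floor


def coalesce_partitions(sizes: List[int], target: int = TARGET_PARTITION_BYTES) -> List[List[int]]:
    """Task parallelism: greedily merge adjacent shuffle partitions into ~target-sized tasks."""
    groups: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for idx, size in enumerate(sizes):
        # Close the current group if adding this partition would exceed the target.
        if current and current_bytes + size > target:
            groups.append(current)
            current, current_bytes = [], 0
        current.append(idx)
        current_bytes += size
    if current:
        groups.append(current)
    return groups


def choose_join(left_bytes: int, right_bytes: int,
                threshold: int = BROADCAST_THRESHOLD_BYTES) -> str:
    """Join conversion: use a broadcast hash join if either side is small at runtime."""
    if min(left_bytes, right_bytes) <= threshold:
        return "broadcast_hash_join"
    return "sort_merge_join"


def skewed_partitions(sizes: List[int], factor: int = SKEW_FACTOR,
                      min_bytes: int = SKEW_MIN_BYTES) -> List[int]:
    """Data skew: indices of partitions far larger than the median partition."""
    med = median(sizes)
    return [i for i, s in enumerate(sizes) if s > factor * med and s > min_bytes]


def split_count(size: int, target: int = TARGET_PARTITION_BYTES) -> int:
    """Number of roughly target-sized sub-tasks a skewed partition would be split into."""
    return math.ceil(size / target)


if __name__ == "__main__":
    sizes = [10 * MB, 20 * MB, 30 * MB, 5 * MB, 900 * MB, 15 * MB]
    print(coalesce_partitions(sizes))          # [[0, 1, 2], [3], [4], [5]]
    print(choose_join(8 * MB, 2000 * MB))      # broadcast_hash_join
    print(skewed_partitions(sizes))            # [4]
    print(split_count(900 * MB))               # 15
```

In a real engine these decisions are made per query stage from map-output statistics, and the exact policies (how partitions are grouped, how skewed partitions are split and the other join side replicated) differ between implementations; the greedy grouping here only illustrates the general idea.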

Haifeng details the three major challenges the industry faces when using Spark SQL in real-world environments, outlines the technical architecture of the adaptive execution approach and the design of each solution to these challenges, and shares benchmark results and lessons from industrial adoptions. He concludes by discussing planned work to further optimize the Spark adaptive execution engine.

Haifeng Chen

Intel

Haifeng Chen is a senior software architect at Intel's Asia Pacific R&D Center. He has more than 12 years' experience in software design and development, big data, and security, with a particular interest in image processing. Haifeng is the author of ColorStorm, an image browsing, editing, and processing application.


©2019, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com