Presented by O’Reilly and Cloudera
Make Data Work
March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

How to use Impala's query plan and profile to fix performance issues

Juan Yu (Cloudera)
1:30pm–5:00pm Tuesday, March 6, 2018
Data engineering and architecture
Location: LL21 A Level: Intermediate
Average rating: 4.75 (4 ratings)

Who is this presentation for?

  • IT or business users

Prerequisite knowledge

  • A basic understanding of the Impala query engine and distributed systems

What you'll learn

  • Understand the cost of Impala query execution
  • Learn how to use Impala's query plan and profile to identify bottlenecks and fix them

Description

Apache Impala (incubating) is an exceptional, best-of-breed massively parallel processing SQL query engine that is a fundamental component of the big data software stack. However, Impala is a complex engine and requires a thorough technical understanding to utilize it fully. When Impala is improperly configured or used, it may use too many resources, and performance could be very poor.

For many users, understanding Impala query performance is like a trip on the mystery bus. Impala provides a query plan and a query profile to help users choose an optimal plan and understand how a query is executed and how many resources it uses. But digging through query profiles isn’t fun for everyone. Juan Yu demystifies the cost model the Impala planner uses and how Impala optimizes queries, explains how to identify performance bottlenecks through the query plan and profile, and shows how to drive Impala to its full potential.

Juan Yu

Cloudera

Juan Yu is a software engineer at Cloudera working on the Impala project, where she helps customers investigate, troubleshoot, and resolve escalations and analyzes performance issues to identify bottlenecks, failure points, and security holes. Juan also implements enhancements in Impala to improve customer experience. Previously, Juan was a software engineer at Interactive Intelligence and held developer positions at Bluestreak, Gameloft, and Engenuity.

Comments

Juan Yu | SOFTWARE ENGINEER
03/08/2018 10:06am PST

Thanks to everyone who joined the tutorial.
During the tutorial, I showed a Cloudera Manager dashboard that can help you monitor Impala load and identify system bottlenecks. Many people asked for it.
I uploaded the dashboard JSON file to a GitHub repo. You can download it from there and then import it into your CM.

Juan Yu | SOFTWARE ENGINEER
03/05/2018 5:57am PST

Supporting more users is more of a scalability issue. I’d love to talk about it, but it’s outside the scope of this tutorial.
I’m hosting a braindate session on Wed, Mar 7, at 3:00 PM, “How to scale Impala.” You’re welcome to join that session.

Gloria Appelgren | SENIOR ENTERPRISE ARCHITECT
03/05/2018 1:29am PST

What are the alternative solutions when I have more than 50 users running queries with Impala?