Presented By O'Reilly and Cloudera
Make Data Work
March 13–14, 2017: Training
March 14–16, 2017: Tutorials & Conference
San Jose, CA

Tuning Impala: The top five performance optimizations for the best BI and SQL analytics on Hadoop

Marcel Kornacker (Cloudera), Mostafa Mokhtar (Cloudera)
5:10pm–5:50pm Wednesday, March 15, 2017
Enterprise adoption
Location: 230 A
Level: Intermediate
Average rating: 4.50 (2 ratings)

Who is this presentation for?

  • Architects and analysts

Prerequisite knowledge

  • A basic understanding of SQL analytics principles

What you'll learn

  • Learn best practices for design choices involving SQL analytics on Hadoop

Description

When it comes to SQL-on-Hadoop, it is easy to feel overwhelmed by the number of choices available in tools, file formats, schema design, and configurations. Making good design choices at the start is the best way to avoid common pitfalls. Marcel Kornacker and Mostafa Mokhtar simplify the process and cover the top performance optimizations for Apache Impala (incubating), from schema design and memory optimization to query tuning.

Topics include:

  • SQL-on-Hadoop: Picking your tool based on the workload and understanding where Hive, Impala, and Spark SQL are best used
  • Requirements and considerations for BI and SQL analytic workloads
  • Schema design
  • Memory usage, cluster size, and hardware recommendations
  • Multitenancy best practices
  • Query tuning basics for Impala
  • Impala performance and benchmarking

Marcel Kornacker

Cloudera

Marcel Kornacker is a tech lead at Cloudera and the architect of Apache Impala (incubating). Marcel has held engineering jobs at a few database-related startup companies and at Google, where he worked on several ad-serving and storage infrastructure projects. His last engagement was as the tech lead for the distributed query engine component of Google’s F1 project. Marcel holds a PhD in databases from UC Berkeley.

Mostafa Mokhtar

Cloudera

Mostafa Mokhtar is a performance engineer at Cloudera. Previously, he held similar roles at Hortonworks and on the SQL Server team at Microsoft.

Comments on this page are now closed.

Comments

Marcel Kornacker | TECH LEAD
03/29/2017 9:43am PDT

Hi Pankaj,

Try running COMPUTE STATS with the query option mt_dop set to the number of cores on your machines; this will enable multi-threaded aggregation. If that uses too many resources, try lowering that value.
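
As a rough illustration of the suggestion above, the following Python sketch builds the two statements one might paste into impala-shell. The helper name compute_stats_statements and the table name sales_db.orders are hypothetical, and os.cpu_count() reports the cores of the machine running the script, which is only a stand-in for the core count of the Impala nodes.

```python
import os


def compute_stats_statements(table, dop=None):
    """Build impala-shell statements for a multi-threaded COMPUTE STATS run.

    `table` is a hypothetical table name supplied by the caller.
    `dop` is the value for the MT_DOP query option; if omitted, the local
    core count is used as an assumed proxy for the cluster nodes' cores.
    """
    if dop is None:
        dop = os.cpu_count() or 1
    if dop < 0:
        raise ValueError("dop must be non-negative")
    # Set the query option first, then run the stats computation in the
    # same session so the option applies to it.
    return [f"SET MT_DOP={dop};", f"COMPUTE STATS {table};"]


if __name__ == "__main__":
    # Hypothetical table and core count, for illustration only.
    for statement in compute_stats_statements("sales_db.orders", dop=16):
        print(statement)
```

If the run uses too many resources, passing a smaller dop value follows the same advice about lowering the setting.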

Pankaj Agrawal | SOLUTION ARCHITECT
03/15/2017 3:07pm PDT

Hello Marcel/Mostafa,
Very insightful session. Thank you! During the session you mentioned speeding up the compute stats process in Impala. Could you please provide more details?