Presented By
O’Reilly + Cloudera

Make Data Work

29 April–2 May 2019
London, UK

Scaling Impala: Common mistakes and best practices

Manish Maheshwari (Cloudera)

11:15–11:55 Thursday, 2 May 2019

Data Engineering and Architecture
Location: S11 A

Who is this presentation for?

Administrators, data engineers working with Impala, and developers of BI tools built on top of Impala

Level

Intermediate

Prerequisite knowledge

A basic understanding of a modern MPP database (Impala or similar)

What you'll learn

Learn how to optimally set up and configure Impala for large-scale deployments
Explore methods to ensure consistent performance at scale, get started with query profiles, and identify bottlenecks

Description

Apache Impala is a complex engine, and using it to its full potential requires a thorough technical understanding. Without proper configuration and usage, Impala’s performance becomes unpredictable, and the end-user experience suffers. Yet for many users and administrators, the right way to configure Impala remains a mystery.

Drawing on work with some of the largest clusters in the world, Manish Maheshwari shares ingestion best practices to keep an Impala deployment scalable and details admission control configuration to provide a consistent experience to end users. Manish also takes a high-level look at Impala’s query profile, which is used as a first step in any performance troubleshooting, and discusses common mistakes users and BI tools make when interacting with Impala. Manish concludes by detailing an ideal setup to show all of this in practice.

Manish Maheshwari

Cloudera

Manish Maheshwari is a data architect and data scientist at Cloudera. He has more than 13 years of experience building extremely large data warehouses and analytical solutions. He’s worked extensively on Hadoop, DI and BI tools, data mining and forecasting, data modeling, master and metadata management, and dashboard tools, and is proficient in Hadoop, SAS, R, Informatica, Teradata, and QlikView. He participates in Kaggle data mining competitions as a hobby.
