Apache Impala is a complex engine and requires a thorough technical understanding to utilize it fully. Without proper configuration or usage, Impala’s performance becomes unpredictable, and end-user experience suffers. However, for many users and administrators, the right configuration of Impala is still a mystery.
Drawing on work with some of the largest clusters in the world, Manish Maheshwari shares ingestion best practices to keep an Impala deployment scalable and details admission control configuration to provide a consistent experience to end users. Manish also takes a high-level look at Impala’s query profile, which is used as a first step in any performance troubleshooting, and discusses common mistakes users and BI tools make when interacting with Impala. Manish concludes by detailing an ideal setup to show all of this in practice.
Manish Maheshwari is a data architect and data scientist at Cloudera. Manish has 13+ years of experience building extremely large data warehouses and analytical solutions. He’s worked extensively on Hadoop, DI and BI tools, data mining and forecasting, data modeling, master and metadata management, and dashboard tools and is proficient in Hadoop, SAS, R, Informatica, Teradata, and Qlikview. He participates in Kaggle Data Mining competitions as a hobby.
For exhibition and sponsorship opportunities, email strataconf@oreilly.com
For information on trade opportunities with O'Reilly conferences, email partners@oreilly.com
View a complete list of Strata Data Conference contacts
©2019, O’Reilly UK Ltd • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com