Presented By O'Reilly and Cloudera
Make Data Work
5–7 May, 2015 • London, UK

Scaling SQL-on-Hadoop for BI

Yanpei Chen (Cloudera), Dileep Kumar (Cloudera Inc)
14:35–15:15 Wednesday, 6/05/2015
Hadoop Platform
Location: King's Suite - Sandringham
Slides: 1 PDF

Prerequisite Knowledge

Basic knowledge of the Hadoop ecosystem.

Description

SQL-on-Hadoop systems that support business intelligence (BI) use cases must be able to handle hundreds or even thousands of concurrent users. However, existing trade press coverage of SQL-on-Hadoop focuses on single-user performance. This leaves a critical knowledge gap for organizations that are considering SQL-on-Hadoop to support BI applications with a large number of concurrent users.

This talk discusses how you can scale your SQL-on-Hadoop system to a large number of concurrent users, and how to verify that it can support your current and future BI load. Specifically, we will talk about:

- The impact of scaling cluster size and hardware
- How to translate between cluster throughput and supported users
- Growing your cluster with your data
- How to verify performance before you go “in production”
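
As a rough illustration of translating between cluster throughput and supported users, the sketch below applies the interactive response time law from queueing theory, N = X * (R + Z), where N is the number of concurrent users, X is throughput in queries per second, R is the average query response time, and Z is the average user think time between queries. This is a back-of-the-envelope model and not necessarily the method presented in the talk. The function names and the example figures are hypothetical.

```python
def supported_users(throughput_qps: float,
                    response_time_s: float,
                    think_time_s: float) -> float:
    """Estimate concurrent users via the interactive response time law.

    N = X * (R + Z): each user issues one query per (R + Z) seconds,
    so a cluster sustaining X queries/second keeps N users busy.
    """
    if min(throughput_qps, response_time_s, think_time_s) < 0:
        raise ValueError("all inputs must be non-negative")
    return throughput_qps * (response_time_s + think_time_s)


def required_throughput(users: float,
                        response_time_s: float,
                        think_time_s: float) -> float:
    """Invert the same law: X = N / (R + Z)."""
    cycle_s = response_time_s + think_time_s
    if users < 0 or cycle_s <= 0:
        raise ValueError("users must be >= 0 and R + Z must be > 0")
    return users / cycle_s


if __name__ == "__main__":
    # Hypothetical figures: 10 queries/s measured under load,
    # 2 s average response time, 28 s average think time.
    print(supported_users(10.0, 2.0, 28.0))
    print(required_throughput(300, 2.0, 28.0))
```

The model assumes a steady state and averages measured under concurrent load; single-user response times would overstate the number of users a cluster can support.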

You will never look at SQL-on-Hadoop performance the same way after this talk.

Yanpei Chen

Cloudera

Yanpei Chen is a software engineer at Cloudera, working on the Performance Engineering team. He regularly participates in competitive performance “bake-offs” that directly drive customer purchasing decisions. His work touches upon Cloudera Search, Impala, Apache Hadoop, Apache HBase, and Apache Hive, because someone has to make sure the entire Hadoop ecosystem performs well together. Yanpei is a frequent speaker at industry and academic conferences, and contributes to various industry-standard benchmarks for big data.

Dileep Kumar

Cloudera Inc

Dileep Kumar works on the Performance Engineering team at Cloudera. He holds an M.S. from Santa Clara University and has more than 15 years of experience in performance engineering for SQL systems. He is a regular author and presenter.