SQL-on-Hadoop systems that support business intelligence (BI) use cases must be able to handle hundreds or even thousands of concurrent users, yet existing SQL-on-Hadoop trade press focuses on single-user performance. This leaves a critical knowledge gap for organizations considering SQL-on-Hadoop to support BI applications with a large number of concurrent users.
This talk discusses how to scale your SQL-on-Hadoop system to a large number of concurrent users, and how to verify that it can support your current and future BI load. Specifically, we will talk about:
- The impact of scaling cluster size and hardware
- How to translate between cluster throughput and supported users
- Growing your cluster with your data
- How to verify performance before you go into production
You will never look at SQL-on-Hadoop performance the same way after this talk.
Yanpei Chen is a software engineer at Cloudera, working on the Performance Engineering team. He regularly participates in competitive performance “bake-offs” that directly drive customer purchasing decisions. His work touches upon Cloudera Search, Impala, Apache Hadoop, Apache HBase, and Apache Hive, because someone has to make sure the entire Hadoop ecosystem performs well together. Yanpei is a frequent speaker at industry and academic conferences, and contributes to various industry-standard benchmarks for big data.
Dileep Kumar works on the Performance Engineering team at Cloudera. He holds an M.S. from Santa Clara University and has more than 15 years of experience in performance engineering for SQL systems. He is a regular author and presenter.