Presented By O'Reilly and Cloudera
December 5–6, 2016: Training
December 6–8, 2016: Tutorials & Conference

BI and SQL analytics with Hadoop in the cloud

Alex Gutow (Cloudera), Henry Robinson (Cloudera)
2:35pm–3:15pm Thursday, December 8, 2016
Production-ready Hadoop
Location: Summit 2
Level: Intermediate
Average rating: 3.50 (2 ratings)

Prerequisite Knowledge

  • A basic understanding of SQL and cloud principles

What you'll learn

  • Understand the benefits and trade-offs of doing SQL analytics in the cloud


Today, Apache Hadoop is deployed both on-premises and in the public cloud, with the public cloud becoming increasingly prevalent. The cloud provides some unique abilities, such as on-demand infrastructure, cluster elasticity, persistent and globally available object storage, and pay-for-use pricing. These abilities enable even more flexible and cost-efficient deployment options for BI and SQL analytic users of Impala but bring new challenges that need to be carefully considered to achieve an optimal outcome.

Alex Gutow and Henry Robinson explain how Apache Hadoop and Apache Impala (incubating) take advantage of the benefits of the cloud to provide the same great functionality, partner ecosystem, and flexibility of on-premises deployments combined with the flexibility and cost efficiency of the cloud.

Topics include:

  • On-premises versus the cloud: What’s the same and what’s different
  • Instance types to consider for best performance
  • How Impala can read/write on cloud-based object storage (S3)
  • How to understand workloads in transient, hybrid, and permanent cloud clusters and which workloads are cost-effective to run in the cloud versus on-premises
  • Tuning and best practices for storage and instance choices that will help you effectively architect Impala clusters
  • BI and SQL analytics with Hadoop in the cloud
  • How Impala performs in the cloud compared to alternatives
  • Roadmap and what’s next

Alex Gutow


Alex Gutow is senior product marketing manager at Cloudera, where she focuses on the analytic database platform solution and technologies. Previously, she managed technical marketing and PR for Basho Technologies and managed consumer and enterprise marketing for Truaxis, a Mastercard company. Alex holds a BS in marketing and a BA in psychology from Carnegie Mellon University.


Henry Robinson


Henry Robinson is a software engineer at Cloudera. For the past few years, he has worked on Apache Impala, an SQL query engine for data stored in Apache Hadoop, and leads the scalability effort to bring Impala to clusters of thousands of nodes. Henry’s main interest is in distributed systems. He is a PMC member for the Apache ZooKeeper, Apache Flume, and Apache Impala open source projects.