Presented By O'Reilly and Cloudera
Make Data Work
March 13–14, 2017: Training
March 14–16, 2017: Tutorials & Conference
San Jose, CA

BI and SQL analytics with Hadoop in the cloud

Henry Robinson (Cloudera), Alex Gutow (Cloudera)
1:50pm–2:30pm Wednesday, March 15, 2017
Big data and the Cloud
Location: 210 A/E
Level: Intermediate
Secondary topics: Architecture, Cloud

Who is this presentation for?

  • Architects and analysts

Prerequisite knowledge

  • A basic understanding of SQL and cloud principles

What you'll learn

  • Learn best practices for deploying Hadoop-based BI and SQL analytics in the cloud


Today, Hadoop is deployed both on-premises and in the public cloud, with the public cloud becoming increasingly prevalent. The cloud provides some unique capabilities, including on-demand infrastructure, cluster elasticity, persistent, globally available object storage, and pay-for-use pricing, which enable even more flexible and cost-efficient deployment options for BI and SQL analytic users of Impala. It also brings new challenges that need to be carefully considered to achieve an optimal outcome.

Henry Robinson and Alex Gutow explain how to best take advantage of the flexibility and cost-effectiveness of the cloud with your BI and SQL analytic workloads using Apache Hadoop and Apache Impala (incubating) to provide the same great functionality, partner ecosystem, and flexibility of on-premises deployments. Henry and Alex cover the architectural considerations, best practices, tuning, and functionality available when deploying or migrating BI and SQL analytic workloads to the cloud.

Topics include:

  • On-premises versus the cloud: What’s the same and what’s different
  • What kind of instance types to consider for best performance
  • How Impala can read/write on cloud-based object storage (S3)
  • How to understand workloads in terms of transient, hybrid, and permanent cloud clusters and which workloads are cost-effective to run in the cloud versus on-premises
  • Tuning and best practices for storage and instance choices that will help you effectively architect Impala clusters
  • How Impala performs in the cloud compared to alternatives
  • The roadmap for what’s next

Henry Robinson


Henry Robinson is a software engineer at Cloudera. For the past few years, he has worked on Apache Impala, an SQL query engine for data stored in Apache Hadoop, and leads the scalability effort to bring Impala to clusters of thousands of nodes. Henry’s main interest is in distributed systems. He is a PMC member for the Apache ZooKeeper, Apache Flume, and Apache Impala open source projects.


Alex Gutow


Alex Gutow is senior product marketing manager at Cloudera, where she focuses on the analytic database platform solution and technologies. Previously, she managed technical marketing and PR for Basho Technologies and managed consumer and enterprise marketing for Truaxis, a Mastercard company. Alex holds a BS in marketing and a BA in psychology from Carnegie Mellon University.