Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

Rethinking data marts in the cloud: Common architectural patterns for analytics

Henry Robinson (Cloudera), Greg Rahn (Cloudera)
2:05pm–2:45pm Wednesday, September 27, 2017
Big data and the Cloud, Data Engineering & Architecture
Location: 1A 15/16/17 Level: Intermediate
Secondary topics: Architecture, Cloud

Who is this presentation for?

  • Architects and those in IT

Prerequisite knowledge

  • A basic understanding of SQL and cloud principles

What you'll learn

  • Explore the common cloud architectural patterns to optimize price and performance
  • Learn the trade-offs when running analytics in cloud environments


Cloud environments will likely play a key role in your business’s future. Given the allure of on-demand provisioning and usage-based cost optimization, that’s no surprise. However, to realize the cloud’s full potential, it’s critical to understand how best to use these environments for different workloads without disrupting the business.

To migrate data marts and analytics to the cloud, you’ll need to understand, among other things, when it’s best to use object storage versus local storage, how to design for multitenant isolation, and how to tune performance for SLAs. Henry Robinson and Greg Rahn explore the workload considerations when evaluating the cloud and discuss the common architectural patterns to optimize price and performance. You’ll learn how to incorporate the cloud into your overall infrastructure landscape and the benefits of a heterogeneous strategy.

Topics include:

  • When to use transient clusters versus long-lived clusters
  • The trade-offs between object stores and locally attached storage
  • Architectures for multitenancy
  • Translating enterprise security and governance to cloud environments

Henry Robinson


Henry Robinson is a software engineer at Cloudera. For the past few years, he has worked on Apache Impala, an SQL query engine for data stored in Apache Hadoop, and leads the scalability effort to bring Impala to clusters of thousands of nodes. Henry’s main interest is in distributed systems. He is a PMC member for the Apache ZooKeeper, Apache Flume, and Apache Impala open source projects.


Greg Rahn


Greg Rahn is director of product management at Cloudera, where he’s responsible for driving SQL product strategy as part of the company’s data warehouse product team, including working directly with Impala. For over 20 years, Greg has worked with relational database systems in a variety of roles, including software engineering, database administration, database performance engineering, and most recently product management, providing a holistic view and expertise on the database market. Previously, Greg was part of the esteemed Real-World Performance Group at Oracle and was the first member of the product management team at Snowflake Computing.