Presented By O'Reilly and Cloudera
December 5–6, 2016: Training
December 6–8, 2016: Tutorials & Conference

Authorization in the cloud: Enforcing access control across compute engines

Hao Hao (Cloudera), Alex Leblang (Cloudera)
11:15am–11:55am Thursday, December 8, 2016
Security & governance
Location: 321/322 Level: Beginner
Tags: cloud
Average rating: 3.00 (2 ratings)

Prerequisite Knowledge

  • A basic understanding of the Hadoop ecosystem and the cloud
  • Knowledge of Hadoop components (HDFS, MapReduce, Hive, etc.) and access control

What you'll learn

  • Understand how to architect a fine-grained authorization solution for Hadoop in the cloud using Apache Sentry and RecordService


Running Hadoop in the cloud is increasingly common, since the cloud provides rapid access to flexible, inexpensive IT resources. As with traditional on-premises Hadoop clusters, data authorization is crucial in a multitenant cloud. A transparent solution that decouples compute from storage is also required for a simple, smooth experience. And because the underlying data is shared across components, unified authorization policies must be enforced across all of them to produce a modern and flexible Hadoop ecosystem.

Hao Hao and Alex Leblang explore a solution to this problem that combines Apache Sentry, a framework that provides fine-grained authorization as a service, with RecordService, an abstraction layer between compute frameworks and data storage that can leverage and enforce Sentry's centralized authorization policies. They discuss the architecture of Apache Sentry and RecordService and how fine-grained access control policies are enforced uniformly across Hadoop components in the cloud with no performance loss, looking specifically at Hive, Solr, Impala, Kafka, Sqoop2, Spark, Pig, and MapReduce. Along the way, Hao and Alex also explain how Apache Sentry can leverage the benefits of both role-based access control (RBAC) and attribute-based access control (ABAC).
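To make the idea of one centralized policy enforced by several engines concrete, here is a minimal Python sketch of role-based access control with column-level privileges. It is an illustration only, not the Apache Sentry or RecordService API: the names PolicyStore, Privilege, sql_engine_select, and batch_engine_read, along with the group, role, and table names, are all hypothetical, and the ABAC side of the talk is not modeled.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Privilege:
    """Hypothetical privilege: an action on a table, optionally limited to columns."""
    action: str                        # e.g. "SELECT"
    table: str                         # e.g. "sales.orders"
    columns: frozenset = frozenset()   # empty set means "all columns"


@dataclass
class PolicyStore:
    """Hypothetical central store: groups map to roles, roles map to privileges."""
    group_roles: dict = field(default_factory=dict)
    role_privileges: dict = field(default_factory=dict)

    def grant_role(self, group, role):
        self.group_roles.setdefault(group, set()).add(role)

    def grant_privilege(self, role, privilege):
        self.role_privileges.setdefault(role, set()).add(privilege)

    def is_allowed(self, groups, action, table, column=None):
        """Return True if any role held by any of the groups permits the request."""
        for group in groups:
            for role in self.group_roles.get(group, ()):
                for priv in self.role_privileges.get(role, ()):
                    if priv.action != action or priv.table != table:
                        continue
                    # A column-restricted privilege only covers its listed columns.
                    if not priv.columns or (column is not None and column in priv.columns):
                        return True
        return False


# Two stand-in "engines" that defer to the same store instead of keeping
# their own rules, so a single grant or revoke affects both.
def sql_engine_select(store, groups, table, column):
    if not store.is_allowed(groups, "SELECT", table, column):
        raise PermissionError(f"SELECT on {table}.{column} denied")
    return f"sql engine read {table}.{column}"


def batch_engine_read(store, groups, table, column):
    if not store.is_allowed(groups, "SELECT", table, column):
        raise PermissionError(f"read of {table}.{column} denied")
    return f"batch engine read {table}.{column}"


if __name__ == "__main__":
    store = PolicyStore()
    store.grant_role("analysts", "order_reader")
    store.grant_privilege(
        "order_reader",
        Privilege("SELECT", "sales.orders", frozenset({"order_id", "total"})),
    )
    print(sql_engine_select(store, ["analysts"], "sales.orders", "total"))
    print(batch_engine_read(store, ["analysts"], "sales.orders", "total"))
```

Because both stand-in engines call the same is_allowed check, a column that is off limits in one is off limits in the other, which is the property the abstract describes as unified enforcement.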

Hao Hao


Hao Hao is a software engineer at Cloudera currently working on the Apache Sentry project, a granular, role-based authorization module for Hadoop clusters. She is also a PMC member of the Apache Sentry (TLP) project. Hao conducted extensive research on smartphone security and web security while she was a PhD student at Syracuse University. Prior to joining Cloudera, Hao worked on eBay’s Search Backend team, building search infrastructure for eBay’s online buying platform.

Alex Leblang


Alex Leblang is an engineer at Cloudera on the RecordService team. Previously, Alex was an Apache Impala (incubating) engineer and interned at Vertica. He holds a bachelor’s degree from Brown University with concentrations in computer science and Latin American studies.