Presented By O'Reilly and Cloudera
Make Data Work
September 26–27, 2016: Training
September 27–29, 2016: Tutorials & Conference
New York, NY

Authorization in the cloud: Enforcing access control across compute engines

Li Li (Google), Hao Hao (Cloudera)
2:55pm–3:35pm Thursday, 09/29/2016
Location: River Pavilion Level: Beginner
Tags: cloud
Average rating: 3.50 (2 ratings)

Prerequisite knowledge

  • A basic understanding of the Hadoop ecosystem and the cloud
  • General knowledge of Hadoop components such as HDFS, MapReduce, and Hive, and of access control

What you'll learn

  • Understand the design and architecture of fine-grained authorization as a service for Hadoop in the cloud
  • Learn the basics of leveraging Apache Sentry and RecordService to provide authorization capabilities

Description

    Running Hadoop in the cloud is increasingly common, as the cloud provides rapid access to flexible, low-cost IT resources. As with traditional on-premises Hadoop clusters, data authorization is crucial, and even more so in a multitenant cloud. A simple and smooth transition requires a transparent solution that decouples compute from storage. And since the underlying data is shared across components, a unified authorization policy should be enforced across them to accommodate the flexibility of the Hadoop ecosystem.

    Li Li and Hao Hao explore Apache Sentry and RecordService as a solution to this problem. Apache Sentry is a framework that provides fine-grained authorization as a service, and RecordService is an abstraction layer between computing frameworks and data storage that can leverage and enforce Sentry's centralized authorization policies.

    Li and Hao discuss the architecture of Apache Sentry and RecordService and how the fine-grained access control policies are uniformly enforced in different Hadoop components in the cloud, such as Hive, Solr, Impala, Kafka, Sqoop2, Spark, Pig, and MapReduce, with no performance loss. They also explain how Apache Sentry can leverage the benefits of both role-based access control (RBAC) and attribute-based access control (ABAC).
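
    As a rough illustration of how a single policy store might combine RBAC and ABAC and serve several engines, consider the minimal sketch below. It is not Apache Sentry's or RecordService's API: the `PolicyStore` and `Privilege` classes, the resource strings, and the `network` attribute are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Privilege:
    """Hypothetical RBAC grant: permission to perform `action` on `resource`."""
    resource: str  # e.g. "db=sales/table=orders" (illustrative naming)
    action: str    # e.g. "select"


@dataclass
class PolicyStore:
    """Hypothetical central policy store consulted by every compute engine."""
    group_roles: dict = field(default_factory=dict)      # group -> set of roles
    role_privileges: dict = field(default_factory=dict)  # role -> set of Privilege
    attribute_rules: list = field(default_factory=list)  # ABAC predicates

    def grant(self, role, resource, action):
        self.role_privileges.setdefault(role, set()).add(Privilege(resource, action))

    def add_role_to_group(self, group, role):
        self.group_roles.setdefault(group, set()).add(role)

    def is_authorized(self, groups, resource, action, attributes=None):
        attributes = attributes or {}
        # RBAC: collect the roles of the caller's groups, then look for a grant.
        roles = set()
        for group in groups:
            roles |= self.group_roles.get(group, set())
        wanted = Privilege(resource, action)
        rbac_ok = any(wanted in self.role_privileges.get(r, set()) for r in roles)
        # ABAC: every attribute rule must also accept the request.
        abac_ok = all(rule(resource, action, attributes) for rule in self.attribute_rules)
        return rbac_ok and abac_ok


store = PolicyStore()
store.grant("analyst", "db=sales/table=orders", "select")
store.add_role_to_group("finance", "analyst")
# Illustrative ABAC rule: only requests from the internal network are allowed.
store.attribute_rules.append(lambda res, act, attrs: attrs.get("network") == "internal")

# Two engines ask the same store, so they reach the same decision.
decisions = {
    engine: store.is_authorized(
        {"finance"}, "db=sales/table=orders", "select", {"network": "internal"}
    )
    for engine in ("hive", "spark")
}
```

    Because both engines consult the same store, a change to a role or an attribute rule takes effect for all of them at once, which is the property a unified policy is meant to provide.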

    Li Li


    Li Li is a software engineer on Google’s Cloud team. Previously, Li worked at Cloudera on the RecordService and Apache Sentry projects. She is also a committer and PMC member of the Apache Sentry (TLP) project. Li holds a master’s degree in computer science from Vanderbilt University.

    Hao Hao


    Hao Hao is a software engineer at Cloudera currently working on the Apache Sentry project, a granular, role-based authorization module for Hadoop clusters. She is also a PMC member of the Apache Sentry (TLP) project. Hao performed extensive research on smartphone security and web security while she was a PhD student at Syracuse University. Prior to joining Cloudera, Hao worked on eBay’s Search Backend team, building search infrastructure for eBay’s online buying platform.