Developing and Deploying Hadoop Security

Data: Hadoop
Location: C124
Average rating: 2.25 (4 ratings)

Hadoop 0.20 implicitly trusts users when they state their username and group membership. That is acceptable for small teams, but large corporations need more control. For example, they must run independent clusters for each kind of sensitive information (financial, personally identifiable information, etc.) and control access by limiting who can reach those clusters. With Hadoop’s new security features and its integration with Kerberos, it is possible to verify that users are who they claim to be and to ensure they have access only to the data and resources they are entitled to. This lets corporations grant finer-grained access to information and reduce their operational overhead by coalescing their distinct clusters. This presentation will cover the goals of the security work and how to use the new features to secure HDFS and MapReduce clusters. I will also include Yahoo’s experiences deploying the back-ported Hadoop Security features on their science and production clusters.
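As a minimal sketch of what switching from implicit trust to Kerberos looks like in practice, the fragment below shows the two `core-site.xml` properties that enable Kerberos authentication and service-level authorization in secured Hadoop releases. This assumes a Kerberos KDC and the service principals are already set up; it is illustrative, not a complete secure-cluster configuration.

```xml
<!-- core-site.xml: sketch of enabling Hadoop security.
     Assumes a Kerberos realm and principals already exist. -->
<configuration>
  <!-- "simple" trusts whatever username the client claims;
       "kerberos" requires a valid Kerberos ticket. -->
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
  <!-- Also enforce service-level authorization checks. -->
  <property>
    <name>hadoop.security.authorization</name>
    <value>true</value>
  </property>
</configuration>
```

With these set, a user must first obtain a ticket (e.g. `kinit user@EXAMPLE.COM`) before HDFS or MapReduce operations will be accepted.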


Owen O'Malley


Owen O’Malley is a cofounder and technical fellow at Cloudera, formerly Hortonworks. Cloudera’s software includes Hadoop and the large ecosystem of big data tools that enterprises need for their data analytics. Owen has been working on Hadoop since the beginning of 2006 at Yahoo, was the first committer added to the project, and used Hadoop to set the Gray sort benchmark in 2008 and 2009. He was the architect of MapReduce and of Hadoop Security, and now works on Hive, where he is driving the development of the ORC file format and adding ACID transactions.



Sheeri K. Cabral
09/07/2011 4:17am PDT

A video of this presentation is online at