Presented By O'Reilly and Cloudera
Make Data Work
Sept 29–Oct 1, 2015 • New York, NY

Transparent encryption in HDFS: The missing piece in big data security

Andrew Wang (Cloudera)
2:55pm–3:35pm Thursday, 10/01/2015
Security & Governance
Location: 3D 04/09 Level: Intermediate
Average rating: ****.
(4.10, 10 ratings)

Data encryption is a requirement for many business sectors dealing with confidential information, such as finance, healthcare, and government. For example, HIPAA, FISMA, and DCI all require that data is encrypted while it is in-flight (being transferred over the network), and when it is at-rest (stored durably on disk). There can also be additional restrictions surrounding access, management, and storage of encryption keys.

To meet these requirements, transparent, end-to-end encryption was added to HDFS. Once configured, data read from and written to certain HDFS directories is transparently encrypted and decrypted without requiring any changes to user application code. This encryption is also end-to-end, meaning that data is protected both in-flight and at-rest, and can only be encrypted and decrypted by the client. This improves security since HDFS itself never handles unencrypted data or data encryption keys. Furthermore, through the use of a new cluster service, the Hadoop Key Management Server (KMS), the responsibilities of key administration and HDFS administration can be separated, further enhancing security.

During this talk, we will cover the design, implementation, and usage of transparent encryption in HDFS.

Photo of Andrew Wang

Andrew Wang


Andrew Wang is a software engineer on the HDFS team at Cloudera. Previously, he was a graduate student in the AMPLab at the University of California, Berkeley, advised by Ion Stoica, where he worked on research related to in-memory caching and quality of service. In his spare time, he enjoys going on bike rides, cooking, and playing guitar.