Data encryption is a requirement for many business sectors dealing with confidential information, such as finance, healthcare, and government. For example, HIPAA, FISMA, and PCI DSS all require that data be encrypted both in flight (while being transferred over the network) and at rest (when stored durably on disk). There can also be additional restrictions on the access, management, and storage of encryption keys.
To meet these requirements, transparent, end-to-end encryption was added to HDFS. Once configured, data read from and written to designated HDFS directories is encrypted and decrypted transparently, without requiring any changes to user application code. The encryption is also end-to-end: data is protected both in flight and at rest, and only the client can encrypt and decrypt it. This improves security because HDFS itself never handles unencrypted data or unencrypted data encryption keys. Furthermore, a new cluster service, the Hadoop Key Management Server (KMS), allows the responsibilities of key administration and HDFS administration to be separated, further enhancing security.
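The separation of roles described above can be illustrated with a small, self-contained sketch. It is not HDFS or KMS code: the classes `ToyKMS` and `ToyHDFS`, the function names, and the file path are all hypothetical, and the cipher is a toy SHA-256 keystream standing in for a real cipher such as AES (it must not be used for real data). The flow shown, in which the storage layer keeps only ciphertext plus an encrypted per-file key while the client obtains the plaintext key from the key server, is an assumption about how such a design is typically arranged rather than a description of the actual implementation.

```python
import hashlib
import os


def toy_stream_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream.

    Toy stand-in for a real cipher; NOT secure. Applying it twice with
    the same key and nonce returns the original bytes.
    """
    out = bytearray()
    for start in range(0, len(data), 32):
        counter = (start // 32).to_bytes(8, "big")
        keystream = hashlib.sha256(key + nonce + counter).digest()
        chunk = data[start:start + 32]
        out.extend(b ^ k for b, k in zip(chunk, keystream))
    return bytes(out)


class ToyKMS:
    """Hypothetical key server: holds zone keys, never sees file data."""

    def __init__(self):
        self._zone_keys = {}

    def create_zone_key(self, name: str) -> None:
        self._zone_keys[name] = os.urandom(32)

    def generate_encrypted_key(self, name: str):
        """Create a per-file data key and return it only in encrypted form."""
        data_key = os.urandom(32)
        nonce = os.urandom(16)
        return nonce, toy_stream_cipher(self._zone_keys[name], nonce, data_key)

    def decrypt_encrypted_key(self, name: str, nonce: bytes, encrypted_key: bytes) -> bytes:
        return toy_stream_cipher(self._zone_keys[name], nonce, encrypted_key)


class ToyHDFS:
    """Hypothetical storage layer: stores encrypted keys and ciphertext only."""

    def __init__(self):
        self.metadata = {}  # path -> (zone key name, key nonce, encrypted data key)
        self.blocks = {}    # path -> (data nonce, ciphertext)

    def create(self, path: str, kms: ToyKMS, zone_key: str) -> None:
        key_nonce, encrypted_key = kms.generate_encrypted_key(zone_key)
        self.metadata[path] = (zone_key, key_nonce, encrypted_key)


def client_write(kms: ToyKMS, hdfs: ToyHDFS, path: str, plaintext: bytes) -> None:
    # The client, not the storage layer, obtains the plaintext data key.
    zone_key, key_nonce, encrypted_key = hdfs.metadata[path]
    data_key = kms.decrypt_encrypted_key(zone_key, key_nonce, encrypted_key)
    data_nonce = os.urandom(16)
    hdfs.blocks[path] = (data_nonce, toy_stream_cipher(data_key, data_nonce, plaintext))


def client_read(kms: ToyKMS, hdfs: ToyHDFS, path: str) -> bytes:
    zone_key, key_nonce, encrypted_key = hdfs.metadata[path]
    data_key = kms.decrypt_encrypted_key(zone_key, key_nonce, encrypted_key)
    data_nonce, ciphertext = hdfs.blocks[path]
    return toy_stream_cipher(data_key, data_nonce, ciphertext)


kms, hdfs = ToyKMS(), ToyHDFS()
kms.create_zone_key("zone1")
hdfs.create("/secure/report.txt", kms, "zone1")
message = b"confidential record"
client_write(kms, hdfs, "/secure/report.txt", message)
recovered = client_read(kms, hdfs, "/secure/report.txt")
stored_ciphertext = hdfs.blocks["/secure/report.txt"][1]
```

In this sketch, whoever administers `ToyHDFS` can see every stored byte yet cannot recover `message` without access to `ToyKMS`, which is the kind of administrative separation the abstract attributes to the KMS.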
During this talk, we will cover the design, implementation, and usage of transparent encryption in HDFS. We will also cover performance results demonstrating the benefits of hardware crypto acceleration (AES-NI).
Software Engineer with 30+ years of experience developing DBMS software. S.M., S.B. Computer Science, MIT.
Andrew is a software engineer on the HDFS team at Cloudera. Previously, he was a graduate student in the AMPLab at the University of California, Berkeley, advised by Prof. Ion Stoica, where he worked on research related to in-memory caching and quality of service. In his spare time, he enjoys going on bike rides, cooking, and playing guitar.
©2015, O’Reilly UK Ltd • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • email@example.com
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse, or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.