Every enterprise spends significant resources to protect its data. This is especially true in the case of Big Data, since some of this data may include sensitive or confidential customer and financial information. Common methods for protecting data include permissions and access controls, as well as the encryption of data at rest and in flight.
The Hadoop community has recently rolled out Transparent Data Encryption (TDE) support in HDFS. With TDE, data is transparently encrypted by the Big Data application that writes it and is not decrypted again until another application accesses it. The data stays encrypted for its entire lifespan, both in transit and at rest, except while a processing application is specifically accessing it.
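The write-encrypt, read-decrypt flow described above can be illustrated with a small sketch. The abstract does not spell out how keys are handled, so this assumes an envelope scheme: each file gets its own data key, and that key is stored only in wrapped form, encrypted under a zone key that never leaves a separate key service. `ToyKeyServer`, `ToyEncryptedStore`, and `keystream_xor` are hypothetical names invented for illustration, and the SHA-256 keystream is only a stand-in for a real cipher such as AES-CTR. It is not a depiction of the HDFS or KMS APIs.

```python
import hashlib
import secrets
from typing import Dict, Tuple


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with a SHA-256 counter-mode keystream.

    Illustration only; a real system would use a vetted cipher (e.g. AES-CTR).
    Applying the function twice with the same key and nonce restores the input.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + nonce + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


class ToyKeyServer:
    """Hypothetical stand-in for a key management service."""

    def __init__(self) -> None:
        self._zone_key = secrets.token_bytes(32)  # never leaves this object

    def new_data_key(self) -> Tuple[bytes, bytes, bytes]:
        """Return (plaintext data key, wrapped data key, wrap nonce)."""
        data_key = secrets.token_bytes(32)
        wrap_nonce = secrets.token_bytes(16)
        wrapped = keystream_xor(self._zone_key, wrap_nonce, data_key)
        return data_key, wrapped, wrap_nonce

    def unwrap(self, wrapped: bytes, wrap_nonce: bytes) -> bytes:
        """Recover the plaintext data key from its wrapped form."""
        return keystream_xor(self._zone_key, wrap_nonce, wrapped)


class ToyEncryptedStore:
    """Hypothetical client-side wrapper: storage only ever sees ciphertext."""

    def __init__(self, key_server: ToyKeyServer) -> None:
        self._key_server = key_server
        # path -> (wrapped key, wrap nonce, file nonce, ciphertext)
        self.blocks: Dict[str, Tuple[bytes, bytes, bytes, bytes]] = {}

    def write(self, path: str, plaintext: bytes) -> None:
        # Encrypt before the data leaves the writing application.
        data_key, wrapped, wrap_nonce = self._key_server.new_data_key()
        file_nonce = secrets.token_bytes(16)
        ciphertext = keystream_xor(data_key, file_nonce, plaintext)
        self.blocks[path] = (wrapped, wrap_nonce, file_nonce, ciphertext)

    def read(self, path: str) -> bytes:
        # Decrypt only when an application actually accesses the data.
        wrapped, wrap_nonce, file_nonce, ciphertext = self.blocks[path]
        data_key = self._key_server.unwrap(wrapped, wrap_nonce)
        return keystream_xor(data_key, file_nonce, ciphertext)


if __name__ == "__main__":
    store = ToyEncryptedStore(ToyKeyServer())
    store.write("/zone/customers.csv", b"name,card\nalice,4111")
    print(store.read("/zone/customers.csv"))
```

In this sketch the storage layer holds only ciphertext and a wrapped key, so reading a file requires a round trip to the key service. That dependency is one reason the key service and its authentication need to be tightly integrated with the cluster.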
TDE is an excellent approach for protecting data stored in data lakes built on the latest versions of HDFS. However, it does not come without its challenges and limitations. Systems that want to use TDE require tight integration with enterprise-wide Kerberos Key Distribution Center (KDC) services and Key Management Systems (KMS). This integration isn’t easy to set up or maintain.
These issues can be even more challenging in a virtualized or containerized environment where one Kerberos realm may be used to secure the Big Data compute cluster and a different Kerberos realm may be used to secure the HDFS file system accessed by this cluster.
BlueData has developed significant expertise in configuring, managing, and optimizing access to TDE-protected HDFS. This session will provide a detailed description of how Transparent Data Encryption works with HDFS, with a particular focus on containerized environments. You will learn how HDFS TDE is configured and maintained in an environment where many Big Data frameworks run simultaneously (e.g., in a hybrid cloud architecture using Docker containers). Moreover, you will learn how KDC credentials can be managed in a Kerberos cross-realm environment to provide data scientists and analysts with the greatest flexibility in accessing data while maintaining complete enterprise-grade data security.
Thomas Phelan is cofounder and chief architect of BlueData. Prior to BlueData, Tom was an early employee at VMware and as senior staff engineer was a key member of the ESX storage architecture team. During his 10-year stint at VMware, he designed and developed the ESX storage I/O load-balancing subsystem and modular “pluggable storage architecture.” He went on to lead teams working on many key storage initiatives, such as the cloud storage gateway and vFlash. Earlier, Tom was a member of the original team at Silicon Graphics that designed and implemented XFS, the first commercially available 64-bit file system.
©2018, O’Reilly UK Ltd • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • firstname.lastname@example.org