Presented By O’Reilly and Cloudera
Make Data Work
March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

How to protect big data in a containerized environment

Thomas Phelan (HPE BlueData)
5:10pm–5:50pm Wednesday, March 7, 2018

Who is this presentation for?

  • Chief security officers, enterprise security IT professionals, and data scientists

Prerequisite knowledge

  • A basic understanding of HDFS, TDE, and Kerberos (useful but not required)

What you'll learn

  • Understand why it's necessary to protect and encrypt big data in the enterprise, and learn common methods of protecting it (including data at rest and in motion)
  • Learn how Kerberos credentials are used to protect access to data in an HDFS filesystem configured with Transparent Data Encryption (TDE) and why Kerberos cross-realm trust is useful in accessing and protecting data stored in an HDFS filesystem secured with TDE in the enterprise


Every enterprise spends significant resources to protect its data. This is especially true in the case of big data, since some of this data may include sensitive or confidential customer and financial information. Common methods for protecting data include permissions and access controls as well as the encryption of data at rest and in flight.

The Hadoop community has recently rolled out Transparent Data Encryption (TDE) support in HDFS. With TDE, data is transparently encrypted by the big data application writing it and is not decrypted until another application accesses it. The data stays encrypted for its entire lifespan, both in transit and at rest, except while a processing application is specifically accessing it.
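The write and read flow described above can be sketched as a toy model. This is an illustration under stated assumptions, not the HDFS implementation: the names `ToyKMS`, `write_file`, and `read_file` are hypothetical, the `authorized` flag stands in for a Kerberos-authenticated identity passing a key-management access check, and the SHA-256 keystream is an insecure stand-in for the real cipher. It shows the general envelope-key idea used by HDFS TDE: each file gets its own data key, and only a wrapped copy of that key is stored with the file.

```python
import hashlib
import os


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with a SHA-256 counter-mode keystream.

    Insecure stand-in for a real cipher; applying it twice with the same
    key and nonce returns the original data.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + nonce + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


class ToyKMS:
    """Hypothetical key management service; zone keys never leave it."""

    def __init__(self):
        self._zone_keys = {}

    def create_zone_key(self, name: str) -> None:
        self._zone_keys[name] = os.urandom(32)

    def generate_wrapped_key(self, zone: str):
        """Create a fresh per-file data key and return it wrapped by the zone key."""
        data_key = os.urandom(32)
        nonce = os.urandom(16)
        return nonce, keystream_xor(self._zone_keys[zone], nonce, data_key)

    def unwrap_key(self, zone: str, nonce: bytes, wrapped: bytes, authorized: bool) -> bytes:
        """Unwrap a data key; 'authorized' is an assumed stand-in for a Kerberos/ACL check."""
        if not authorized:
            raise PermissionError("caller may not use this zone key")
        return keystream_xor(self._zone_keys[zone], nonce, wrapped)


def write_file(kms: ToyKMS, zone: str, plaintext: bytes) -> dict:
    """Encrypt on the writing side; only ciphertext and the wrapped key are stored."""
    nonce, wrapped = kms.generate_wrapped_key(zone)
    data_key = kms.unwrap_key(zone, nonce, wrapped, authorized=True)
    iv = os.urandom(16)
    return {"nonce": nonce, "wrapped": wrapped, "iv": iv,
            "ciphertext": keystream_xor(data_key, iv, plaintext)}


def read_file(kms: ToyKMS, zone: str, stored: dict, authorized: bool) -> bytes:
    """Decrypt on the reading side, after the KMS agrees to unwrap the data key."""
    data_key = kms.unwrap_key(zone, stored["nonce"], stored["wrapped"], authorized)
    return keystream_xor(data_key, stored["iv"], stored["ciphertext"])
```

In this sketch the storage layer only ever holds ciphertext plus a wrapped key, so a reader who cannot authenticate to the key service cannot recover the data even with full access to the stored bytes.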

TDE is an excellent approach for protecting data stored in data lakes built on the latest versions of HDFS. However, it does have its challenges and limitations. Systems that want to use TDE require tight integration with enterprise-wide Kerberos Key Distribution Center (KDC) services and Key Management Systems (KMS). This integration isn’t easy to set up or maintain. These issues can be even more challenging in a virtualized or containerized environment where one Kerberos realm may be used to secure the big data compute cluster and a different Kerberos realm may be used to secure the HDFS filesystem accessed by this cluster.

BlueData has developed significant expertise in configuring, managing, and optimizing access to TDE-protected HDFS. Thomas Phelan offers a detailed overview of how transparent data encryption works with HDFS, with a particular focus on containerized environments. You’ll learn how HDFS TDE is configured and maintained in an environment where many big data frameworks run simultaneously (e.g., in a hybrid cloud architecture using Docker containers). Moreover, you’ll learn how KDC credentials can be managed in a Kerberos cross-realm environment to provide data scientists and analysts with the greatest flexibility in accessing data while maintaining complete enterprise-grade data security.

Thomas Phelan

HPE BlueData

Thomas Phelan is cofounder and chief architect of BlueData. Previously, he was a member of the original team at Silicon Graphics that designed and implemented XFS, the first commercially available 64-bit file system. He was also an early employee at VMware, where, as a senior staff engineer and a key member of the ESX storage architecture team, he designed and developed the ESX storage I/O load-balancing subsystem and modular pluggable storage architecture and led teams working on many key storage initiatives, such as the cloud storage gateway and vFlash.