Presented By O'Reilly and Cloudera
December 5–6, 2016: Training
December 6–8, 2016: Tutorials & Conference

Unified metadata management for scalability, integrity, and reliability across geographically distributed data centers

Minh Chau Nguyen (ETRI), Heesun Won (ETRI)
12:05pm–12:45pm Thursday, December 8, 2016
Hadoop internals & development
Location: 323
Level: Intermediate

Prerequisite Knowledge

  • Basic knowledge of YARN, HDFS, and Spark

What you'll learn

  • Explore a unified metadatabase and learn how to apply it to your current big data projects


The number of data centers serving data services around the world keeps growing, and the large volume of metadata generated by systems, services, and users creates many problems for maintenance and operations management. If each system in each data center builds and manages its own metadata separately, the availability, scalability, and applicability of data services suffer, especially for authorized third parties. It is therefore useful to build a unified metadatabase that supports secure access control and system-reflection management across distributed data centers.

Minh Chau Nguyen and Heesun Won explore the integrated metadata management feature of the geographically distributed Hadoop ecosystem and describe an implementation that lets multiple users securely access the metadata and reflects changes to specific systems at runtime, using a flexible schema management mechanism that spans geographically distributed data centers. Along the way, Minh and Heesun reveal the main requirements and challenges in building this platform, explain details of the design, and compare it with existing approaches.

The main features of this unified metadatabase are as follows:

  • Unified metadata map creation, modification, and visualization
  • Enhanced cluster, service, and tenant configuration support
  • Runtime metadata extraction and reflection from/to the Hadoop ecosystem
  • Advanced Open API service for metadata utilization by authorized third parties
  • Pattern-based prefetching mechanism for performance improvements
  • Specialized metadata structure with an on-the-fly encryption feature for secure high-performance access

Minh Chau Nguyen


Minh Chau Nguyen is a researcher in the smart data platform research department at the Electronics and Telecommunications Research Institute (ETRI). His research interests include big data management, software architecture, and distributed systems.


Heesun Won


Heesun Won is a principal researcher at the Electronics and Telecommunications Research Institute (ETRI), where she has been developing an open data reference model and a data distribution system with a semantic data map, SODAS (Smart Open Data as a System). Her research interests include software architecture for big data processing in cloud environments.