Presented By O'Reilly and Cloudera
Make Data Work
December 1–3, 2015 • Singapore

Multitenant Hadoop across geographically distributed data centers

Heesun Won (ETRI), Minh Chau Nguyen (ETRI)
1:30pm–2:10pm Wednesday, 12/02/2015
Hadoop Platform
Location: 334-335
Level: Intermediate
Average rating: 3.50 (4 ratings)

Prerequisite Knowledge

Attendees should have a basic knowledge of YARN and HDFS.

Description

A growing number of data centers around the world provide data services. If each data center builds and maintains its own Hadoop cluster for analytics, cooperation on data processing among these isolated clusters becomes complicated and inefficient. A single Hadoop cluster that supports secure access control and fair resource management across distributed data centers is a useful alternative.

This session addresses multitenancy in geographically distributed Hadoop. We describe the technical issues and implementations that allow multiple kinds of users (data owners, service developers, data scientists, and end users) to share data securely, together with a flexibly controlled resource management mechanism that gives each tenant an isolated execution environment across geographically distributed data centers.

The main features of our platform are:

  • Extended Hadoop architecture-based metadatabase for improving multitenancy
  • File system isolation for resource management
  • NameNode-independent access
  • Multitenancy scheduler for data processing on selective nodes among data centers
  • Kerberos and Knox security improvements with the metadatabase
  • Selective data replication among data centers
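
As a rough illustration of what scheduling "on selective nodes among data centers" could mean, the sketch below filters cluster nodes by a per-tenant data center policy. It is not the platform's scheduler: the `Node` type, the `TENANT_DATA_CENTERS` policy table, the `eligible_nodes` function, and the tenant and data center names are all hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A worker node; all fields are illustrative assumptions."""
    name: str
    data_center: str
    free_containers: int


# Hypothetical policy: the data centers each tenant is allowed to run on.
TENANT_DATA_CENTERS = {
    "tenant-a": {"dc-seoul", "dc-singapore"},
    "tenant-b": {"dc-singapore"},
}


def eligible_nodes(tenant, nodes, policy=TENANT_DATA_CENTERS):
    """Return the nodes a tenant may use, most free capacity first.

    A node qualifies only if it sits in a data center the tenant is
    allowed to use and still has at least one free container.
    Unknown tenants get no nodes at all.
    """
    allowed = policy.get(tenant, set())
    candidates = [
        n for n in nodes
        if n.data_center in allowed and n.free_containers > 0
    ]
    # Sort by descending free capacity, then by name for a stable order.
    return sorted(candidates, key=lambda n: (-n.free_containers, n.name))
```

A caller would pass the tenant name and the current node list, then place work on the first nodes returned. A real scheduler would also have to account for fairness between tenants and data locality, which this sketch leaves out.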

The session will reveal the main requirements and challenges in building our platform, explain details of the design, and compare it with existing approaches.

Heesun Won

ETRI

Heesun Won is a principal researcher at the Electronics and Telecommunications Research Institute (ETRI), where she has been developing an open data reference model and a data distribution system with a semantic data map, SODAS (Smart Open Data as a System). Her research interests include software architecture for big data processing in cloud environments.

Minh Chau Nguyen

ETRI

Minh Chau Nguyen is a researcher in the smart data platform research department at the Electronics and Telecommunications Research Institute (ETRI). His research interests include big data management, software architecture, and distributed systems.