Presented By O’Reilly and Cloudera
Make Data Work
September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

A Data Marketplace Case Study with Blockchain and Advanced Multitenant Hadoop in Smart Open Data Platform

Minh Chau Nguyen (ETRI), Heesun Won (ETRI)
1:15pm–1:55pm Wednesday, 09/12/2018
Secondary topics:  Blockchain and decentralization, Data preparation, governance and privacy

Who is this presentation for?

Data scientists and developers

Prerequisite knowledge

Attendees should have a basic knowledge of Hadoop and blockchain.

What you'll learn

Attendees will learn the platform's overall architecture and the details of how it works, so they can apply this knowledge to their own projects in the future.


Demand for data exchange for research, analytics, and collaboration is growing in many areas. However, the large volume of data coming from the systems, services, and users of different domains causes many problems in maintenance and operations management. If each data marketplace center builds and manages its data separately, the availability, scalability, and applicability of data analytics services, especially those from authorized third parties, are reduced.

This session covers the challenges we faced in building a platform that meets the system requirements of data marketplace centers. We explain how data owners, data scientists, and developers can benefit from the platform through its data analytics results. We also describe the technical issues and the implementation that allow data to be shared securely and that support running many analytics services, with flexibly controlled resource management, across geographically distributed data centers. The main features of the platform can be summarized as follows:

- Blockchain and taxonomy-based metadata and master data management for data classification, validation, and quality control.
- Private key and advanced attribute-based access control for improved authorization.
- Spark on YARN with file system isolation and network management support.
- Enhanced multitenant scheduler for dynamically allocating, suspending, and resuming data processing on selected nodes across data centers.
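
As a rough illustration of the attribute-based access control idea in the list above, the following Python sketch checks a user's attributes against a dataset policy. The Policy class, the is_allowed function, and the attribute names and values are assumptions made up for this example. They are not the platform's actual API, and the sketch leaves out the private-key component entirely.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    # Hypothetical schema: attribute name -> set of values that satisfy the policy.
    required: dict


def is_allowed(user_attrs: dict, policy: Policy) -> bool:
    """Grant access only if the user holds an accepted value for every required attribute."""
    return all(
        user_attrs.get(name) in allowed
        for name, allowed in policy.required.items()
    )


# Hypothetical dataset policy and users, for illustration only.
policy = Policy(required={"role": {"data_scientist", "developer"},
                          "tenant": {"center-a"}})
alice = {"role": "data_scientist", "tenant": "center-a"}
bob = {"role": "data_scientist", "tenant": "center-b"}

print(is_allowed(alice, policy))  # alice matches both required attributes
print(is_allowed(bob, policy))    # bob's tenant is not accepted by the policy
```

In this sketch a missing attribute is treated the same as a mismatched one, so access is denied by default.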

Minh Chau Nguyen


Minh Chau Nguyen is a researcher in the Big Data Software Platform Research department at the Electronics and Telecommunications Research Institute (ETRI), one of the largest government-funded research institutes in Korea. His research interests include big data management, software architecture, and distributed systems.

Heesun Won


Heesun Won is a principal researcher at the Electronics and Telecommunications Research Institute (ETRI), where she leads the Collaborative Analytics Platform for BDaaS (big data as a service) and analytics for the Network Management System (NFV/SDN/cloud). Her research interests include multitenant systems, cloud resource management, and big data analysis.
