Presented By O’Reilly and Cloudera
Make Data Work
September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

A data marketplace case study with the blockchain and advanced multitenant Hadoop in a smart open data platform

Minh Chau Nguyen (ETRI), Heesun Won (ETRI)
1:15pm–1:55pm Wednesday, 09/12/2018
Secondary topics: Blockchain and decentralization, Data preparation, governance and privacy
Average rating: 2.20 (5 ratings)

Who is this presentation for?

  • Data scientists and developers

Prerequisite knowledge

  • Basic knowledge of Hadoop and blockchain technology

What you'll learn

  • Learn how to implement analytics services in data marketplace systems on a single Hadoop cluster across distributed data centers, using the blockchain

Description

Demand for data exchange to support research, analytics, and collaboration is growing in many areas. However, the large volume of data from systems, services, and users in different domains creates many problems for maintenance and operations management. If each data marketplace center builds and manages its data separately, data analytics services, especially those from authorized third parties, become less available, less scalable, less applicable, and ultimately ineffective.

Minh Chau Nguyen and Heesun Won explain how to implement analytics services in data marketplace systems on a single Hadoop cluster across distributed data centers. The solution extends the overall architecture of the Hadoop ecosystem with the blockchain so that multiple tenants and authorized third parties can securely access data while still maintaining privacy, scalability, and reliability. The platform includes blockchain- and taxonomy-based metadata and master data management for data classification, validation, and quality control; private-key and advanced attribute-based access control for stronger authorization; Spark on YARN with filesystem isolation and network management support; and an enhanced multitenant scheduler that dynamically allocates, suspends, and resumes data processing on selected nodes across data centers.
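To make two of these ideas concrete, the following is a minimal sketch, not the speakers' implementation. It assumes that "blockchain-based metadata" can be approximated by a hash-chained, append-only log, and that an attribute-based policy is a mapping from attribute names to allowed values. The names `MetadataLedger`, `record_hash`, and `is_allowed`, and the sample dataset fields, are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def record_hash(prev_hash: str, payload: dict) -> str:
    """Hash a metadata payload together with the previous entry's hash."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("utf-8") + body).hexdigest()


@dataclass
class MetadataLedger:
    """Hypothetical append-only log; each entry commits to the one before it."""
    entries: list = field(default_factory=list)

    def append(self, payload: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        digest = record_hash(prev, payload)
        self.entries.append({"prev": prev, "payload": payload, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute the chain; any edited payload breaks a link."""
        prev = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev:
                return False
            if record_hash(prev, entry["payload"]) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


def is_allowed(user_attrs: dict, policy: dict) -> bool:
    """Attribute-based check: every attribute named in the policy must match."""
    return all(user_attrs.get(name) in allowed for name, allowed in policy.items())


# Illustrative usage with made-up dataset metadata.
ledger = MetadataLedger()
ledger.append({
    "dataset": "traffic-2018",
    "category": "transport",
    "policy": {"role": ["analyst"], "tenant": ["city-a", "partner-b"]},
})
policy = ledger.entries[0]["payload"]["policy"]
print(is_allowed({"role": "analyst", "tenant": "partner-b"}, policy))
print(ledger.verify())
```

In this sketch the access policy is stored inside the chained metadata, so changing a policy after the fact would show up as a failed `verify()`. A real deployment would also need consensus among the data centers and signed entries, both of which are omitted here.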

Minh Chau and Heesun share the challenges in building the platform to meet the system requirements for data marketplace centers as well as how data owners, data scientists, and developers all benefit from the platform. They also outline the technical details that allow the tool to securely share data and support many analytics services with a flexibly controlled resource management mechanism over geographically distributed data centers.
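The resource management behavior described here (allocating, suspending, and resuming processing) can be illustrated with a toy model. This is only a sketch under stated assumptions: a single node with a fixed number of slots, and integer priorities deciding which running job is suspended first. The `Job`, `Node`, `submit`, and `finish` names are hypothetical and are not the YARN scheduler API.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    tenant: str          # label only; identifies the owning tenant
    slots: int           # resources the job needs while running
    priority: int        # higher value wins when slots are short
    state: str = "pending"  # pending | running | suspended | done


@dataclass
class Node:
    name: str
    capacity: int
    jobs: list = field(default_factory=list)

    def free_slots(self) -> int:
        used = sum(j.slots for j in self.jobs if j.state == "running")
        return self.capacity - used


def submit(node: Node, job: Job) -> None:
    """Start the job, suspending lower-priority running jobs only if needed."""
    node.jobs.append(job)
    lower = sorted(
        (j for j in node.jobs if j.state == "running" and j.priority < job.priority),
        key=lambda j: j.priority,
    )
    reclaimable = node.free_slots() + sum(j.slots for j in lower)
    if reclaimable < job.slots:
        return  # not enough room even after preemption; job stays pending
    for victim in lower:
        if node.free_slots() >= job.slots:
            break
        victim.state = "suspended"
    job.state = "running"


def finish(node: Node, job: Job) -> None:
    """Mark the job done, then resume waiting jobs, highest priority first."""
    job.state = "done"
    waiting = sorted(
        (j for j in node.jobs if j.state in ("suspended", "pending")),
        key=lambda j: -j.priority,
    )
    for j in waiting:
        if node.free_slots() >= j.slots:
            j.state = "running"


# Illustrative usage: an urgent job from one tenant preempts a batch job.
node = Node(name="dc1-node7", capacity=4)
batch = Job("nightly-report", tenant="city-a", slots=3, priority=1)
urgent = Job("fraud-check", tenant="partner-b", slots=2, priority=5)
submit(node, batch)
submit(node, urgent)
print(batch.state, urgent.state)
finish(node, urgent)
print(batch.state, urgent.state)
```

A scheduler spanning data centers would additionally choose which node receives each job and would checkpoint suspended work; both are left out to keep the example short.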


Minh Chau Nguyen

ETRI

Minh Chau Nguyen is a researcher in the smart data platform research department at the Electronics and Telecommunications Research Institute (ETRI). His research interests include big data management, software architecture, and distributed systems.


Heesun Won

ETRI

Heesun Won is a principal researcher at the Electronics and Telecommunications Research Institute (ETRI), where she has been developing an open data reference model and a data distribution system with a semantic data map, SODAS (Smart Open Data as a System). Her research interests include software architecture for big data processing in cloud environments.