Demand for data exchange to support research, analytics, and collaboration is growing in many areas. However, the large volume of data from systems, services, and users in different domains creates many problems in maintenance and operations management. If each data marketplace center builds and manages its data separately, the availability, scalability, and applicability of data analytics services, especially those from authorized third parties, are reduced.
Minh Chau Nguyen and Heesun Won explain how to implement analytics services in data marketplace systems on a single Hadoop cluster across distributed data centers. The solution extends the overall architecture of the Hadoop ecosystem with blockchain so that multiple tenants and authorized third parties can securely access data while still maintaining privacy, scalability, and reliability. The platform includes blockchain- and taxonomy-based metadata and master data management for data classification, validation, and quality control; private key and advanced attribute-based access control for improving authorization; Spark on YARN with filesystem isolation and network management support; and an enhanced multitenant scheduler for dynamically allocating, suspending, and resuming data processing on selected nodes among data centers.
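To make the attribute-based access control idea concrete, here is a minimal sketch of how such a check could work in general. It is not the platform's implementation: the function `is_allowed`, the policy layout, and all attribute names (`role`, `tenant`, `sensitivity`, and so on) are hypothetical and chosen only for illustration.

```python
# Illustrative attribute-based access control (ABAC) check.
# All names and attributes here are assumptions, not the speakers' design.

def is_allowed(subject, resource, action, policies):
    """Return True if any policy grants `action` to `subject` on `resource`.

    A policy matches when the action is listed and every attribute it
    names has the required value on the subject and on the resource.
    """
    for policy in policies:
        if action not in policy["actions"]:
            continue
        subject_ok = all(
            subject.get(key) == value
            for key, value in policy["subject"].items()
        )
        resource_ok = all(
            resource.get(key) == value
            for key, value in policy["resource"].items()
        )
        if subject_ok and resource_ok:
            return True
    return False  # deny by default when no policy matches


# Hypothetical policies for a multitenant data marketplace.
POLICIES = [
    {   # A tenant's data scientists may read that tenant's own data.
        "actions": {"read"},
        "subject": {"role": "data_scientist", "tenant": "tenant_a"},
        "resource": {"owner_tenant": "tenant_a"},
    },
    {   # Authorized third parties may read only public datasets.
        "actions": {"read"},
        "subject": {"role": "third_party", "authorized": True},
        "resource": {"sensitivity": "public"},
    },
]

scientist = {"role": "data_scientist", "tenant": "tenant_a"}
partner = {"role": "third_party", "authorized": True}
private_data = {"owner_tenant": "tenant_a", "sensitivity": "private"}
public_data = {"owner_tenant": "tenant_b", "sensitivity": "public"}

print(is_allowed(scientist, private_data, "read", POLICIES))
print(is_allowed(partner, private_data, "read", POLICIES))
print(is_allowed(partner, public_data, "read", POLICIES))
```

The deny-by-default return is the key design choice in this sketch: a request succeeds only when some policy explicitly matches both the subject's and the resource's attributes.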
Minh Chau and Heesun share the challenges in building the platform to meet the system requirements for data marketplace centers as well as how data owners, data scientists, and developers all benefit from the platform. They also outline the technical details that allow the tool to securely share data and support many analytics services with a flexibly controlled resource management mechanism over geographically distributed data centers.
Minh Chau Nguyen is a researcher in the smart data platform research department at the Electronics and Telecommunications Research Institute (ETRI). His research interests include big data management, software architecture, and distributed systems.
Heesun Won is a principal researcher at the Electronics and Telecommunications Research Institute (ETRI), where she has been developing an open data reference model and a data distribution system with a semantic data map, called SODAS (Smart Open Data as a System). Her research interests include software architecture for big data processing in cloud environments.