Presented By O'Reilly and Cloudera
Make Data Work
5–7 May, 2015 • London, UK

Information architecture for Apache Hadoop

Mark Samson (Cloudera)
11:45–12:25 Thursday, 7/05/2015
Hadoop Platform
Location: King's Suite - Sandringham

Prerequisite Knowledge

High level understanding of Hadoop ecosystem and information architecture concepts.


The Hadoop ecosystem includes a range of tools that together make it possible to build an enterprise data hub capable of storing, processing, and analysing a wide variety of data. However, a platform with such broad capabilities raises a question: how should the myriad data sets be organised so that users can explore all the data, discover new data sets, and perform the necessary processing and analysis on the data they need?

This session will answer that question by outlining an information architecture for an enterprise data hub based on Hadoop. The architecture is composed of a number of layers, or zones, designed to allow an organisation to:

  • Ingest data in its full fidelity, in as close to its original, raw form as possible
  • Provide a data discovery and exploration facility for analysts and/or data scientists
  • Bring together and link multiple data sets to provide a business-wide data model
  • Create views of the data that are optimised for the access patterns generated by particular use cases
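The four goals above map naturally onto separate storage zones. As a minimal sketch, assuming a simple directory-per-zone layout on HDFS, the Python below shows one possible path convention for placing a data set in each zone. The zone names, the `/data` root, and the `DataSet` and `zone_path` helpers are hypothetical illustrations, not the design presented in the session.

```python
from dataclasses import dataclass
from enum import Enum
from pathlib import PurePosixPath
from typing import Optional


class Zone(Enum):
    """Hypothetical zones, one per goal listed in the abstract."""
    RAW = "raw"              # full-fidelity data, as close to its source form as possible
    DISCOVERY = "discovery"  # exploration area for analysts and data scientists
    SHARED = "shared"        # linked, business-wide data model
    OPTIMISED = "optimised"  # views tuned to the access patterns of a use case


# Assumed root directory for the hub (hypothetical, not from the session).
HUB_ROOT = PurePosixPath("/data")


@dataclass(frozen=True)
class DataSet:
    source: str  # originating system, e.g. "crm"
    name: str    # logical data set name, e.g. "customers"


def zone_path(zone: Zone, ds: DataSet, use_case: Optional[str] = None) -> PurePosixPath:
    """Return the hypothetical directory for a data set in a given zone."""
    if zone is Zone.OPTIMISED:
        # Optimised views are grouped by the use case they serve.
        if use_case is None:
            raise ValueError("optimised views require a use case")
        return HUB_ROOT / zone.value / use_case / ds.name
    if use_case is not None:
        raise ValueError("use_case only applies to the optimised zone")
    if zone is Zone.SHARED:
        # The shared model links data across sources, so the source prefix is dropped.
        return HUB_ROOT / zone.value / ds.name
    # Raw and discovery zones stay keyed by source system to preserve provenance.
    return HUB_ROOT / zone.value / ds.source / ds.name


if __name__ == "__main__":
    customers = DataSet(source="crm", name="customers")
    print(zone_path(Zone.RAW, customers))                # /data/raw/crm/customers
    print(zone_path(Zone.SHARED, customers))             # /data/shared/customers
    print(zone_path(Zone.OPTIMISED, customers, "churn")) # /data/optimised/churn/customers
```

Keying the raw and discovery zones by source system keeps it obvious where each file came from, while the shared and optimised zones are organised around the business model and the consuming use case instead.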

The session will describe the layers required in an information architecture that can provide these functions, with reference to the particular technologies within the Hadoop ecosystem that enable them.


Mark Samson


Mark Samson is a systems engineer at Cloudera.



Eric Morich
7/05/2015 13:07 BST

Great presentation! It would be highly appreciated if you could share your slides. Thanks.