The Hadoop ecosystem includes a range of tools that together make it possible to build an enterprise data hub capable of storing, processing, and analysing a wide variety of data. However, a platform with such broad capability raises a question: how should the many data sets be organised so that users can explore all the data, discover new data sets, and perform the processing and analysis they need?
This session will answer that question by outlining an information architecture for an enterprise data hub based on Hadoop. The architecture is composed of a number of layers, or zones, designed to allow an organisation to:
The session will describe the layers required in an information architecture that can provide these functions, with reference to the particular technologies within the Hadoop ecosystem that enable them.
Mark Samson is a systems engineer at Cloudera.