In the Data Lake: Not Waving but Drowning
Over the past year, the concept of a Data Lake (or Reservoir) has become popular as a vision, and even an architecture, for information storage and processing in an environment where big data is key. In a simple view, the approach suggested is to dump all data in raw form into Hadoop and enable all types of processing and use from there. As the Hadoop ecosystem evolves, this description will undoubtedly become more nuanced. But the fundamental metaphor of a lake is flawed. It suggests that all data, like all the water in a lake, is the same. Nothing could be further from the truth.
The fluidity of the Data Lake is often contrasted with the inflexibility and complication of the Data Warehouse and its supporting processes. This comparison is unsound, based on an erroneous view of the rationale for the warehouse. It confuses the common implementation of data warehousing, solely as a reporting environment, with its core principle, which is to create a consistent and reconciled core for such reporting. While the Data Warehouse alone cannot address all the opportunities of the emerging biz-tech ecosystem, Data Lake thinking is in great danger of throwing out the baby with the lake water.
Based on Barry’s book “Business unIntelligence: Insight and Innovation Beyond Analytics and Big Data”, this session explores a more realistic view of modern information and data: that it exists with a variety of important differentiating characteristics, which determine how it should be stored and manipulated. We explore the tri-domain information model and an architecture based on information pillars rather than layers. This approach enables us to properly manage the data that must be reconciled and consistent if business is to be run with proper control. It defines the context-setting information required to preserve meaning and enable governance. It provides the agility needed to extract value from the wealth of externally-sourced data from social media and the Internet of Things. And it offers the opportunity to drive value from all information by empowering true human innovation.
Dr. Barry Devlin is among the foremost authorities on business insight and one of the founders of data warehousing, having published the first architectural paper on the topic in 1988. With over 30 years of IT experience, including 20 years with IBM as a Distinguished Engineer, he is a widely respected analyst, consultant, lecturer and author of the seminal book “Data Warehouse: from Architecture to Implementation” and numerous white papers. His new book, “Business unIntelligence: Insight and Innovation Beyond Analytics and Big Data”, was published in 2013.
Barry is founder and principal of 9sight Consulting. He specializes in the human, organizational and IT implications of deep business insight solutions that combine operational, informational and collaborative environments. A regular contributor to BeyeNETWORK, TDWI and more, Barry is based in Cape Town, South Africa and operates worldwide.