Presented By O'Reilly and Cloudera
Make Data Work
22–23 May 2017: Training
23–25 May 2017: Tutorials & Conference
London, UK

Organizing the data lake

Mark Madsen (Third Nature)
14:05–14:45 Wednesday, 24 May 2017
Level: Intermediate
Average rating: 3.33 (12 ratings)

Who is this presentation for?

  • Architects, system designers, and data management professionals

Prerequisite knowledge

  • Familiarity with data workloads, usage patterns, and architecture for managing data

What you'll learn

  • A reference architecture that helps you decide when to save immutable data, when to standardize it, and how to manage its delivery and use over time


Building a data lake involves more than installing and using Hadoop. The focus in the market has been on all the different technology components, ignoring the more important part: the data architecture that the code implements, which lies at the core of the system.

Just like a data warehouse, a data lake has a data architecture. If you expect any longevity from the platform, the architecture should be designed rather than accidental.

But what are the design principles that lead to good functional design and a workable data architecture? What are the assumptions that limit old approaches? How can one integrate with or migrate from the older environments? How does this affect an organization’s data management? Answering these questions is key to building long-term infrastructure.

The goal in most organizations is to build multiuse data infrastructure that is not subject to past constraints. Mark Madsen discusses hidden design assumptions, reviews design principles to apply when building multiuse data infrastructure, and provides a reference architecture. This reference architecture has been used across many organizations to work toward a unified analytic infrastructure.

Mark Madsen

Third Nature

Mark Madsen is a research analyst at Third Nature, where he advises companies on data strategy and technology planning. Mark has designed analysis, data collection, and data management infrastructure for companies worldwide. He focuses on two types of work: applying data to business problems and guiding the construction of data infrastructure. As a result, Mark does as much information strategy and IT architecture work as he does performance management and analytics.
