Presented By O'Reilly and Cloudera
Make Data Work
Feb 17–20, 2015 • San Jose, CA

Building A Data Platform

Manu Mukerji (8x8), John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Data Whisperers)
1:30pm–5:00pm Wednesday, 02/18/2015
Hadoop Platform
Location: 210 C/G
Average rating: 3.25 (16 ratings)


What are the essential components of a data platform? This tutorial will explain how the various parts of the Hadoop and big data ecosystem fit together in production to create a data platform supporting batch, interactive and real-time analytical workloads.

By tracing the flow of data from source to output, we’ll explore the options and considerations for components, including:

  • Acquisition: from internal and external data sources
  • Ingestion: offline and real-time processing
  • Storage
  • Providing data services: exposing data to applications
  • Analytics: batch and interactive
  • Data management: data security, lineage, metadata and quality

We’ll also give advice on:

  • tool selection
  • the function of the major Hadoop components and other big data technologies such as Spark
  • hardware sizing and cloud provisioning
  • integration with legacy systems

Manu Mukerji

8x8

Manu has a background in cloud computing and big data, handling billions of transactions per day in real time. He enjoys building and architecting scalable, highly available data solutions, and has extensive experience working in online advertising and social media.


John Akred

Silicon Valley Data Science

With over 15 years in advanced analytical applications and architecture, John is dedicated to helping organizations become more data-driven. He combines deep expertise in analytics and data science with business acumen and dynamic engineering leadership.


Stephen O'Sullivan

Data Whisperers

A leading expert on big data architecture and Hadoop, Stephen brings over 20 years of experience creating scalable, high-availability data and application solutions. A veteran of WalmartLabs, Sun and Yahoo!, Stephen leads data architecture and infrastructure.



Stephen O'Sullivan
03/18/2015 3:50am PDT


You can get the slides here.
If you have questions, feel free to email me at


Joshua D. Lickteig
03/18/2015 1:18am PDT

Hey guys – Excellent material in this session, as well as Ask Us Anything; still resonating these weeks later and highly relevant to some formative work of the moment, particularly regarding data services and enterprise enablement for analytics. Is there a link to a shareable version of the slides presented here? Thanks!

Stephen O'Sullivan
02/17/2015 9:24am PST

All the demo code for our talk is here:

John Akred
02/17/2015 3:58am PST

Please feel free to post specific topics or questions you would like covered during the session here. We will also solicit topics and questions from the audience before we get started.

Manu Mukerji
02/13/2015 2:38am PST

There is no prior software download or install needed.

Subramanyam Voora
02/11/2015 1:49pm PST

Is there any prior software download and install requirement for this tutorial?