Make Data Work
Oct 15–17, 2014 • New York, NY

Building A Data Platform

Stephen O'Sullivan (Data Whisperers), John Akred (Silicon Valley Data Science), Richard Williamson (Silicon Valley Data Science)
1:30pm–5:00pm Wednesday, 10/15/2014
Hadoop Platform
Location: 1 E10/1 E11
Average rating: 3.09 (23 ratings)

What are the essential components of a data platform? This tutorial will explain how the various parts of the Hadoop and big data ecosystem fit together in production to create a data platform supporting batch, interactive and real-time analytical workloads. By tracing the flow of data from source to output, we’ll explore the options and considerations for components, including:

  • Acquisition: from internal and external data sources
  • Ingestion: offline and real-time processing
  • Storage
  • Providing data services: exposing data to applications
  • Analytics: batch and interactive
  • Data management: data security, lineage, metadata and quality

We’ll also give advice on:

  • Tool selection
  • The function of the major Hadoop components and other big data technologies such as Spark
  • Hardware sizing and cloud provisioning
  • Integration with legacy systems

Stephen O'Sullivan

Data Whisperers

A leading expert on big data architecture and Hadoop, Stephen brings over 20 years of experience creating scalable, high-availability data and application solutions. A veteran of WalmartLabs, Sun and Yahoo!, Stephen leads data architecture and infrastructure.

John Akred

Silicon Valley Data Science

With over 15 years in advanced analytical applications and architecture, John is dedicated to helping organizations become more data-driven. He combines deep expertise in analytics and data science with business acumen and dynamic engineering leadership.

Richard Williamson

Silicon Valley Data Science

Richard has been at the cutting edge of big data since its inception, leading multiple efforts to build multi-petabyte Hadoop platforms and maximizing business value by combining data science with big data. He has extensive experience creating advanced analytic systems using data warehousing and data mining technologies.

Comments on this page are now closed.


Picture of Julie Steele
Julie Steele
10/15/2014 10:14am EDT

Thanks for letting us know about the slides! If you’d like a copy of the slides to read up close, you can request them at

Karthik Murugesan
10/15/2014 10:02am EDT

Following this on live streaming. We just lost the audio. Can you please check?


Picture of Robert Novak
Robert Novak
10/15/2014 9:53am EDT

The slides are barely readable from a distance… I’d guess half the audience can’t read most of them because of colors/contrast/size.

Priya Agarwal
10/15/2014 9:43am EDT

Can we get access to the slides?

. .
10/11/2014 4:47pm EDT


Looking forward to this session. Is there anything I should install before taking the tutorial?