Presented By O'Reilly and Cloudera
Make Data Work
Sept 29–Oct 1, 2015 • New York, NY

Architecting a data platform

Stephen O'Sullivan (Data Whisperers), John Akred (Silicon Valley Data Science), Gary Dusbabek (Silicon Valley Data Science)
1:30pm–5:00pm Tuesday, 09/29/2015
Spark & Beyond
Location: 3D 03/10
Level: Intermediate
Average rating: 3.38 (24 ratings)

Materials or downloads needed in advance

No special requisites


What are the essential components of a data platform? This tutorial will explain how the various parts of the Hadoop and big data ecosystem fit together in production to create a data platform supporting batch, interactive, and real-time analytical workloads.

By tracing the flow of data from source to output, we’ll explore the options and considerations for components, including:

  • Acquisition: from internal and external data sources
  • Ingestion: offline and real-time processing
  • Storage
  • Providing data services: exposing data to applications
  • Analytics: batch and interactive
  • Data management: data security, lineage, metadata, and quality

We’ll also give advice on:

  • Tool selection
  • The function of the major Hadoop components and other big data technologies such as Spark and Kafka
  • Hardware sizing and cloud provisioning
  • Integration with legacy systems

Stephen O'Sullivan

Data Whisperers

A leading expert on big data architectures, Stephen O’Sullivan has 25 years of experience creating scalable, high-availability data and application solutions. A veteran of Silicon Valley Data Science, @WalmartLabs, Sun, and Yahoo, Stephen is an independent adviser to enterprises on all things data.

John Akred

Silicon Valley Data Science

With over 15 years in advanced analytical applications and architecture, John Akred is dedicated to helping organizations become more data driven. As CTO of Silicon Valley Data Science, John combines deep expertise in analytics and data science with business acumen and dynamic engineering leadership.

Gary Dusbabek

Silicon Valley Data Science

An Apache Cassandra committer and PMC member, Gary Dusbabek specializes in building distributed systems. His recent experience includes creating an open source high-volume metrics processing pipeline and building out several geographically distributed API services in the cloud.

Comments on this page are now closed.


Bhaskar Nag
10/12/2015 9:33am EDT

I signed up at that page and got a confirmation for a mailing list subscription. But didn’t get the presentation slides. Please advise.

Ami Khandeshi
09/29/2015 12:28pm EDT

Hmm, I did register and validate the account. I haven’t received the materials yet.

Stephen O'Sullivan
09/29/2015 11:26am EDT

You can sign up to receive the presentation materials here:

Srinivas Reddy
09/29/2015 10:43am EDT

Will these slides be shared with attendees? I notice a lot of pictures being taken with smartphones, and I don’t want to annoy the attendees behind me by taking pictures :)

Ami Khandeshi
09/29/2015 10:16am EDT

Where are the slides?

Stephen O'Sullivan
09/28/2015 7:45pm EDT

I’m sorry, but we do not control registration. Please check with the O’Reilly staff when you are onsite at the conference. Hope to see you tomorrow.

Nii Attoh-Okine
09/28/2015 6:58pm EDT

Please, can I join the session? I registered, but I received a sold-out notice.

Ann Manchella
09/28/2015 6:30pm EDT

Can I please join this session?

Debashish Sarkar
09/28/2015 5:16am EDT

Use cases and architectural patterns surrounding the data lake are moving into the operational area, impacting operational reporting, enterprise integration services, data virtualization, and the EDW. What would be the best practices for such use cases?

Stephen O'Sullivan
09/25/2015 2:20pm EDT

Our session has moved to a larger room so we can now accommodate more people. If you already registered for a different session but want to attend this session, email to switch your registration.

Lisa Kopitzke
09/22/2015 10:48am EDT

Can this session be recorded or multicast? It is probably my first pick for sessions, and it’s sold out.

Stephen O'Sullivan
09/02/2015 10:56am EDT

Please let us know what questions you’d like us to answer during this tutorial. We’ve listed a few example questions below to get you thinking.

* How can I add real-time analytical capabilities to the data I’m currently ingesting?
* How can I onboard new sources of behavioral data to better personalize my user experience?
* Do I move legacy data or access it in place as I build a data lake?
* How do I get a scalable, cost-effective platform to execute my analytical services?
* How do I expose my analytical outputs to support customized interactive user experiences?