Presented By O'Reilly and Cloudera
Make Data Work
March 28–29, 2016: Training
March 29–31, 2016: Conference
San Jose, CA

Architecting a data platform (Half Day)

John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Data Whisperers), Gary Dusbabek (Silicon Valley Data Science)
9:00am–12:30pm Tuesday, 03/29/2016
Spark & Beyond

Location: LL21 C/D
Average rating: 3.96 (49 ratings)

Prerequisite knowledge

Attendees should be familiar with database systems.

Materials or downloads needed in advance

We will provide a GitHub link for sample code.


What are the essential components of a data platform? John Akred, Stephen O’Sullivan, and Gary Dusbabek explain how the various parts of the Hadoop, Spark, and big data ecosystems fit together in production to create a data platform supporting batch, interactive, and real-time analytical workloads.

By tracing the flow of data from source to output, John, Stephen, and Gary explore the options and considerations for components, including:

  • Acquisition: from internal and external data sources
  • Ingestion: offline and real-time processing
  • Storage
  • Analytics: batch and interactive
  • Providing data services: exposing data to applications

Other topics include:

  • Tool selection
  • The function of the major Hadoop components and other big data technologies such as Spark and Kafka
  • Integration with legacy systems

John Akred

Silicon Valley Data Science

With over 15 years in advanced analytical applications and architecture, John Akred is dedicated to helping organizations become more data-driven. As CTO of Silicon Valley Data Science, John combines deep expertise in analytics and data science with business acumen and dynamic engineering leadership.


Stephen O'Sullivan

Data Whisperers

A leading expert on big data architectures, Stephen O’Sullivan has 25 years of experience creating scalable, high-availability data and application solutions. A veteran of Silicon Valley Data Science, @WalmartLabs, Sun, and Yahoo, Stephen is an independent adviser to enterprises on all things data.


Gary Dusbabek

Silicon Valley Data Science

An Apache Cassandra committer and PMC member, Gary Dusbabek specializes in building distributed systems. His recent experience includes creating an open source high-volume metrics processing pipeline and building out several geographically distributed API services in the cloud.

Comments on this page are now closed.


John Akred
03/29/2016 5:10am PDT

The slides can be requested here:

Haidar Hadi
03/29/2016 5:02am PDT

Can I get the URL for the slides, please?