Presented By O'Reilly and Cloudera
Make Data Work
22–23 May 2017: Training
23–25 May 2017: Tutorials & Conference
London, UK

Architecting and building enterprise-class Spark and Hadoop in cloud environments

John Mikula (Google Cloud)
13:30–17:00 Tuesday, 23 May 2017
Level: Intermediate
Average rating: 1.33 (3 ratings)

Who is this presentation for?

  • Data engineers, managers, and directors

Prerequisite knowledge

  • Familiarity (and, ideally, experience) with the basic setup and design of Spark/Hadoop clusters

Materials or downloads needed in advance

  • A laptop
  • A Google Cloud Platform trial account

What you'll learn

  • How to deploy Spark and Hadoop clusters in public clouds while taking advantage of the products and design of those clouds (block storage systems, security models, ephemerality, etc.)


John Mikula explores using managed Spark and Hadoop solutions in public clouds alongside cloud products for storage, analysis, and message queues to meet enterprise requirements via the Spark and Hadoop ecosystem. To illustrate the concepts, John walks you through hands-on exercises using the Google Cloud Platform.

Topics include:

  • How cloud architecture is different: Design differences and pros and cons, separation of compute and storage, persistent versus ephemeral, large multitenant design versus separated or distributed designs, and security, auditing, and management differences
  • Why you should use a managed Spark and Hadoop solution in a cloud
  • Using multiple cloud products alongside managed Spark and Hadoop clusters: Object storage (Google Cloud Storage, Amazon S3, Azure Blob Storage), analytics services (Google BigQuery, Amazon Redshift, etc.), messaging systems (Google Cloud Pub/Sub, Amazon SQS, etc.), and streaming solutions (Amazon Kinesis, Google Cloud Dataflow, Azure Stream Analytics)

John Mikula

Google Cloud

John Mikula is a tech lead for Google Cloud, where he manages the team focused on enterprise features for Google Cloud Dataproc.