Presented By O'Reilly and Cloudera
Make Data Work
March 13–14, 2017: Training
March 14–16, 2017: Tutorials & Conference
San Jose, CA

A deep dive into leveraging cloud infrastructure for data engineering workloads

Andrei Savu (Cloudera), Jennifer Wu (Cloudera)
11:50am–12:30pm Wednesday, March 15, 2017
Secondary topics: Architecture, Cloud
Average rating: 3.00 (3 ratings)

What you'll learn

  • Learn how to use cloud-native capabilities to run data engineering workloads successfully in the cloud

Description

Cloud infrastructure centers on three things: large, scalable data stores; elastic compute; and managed service deployment models. Batch Hadoop data engineering workloads, such as ETL and model training, are particularly well suited to this cloud-native infrastructure for three reasons: large amounts of data can be kept in resilient, scalable storage; big data clusters can be spun up and down easily, so infrastructure use and cost track actual demand; and transient cluster deployment models reduce operational overhead and make the platform easier for end users.

Andrei Savu and Jennifer Wu explain how data engineers can use cloud-native capabilities to run data engineering workloads successfully in the cloud. Andrei and Jennifer take a deep dive into the considerations for running large-scale data engineering workloads in the cloud, including cloud architecture (transient versus persistent clusters), cost, ease of use, and security, and conclude with a discussion of the latest cloud technologies and how data engineers can apply them.
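The cost trade-off between transient and persistent clusters can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only: the 20-node cluster, the $0.50 per node-hour rate, the 4-hour daily batch job, and the 30-minute provisioning overhead are hypothetical numbers, not figures from the talk, and it ignores storage, data transfer, and billing granularity. It assumes a persistent cluster is billed around the clock, while a transient cluster is billed only for the job plus its spin-up and teardown time.

```python
from dataclasses import dataclass


@dataclass
class ClusterPlan:
    nodes: int
    hourly_rate_per_node: float  # hypothetical price in dollars per node-hour
    hours_per_day: float         # billed hours per day

    def daily_cost(self) -> float:
        # Simple model: every node is billed for every billed hour.
        return self.nodes * self.hourly_rate_per_node * self.hours_per_day


def transient_hours(job_hours: float, provisioning_hours: float) -> float:
    """A transient cluster is billed for the job plus spin-up/teardown time."""
    return job_hours + provisioning_hours


if __name__ == "__main__":
    RATE = 0.50  # hypothetical $/node-hour, not a real cloud price
    # Persistent cluster: stays up 24 hours a day regardless of load.
    persistent = ClusterPlan(nodes=20, hourly_rate_per_node=RATE, hours_per_day=24)
    # Transient cluster: created for a hypothetical 4-hour nightly ETL job,
    # with an assumed 30 minutes of provisioning overhead.
    transient = ClusterPlan(
        nodes=20,
        hourly_rate_per_node=RATE,
        hours_per_day=transient_hours(job_hours=4, provisioning_hours=0.5),
    )
    print(f"persistent: ${persistent.daily_cost():.2f}/day")
    print(f"transient:  ${transient.daily_cost():.2f}/day")
```

Under these assumptions the persistent cluster costs $240.00 per day and the transient one $45.00. The advantage shrinks as jobs run longer or more often: a cluster that is busy close to 24 hours a day gains little from being transient, which is one reason the choice between the two models depends on the workload.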


Andrei Savu

Cloudera

Andrei Savu is a software engineer at Cloudera, where he’s working on Cloudera Director, a product that makes Hadoop deployments in cloud environments easier and more reliable for customers.


Jennifer Wu

Cloudera

Jennifer Wu is director of product management for cloud at Cloudera, where she focuses on cloud services and data engineering. Previously, she was a product line manager at VMware, where she worked on the vSphere and Photon system management platforms.