Cloud infrastructure is largely built around three things: large, scalable data stores; elastic compute; and managed service deployment models. Batch Hadoop data engineering workloads, such as ETL and model training, are particularly well suited to this cloud-native infrastructure. Large amounts of data can be kept in resilient, scalable storage; big data clusters can spin up and down to match demand, which optimizes infrastructure use and cost; and transient cluster deployment models can reduce operational overhead and improve ease of use for end users.
Andrei Savu and Jennifer Wu explain how data engineers can use cloud-native capabilities to run data engineering workloads successfully in the cloud. Andrei and Jennifer take a deep dive into the considerations for running large-scale data engineering workloads in the cloud, including cloud architecture (transient versus persistent clusters), cost, ease of use, and security, and conclude with a discussion of the latest cloud technologies and how data engineers can apply them.
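To make the transient-versus-persistent cost consideration concrete, here is a minimal back-of-the-envelope sketch. Every figure in it (node count, hourly rate, job duration) is a hypothetical assumption chosen for illustration, not a number from the session, and the model deliberately ignores storage charges, cluster startup time, and discounted pricing options.

```python
def cluster_cost(nodes: int, hourly_rate: float, hours: float) -> float:
    """Cost of running `nodes` instances for `hours` at `hourly_rate` per node-hour."""
    return nodes * hourly_rate * hours


# Hypothetical assumptions, for illustration only.
NODES = 20               # instances in the cluster
HOURLY_RATE = 0.50       # dollars per node-hour
JOB_HOURS_PER_DAY = 3    # the nightly batch ETL job runs about 3 hours
DAYS = 30                # billing period

# A persistent cluster is billed around the clock.
persistent = cluster_cost(NODES, HOURLY_RATE, 24 * DAYS)

# A transient cluster exists only while the job runs.
transient = cluster_cost(NODES, HOURLY_RATE, JOB_HOURS_PER_DAY * DAYS)

savings = 1 - transient / persistent

print(f"persistent: ${persistent:,.2f}")
print(f"transient:  ${transient:,.2f}")
print(f"savings:    {savings:.1%}")
```

Under these assumptions the transient cluster is billed for 90 hours instead of 720. The gap narrows as the cluster's daily utilization rises, which is one reason the choice of architecture depends on the workload.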
Andrei Savu is a software engineer at Cloudera, where he works on Cloudera Director, a product that makes Hadoop deployments in cloud environments easier and more reliable for customers.
Jennifer Wu is director of product management for cloud at Cloudera, where she focuses on cloud services and data engineering. Previously, Jennifer was a product line manager at VMware, where she worked on the vSphere and Photon system management platforms.
©2017, O'Reilly Media, Inc. • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with, and does not endorse or review, the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.