With its scalable data store, elastic compute, and pay-as-you-go cost model, cloud infrastructure is well-suited for large-scale data engineering workloads, especially those such as ETL and model training batch workloads that use Hive and Spark compute engines. Jennifer Wu, Philip Langdale, and Kostas Sakellis explain how data engineers can leverage the cloud in order to successfully run data engineering workloads. They explore the latest cloud technologies, focusing on data engineering workloads, cost, security, and ease-of-use implications for data engineers, and cover the advantages of the managed service deployment model and security best practices to demonstrate how to apply these technologies in your own projects.
Jennifer Wu is director of product management for cloud at Cloudera, where she focuses on cloud services and data engineering. Previously, Jennifer worked as a product line manager at VMware, working on the vSphere and Photon system management platforms.
Philip Langdale is the engineering lead for cloud at Cloudera. He joined the company as one of the first engineers building Cloudera Manager and served as an engineering lead for that project until moving to working on cloud products. Previously, Philip worked at VMware, developing various desktop virtualization technologies. Philip holds a bachelor’s degree with honors in electrical engineering from the University of Texas at Austin.
Kostas Sakellis is the lead and engineering manager of the Apache Spark team at Cloudera. Kostas holds a bachelor’s degree in computer science from the University of Waterloo, Canada.
©2017, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • firstname.lastname@example.org