Public cloud usage for Hadoop workloads is accelerating, and Hadoop components have adapted to take advantage of cloud infrastructure, including object storage and elastic compute. Hive, Spark, and Impala can read input from and write output directly to Amazon S3. Because data persisted in S3 outlives any individual cluster, users can spin up Hadoop clusters for specific time periods or workloads, grow and shrink them as needed, and terminate them when they are no longer in use. As a result, Hadoop clusters in the public cloud can be both transient and elastic.
Andrei Savu, Vinithra Varadharajan, Matthew Jacobs, and Jennifer Wu explore best practices for Hadoop deployments in the public cloud and offer detailed guidance on deploying, configuring, and managing Hive, Spark, and Impala there. They walk you through using existing tools to create and configure Hive, Spark, and Impala deployments on AWS, covering network settings, AWS instance types, and security options. They also demonstrate how Hadoop clusters can be deployed just as easily on Azure and Google Cloud Platform. Once your clusters are deployed, you'll be able to grow and shrink them to match your workloads.
Andrei Savu is a software engineer at Cloudera, where he works on Cloudera Director, a product that makes Hadoop deployments in cloud environments easier and more reliable for customers.
Vinithra Varadharajan is a senior engineering manager in the cloud organization at Cloudera, where she’s responsible for the cloud portfolio products, including Altus Data Engineering, Altus Analytic Database, Altus SDX, and Cloudera Director. Previously, Vinithra was a software engineer at Cloudera working on Cloudera Director and Cloudera Manager with a focus on automating Hadoop lifecycle management.
Jennifer Wu is director of product management for cloud at Cloudera, where she focuses on cloud services and data engineering. Previously, Jennifer was a product line manager at VMware, where she worked on the vSphere and Photon system management platforms.
Matthew Jacobs is a software engineer at Cloudera working on Impala.
©2016, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • email@example.com
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with, and does not endorse or review, the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.