Presented By
O’Reilly + Cloudera
Make Data Work
March 25-28, 2019
San Francisco, CA

Running multidisciplinary big data workloads in the cloud

Jason Wang (Cloudera), Brandon Freeman (Cloudera), Michael Kohs (Cloudera), Akihiro Ishikawa (Cloudera), Toby Ferguson (Cloudera)
1:30pm-5:00pm Tuesday, March 26, 2019
Secondary topics: AI and Data technologies in the cloud
Average rating: 3.20 out of 5 (5 ratings)

Who is this presentation for?

  • Data engineers, data scientists, BI engineers, analytic engineers, and those in IT

Level

Intermediate

Prerequisite knowledge

  • Familiarity with public cloud concepts
  • A basic understanding of big data workloads (data engineering, data warehousing, etc.)

Materials or downloads needed in advance

  • A WiFi-enabled laptop (if you want to use the CLI, you need Python 3.6 installed and terminal access)

What you'll learn

  • Learn how to successfully run a data analytics pipeline in the cloud and integrate data engineering and data analytic workflows
  • Understand considerations and best practices for data analytics pipelines in the cloud
  • Explore approaches for sharing metadata across workloads in a big data PaaS

Description

Organizations now run diverse, multidisciplinary big data workloads that span data engineering, data warehousing, and data science applications. Many of these workloads operate on the same underlying data, and the workloads themselves can be transient or long running in nature.

Moving these workloads to the cloud and running them there poses many challenges. Jason Wang, Brandon Freeman, Michael Kohs, Akihiro Ishikawa, and Toby Ferguson explore cloud architecture and its challenges, then walk you through using Cloudera Altus to build data warehousing and data engineering clusters and to run workloads that share metadata between those clusters using Cloudera SDX.

Topics include:

  • Considerations when moving to the cloud and why it may not be as simple as you think (e.g., data migration and duplication between on-premises and cloud deployments)
  • Core cloud paradigms not present on-premises that drive architecture decisions (e.g., bursting, different cluster lifecycles, and tenancy)
  • Security best practices in the cloud
  • How to manage metadata between various workloads across multiple clusters, both on-premises and in the cloud
  • Considerations and best practices for getting data pipelines running
  • How to share metadata across workloads in a big data architecture

Jason Wang

Cloudera

Jason Wang is a software engineer at Cloudera focusing on the cloud.

Brandon Freeman

Cloudera

Brandon Freeman is a Mid-Atlantic region strategic system engineer at Cloudera, specializing in infrastructure, the cloud, and Hadoop. Previously, Brandon was an infrastructure architect at Explorys, working in operations, architecture, and performance optimization for the Cloudera Hadoop environments, where he was responsible for designing, building, and managing many large Hadoop clusters.

Michael Kohs

Cloudera

Michael Kohs is a product manager at Cloudera.

Akihiro Ishikawa

Cloudera

Akihiro Ishikawa is a software engineer at Cloudera.

Toby Ferguson

Cloudera

Toby Ferguson is a sales engineer at Cloudera, where he helps partners succeed with the Cloudera platform.