Presented By
O’Reilly + Cloudera
Make Data Work
March 25-28, 2019
San Francisco, CA

Running multidisciplinary big data workloads in the cloud

Jason Wang (Cloudera), Brandon Freeman (Cloudera), Michael Kohs (Cloudera), Toby Ferguson (Cloudera)
1:30pm–5:00pm Tuesday, March 26, 2019
Secondary topics: AI and Data technologies in the cloud

Who is this presentation for?

  • Data engineers, data scientists, BI engineers, analytic engineers, and those in IT

Level

Intermediate

Prerequisite knowledge

  • Familiarity with public cloud concepts
  • A basic understanding of big data workloads (data engineering, data warehousing, etc.)

Materials or downloads needed in advance

  • A WiFi-enabled laptop (to use the CLI, you also need Python 3.6 installed and terminal access)

What you'll learn

  • Learn how to successfully run a data analytics pipeline in the cloud and integrate data engineering and data analytic workflows
  • Understand considerations and best practices for data analytics pipelines in the cloud
  • Explore approaches for sharing metadata across workloads in a big data PaaS

Description

Organizations now run diverse, multidisciplinary big data workloads that span data engineering, data warehousing, and data science applications. Many of these workloads operate on the same underlying data, and the workloads themselves can be transient or long running in nature.

Moving these workloads to the cloud and running them there poses many challenges. Jason Wang, Brandon Freeman, Michael Kohs, and Toby Ferguson explore cloud architecture and its challenges and walk you through using Cloudera Altus to build data warehousing and data engineering clusters and run workloads that share metadata between them using Cloudera SDX.

Topics include:

  • Considerations when moving to the cloud and why it may not be as simple as you thought (e.g., data migration and duplication between on-premises and cloud deployments)
  • Core cloud paradigms not present on-premises that drive architecture decisions (e.g., bursting, different cluster lifecycles, and tenancy)
  • Security best practices in the cloud
  • How to manage metadata between various workloads across multiple clusters, both on-premises and in the cloud
  • Considerations and best practices for getting data pipelines running
  • How to share metadata across workloads in a big data architecture

Jason Wang

Cloudera

Jason Wang is a software engineer at Cloudera focusing on the cloud.

Brandon Freeman

Cloudera

Brandon Freeman is a Mid-Atlantic region strategic system engineer at Cloudera, specializing in infrastructure, the cloud, and Hadoop. Previously, Brandon was an infrastructure architect at Explorys, working in operations, architecture, and performance optimization for the Cloudera Hadoop environments, where he was responsible for designing, building, and managing many large Hadoop clusters.

Michael Kohs

Cloudera

Michael Kohs is a product manager at Cloudera.

Toby Ferguson

Cloudera

Toby Ferguson is a sales engineer at Cloudera, where he helps partners succeed with the Cloudera platform.
