Presented By
O’Reilly + Cloudera

Make Data Work

March 25-28, 2019
San Francisco, CA

Running multidisciplinary big data workloads in the cloud

Jason Wang (Cloudera), Brandon Freeman (Cloudera), Michael Kohs (Cloudera), Akihiro Ishikawa (Cloudera), Toby Ferguson (Cloudera)

1:30pm–5:00pm Tuesday, March 26, 2019

Data Engineering & Architecture
Location: 2008

Secondary topics: AI and Data technologies in the cloud

Average rating:

(3.20, 5 ratings)

Who is this presentation for?

Data engineers, data scientists, BI engineers, analytic engineers, and those in IT

Level

Intermediate

Prerequisite knowledge

Familiarity with public cloud concepts
A basic understanding of big data workloads (data engineering, data warehousing, etc.)

Materials or downloads needed in advance

A WiFi-enabled laptop (to use the CLI, you also need Python 3.6 installed and terminal access)

What you'll learn

Learn how to successfully run a data analytics pipeline in the cloud and integrate data engineering and data analytic workflows
Understand considerations and best practices for data analytics pipelines in the cloud
Explore approaches for sharing metadata across workloads in a big data PaaS

Description

Organizations now run diverse, multidisciplinary big data workloads that span data engineering, data warehousing, and data science applications. Many of these workloads operate on the same underlying data, and the workloads themselves can be transient or long running in nature.

There are many challenges with moving these workloads to the cloud and running them. Jason Wang, Brandon Freeman, Michael Kohs, Akihiro Nishikawa, and Toby Ferguson explore cloud architecture and its challenges and walk you through using Cloudera Altus to build data warehousing and data engineering clusters and run workloads that share metadata between them using Cloudera SDX.

Topics include:

Considerations when moving to the cloud and why it may not be as simple as you thought (e.g., data migration and duplication between on-premises and cloud deployments)
Core cloud paradigms not present on-premises that drive architecture decisions (e.g., bursting, different cluster lifecycles, and tenancy)
Security best practices in the cloud
How to manage metadata between various workloads across multiple clusters, both on-premises and in the cloud
Considerations and best practices for getting data pipelines running
How to share metadata across workloads in a big data architecture

Jason Wang

Cloudera

Jason Wang is a software engineer at Cloudera focusing on the cloud.

Brandon Freeman

Cloudera

Brandon Freeman is a Mid-Atlantic region strategic system engineer at Cloudera, specializing in infrastructure, the cloud, and Hadoop. Previously, Brandon was an infrastructure architect at Explorys, working in operations, architecture, and performance optimization for the Cloudera Hadoop environments, where he was responsible for designing, building, and managing many large Hadoop clusters.

Michael Kohs

Cloudera

Michael Kohs is a product manager at Cloudera.

Akihiro Ishikawa

Cloudera

Akihiro Ishikawa is a software engineer at Cloudera.

Toby Ferguson

Cloudera

Toby Ferguson is a sales engineer at Cloudera, where he helps partners succeed with the Cloudera platform.

©2019, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com