
Presented By
O’Reilly + Cloudera

Make Data Work

29 April–2 May 2019
London, UK


Running multidisciplinary big data workloads in the cloud

Colm Moynihan (Cloudera), Jonathan Seidman (Cloudera), Michael Kohs (Cloudera)

13:30–17:00 Tuesday, 30 April 2019

Data Engineering and Architecture
Location: Capital Suite 4

Secondary topics: AI and Data technologies in the cloud

Average rating: 4.00 (2 ratings)

Who is this presentation for?

Data engineers, data scientists, BI engineers, analytic engineers, and those in IT

Level

Intermediate

Prerequisite knowledge

Familiarity with public cloud concepts
A basic understanding of big data workloads (data engineering, data warehousing, etc.)

Materials or downloads needed in advance

A WiFi-enabled laptop (if you want to use the CLI, you need Python 3.6 installed and terminal access)

What you'll learn

Learn how to successfully run a data analytics pipeline in the cloud and integrate data engineering and data analytic workflows
Understand considerations and best practices for data analytics pipelines in the cloud
Explore approaches for sharing metadata across workloads in a big data PaaS

Description

Organizations now run diverse, multidisciplinary big data workloads that span data engineering, data warehousing, and data science applications. Many of these workloads operate on the same underlying data, and the workloads themselves can be either transient or long-running.

Colm Moynihan, Jonathan Seidman, and Michael Kohs offer a technical deep dive into cloud architecture and explore the challenges of moving to the cloud. You’ll learn what to keep in mind when moving to the cloud and why it may not be as simple as you thought (e.g., migrating and duplicating data between on-premises and cloud environments). You’ll also dive into core cloud paradigms that are not present on-premises but drive architecture decisions (e.g., bursting, differing cluster lifecycles, and tenancy), as well as security best practices in the cloud (e.g., the basics, common pitfalls, and often-overlooked details you need to get right). Along the way, you’ll learn how to manage metadata between various workloads across multiple clusters, both on-premises and in the cloud.

In the second part of the talk, you’ll get your hands dirty as you learn how to set up and run a data pipeline in the cloud that integrates data engineering and data warehousing workflows, using the Cloudera Altus PaaS offering, powered by Cloudera Altus SDX. You’ll discover considerations and best practices for getting data pipelines running, and you’ll see how to share metadata across workloads in a big data architecture.

Colm Moynihan

Cloudera

Colm Moynihan is partner presales manager in EMEA for Cloudera, where he helps system integrators, ISVs, hardware and cloud partners, resellers, and distributors drive digital transformation at joint customers. Previously, Colm was director of presales in EMEA at Informatica, working with resellers, OEMs, and GSIs to integrate, master, and cleanse customers’ enterprise data. Colm has over 25 years’ experience in development, consulting, finance and banking, startups, and large multinational software companies. He holds a master’s degree in distributed computing from Trinity College Dublin.


Jonathan Seidman

Cloudera

Jonathan Seidman is a software engineer on the cloud team at Cloudera. Previously, he was a lead engineer on the big data team at Orbitz, helping to build out the Hadoop clusters supporting the data storage and analysis needs of one of the most heavily trafficked sites on the internet. Jonathan is a cofounder of the Chicago Hadoop User Group and the Chicago Big Data Meetup and a frequent speaker on Hadoop and big data at industry conferences such as Hadoop World, Strata, and OSCON. Jonathan is the coauthor of Hadoop Application Architectures from O’Reilly.


Michael Kohs

Cloudera

Michael Kohs is a product manager at Cloudera.



Contact Us

View a complete list of Strata Data Conference contacts

©2019, O’Reilly UK Ltd • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com