Presented by O’Reilly and Cloudera
Make Data Work
21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

Running data analytic workloads in the cloud

Eugene Fratkin (Cloudera), Vinithra Varadharajan (Cloudera), Mael Ropars (Cloudera), Jason Wang (Cloudera)
9:00–12:30 Tuesday, 22 May 2018
Data engineering and architecture
Location: Capital Suite 13
Level: Intermediate
Average rating: ***** (5.00, 1 rating)

Who is this presentation for?

  • Data engineers, developers, data scientists, system architects, system administrators, and those working in information security

Prerequisite knowledge

  • A basic understanding of data warehousing

Materials or downloads needed in advance

  • A WiFi-enabled laptop
  • (Optional) To work with the command-line interface, install Python 2.7 or above and be able to install packages using pip

What you'll learn

  • Learn how to create data pipelines and manage them in the cloud and in hybrid cloud environments
  • Understand how to implement metadata sharing and discovery across data applications

Description

Over the past several years, ever-increasing quantities of data have been processed in public clouds. The cloud promises to address some of the limitations of the conventional single, multipurpose cluster by offering hyperscale storage decoupled from elastic, on-demand compute and by allowing data to be shared among on-demand provisioned processing engines such as Hive, Spark, and Impala. But to fulfill this promise, several technical challenges must first be solved: simple resource allocation, cross-cluster metadata sharing, and a common authorization framework. Without comprehensive answers to these questions, the challenges of the single-cluster model are simply duplicated inside the public cloud environment.
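
As a rough illustration of the shared-metadata pattern described above (a minimal sketch, assuming a Spark session configured against a shared Hive metastore with tables backed by object storage; the database, table, and column names are hypothetical):

    # A transient Spark cluster reading a table that another engine
    # (e.g., Hive or Impala) registered in a shared metastore; the data
    # files themselves live in object storage, not on the cluster.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("shared-metastore-sketch")
             .enableHiveSupport()   # pick up the shared Hive metastore
             .getOrCreate())

    # Only metadata comes from the metastore; reads go to object storage.
    events = spark.table("analytics.web_events")
    daily_counts = events.groupBy("event_date").count()

    # Results written back are discoverable by the other engines as well.
    daily_counts.write.mode("overwrite").saveAsTable("analytics.daily_event_counts")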

By decoupling hyperscale storage from elastic, on-demand compute, the cloud offers a way past the limitations of the single, multipurpose cluster. Vinithra Varadharajan, Jason Wang, Eugene Fratkin, and Mael Ropars detail new paradigms for running production-level pipelines effectively with minimal operational overhead. As part of the deep dive, they also walk you through creating such a pipeline and executing data processing and data analytics workflows. Join in to learn how to remove barriers to data discovery, metadata sharing, and access control.

Eugene Fratkin

Cloudera

Eugene Fratkin is a director of engineering at Cloudera, heading Cloud R&D. He was one of the founding members of the Apache MADlib project (scalable in-database algorithms for machine learning). Previously, Eugene was a cofounder of a Sequoia Capital-backed company focusing on applications of data analytics to problems of genomics. He holds a PhD in computer science from Stanford University’s AI lab.

Vinithra Varadharajan

Cloudera

Vinithra Varadharajan is a senior engineering manager in the cloud organization at Cloudera, where she’s responsible for the cloud portfolio products, including Altus Data Engineering, Altus Analytic Database, Altus SDX, and Cloudera Director. Previously, Vinithra was a software engineer at Cloudera working on Cloudera Director and Cloudera Manager with a focus on automating Hadoop lifecycle management.

Mael Ropars

Cloudera

Mael Ropars is a senior sales engineer at Cloudera, where he helps customers solve their big data problems using enterprise data hubs based on Hadoop. Mael has 15 years’ experience with big data, information management, and middleware in technical sales and service delivery.

Jason Wang

Cloudera

Jason Wang is a software engineer at Cloudera focusing on the cloud.