Presented By O’Reilly and Cloudera

Make Data Work

March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

A deep dive into running data analytic workloads in the cloud

Jason Wang (Cloudera), Mala Ramakrishnan (Cloudera), Stefan Salandy (Cloudera), Aishwarya Venkataraman (Cloudera), Vinithra Varadharajan (Cloudera), Aaron Myers (Cloudera, Inc.)

9:00am–12:30pm Tuesday, March 6, 2018

Big data and data science in the cloud, Data engineering and architecture
Location: 210 D/H

Average rating: 3.25 (4 ratings)

Who is this presentation for?

Hadoop administrators, data engineers, and BI analysts

Prerequisite knowledge

A basic understanding of AWS or Azure cloud infrastructure
Familiarity with big data concepts

Materials or downloads needed in advance

Please bring a laptop that has internet access and an SSH client installed.
If you want to use the CLI, you also need Python 3.6 installed and terminal access.

What you'll learn

Learn how to successfully run a data analytics pipeline in the cloud and integrate data engineering and data analytic workflows
Understand the considerations and best practices for data analytics pipelines in the cloud

Description

Public cloud usage for large-scale data processing is rapidly increasing, and running data engineering workloads in the cloud is becoming easier and more cost effective. Compute engines have adapted to leverage cloud infrastructure, including object storage and elastic compute. For example, Hive, Spark, Impala, and HBase compute engines are able to read input from and write output directly to AWS S3 and Azure Data Lake storage. Moreover, these read and write paths have been optimized for fast processing speeds, lowering the overall cost of running a job. In addition, platform-as-a-service offerings for data processing in the cloud have evolved to minimize the operational overhead of clusters, enabling end users to focus on developing, running, and troubleshooting jobs.

End users need to implement data pipeline workflows that transition seamlessly from one stage of the pipeline to the next. Aishwarya Venkataraman, Jason Wang, Mala Ramakrishnan, Stefan Salandy, and Vinithra Varadharajan lead a deep dive into running data analytic workloads as a managed service in the public cloud and highlight cloud infrastructure best practices.

Jason Wang

Cloudera

Jason Wang is a software engineer at Cloudera focusing on the cloud.

Mala Ramakrishnan

Cloudera

Mala Ramakrishnan heads product initiatives for Cloudera Altus, a big data platform-as-a-service. She has 17+ years of experience in product management, marketing, and software development in organizations of varied sizes that deliver middleware, software security, network optimization, and mobile computing. She holds a master’s degree in computer science from Stanford University.

Stefan Salandy

Cloudera

Stefan Salandy is a systems engineer at Cloudera.

Aishwarya Venkataraman

Cloudera

Aishwarya Venkataraman is a software engineer on the Cloudera Altus team.

Vinithra Varadharajan

Cloudera

Vinithra Varadharajan is a senior engineering manager in the cloud organization at Cloudera, where she’s responsible for the cloud portfolio products, including Altus Data Engineering, Altus Analytic Database, Altus SDX, and Cloudera Director. Previously, Vinithra was a software engineer at Cloudera working on Cloudera Director and Cloudera Manager with a focus on automating Hadoop lifecycle management.

Aaron Myers

Cloudera, Inc.

Aaron T. Myers is a Software Engineer at Cloudera and an Apache Hadoop Committer. Aaron’s work is primarily focused on HDFS. Prior to joining Cloudera, Aaron was a Software Engineer and VP of Engineering at Amie Street, where he worked on all components of the software stack, including operations, infrastructure, and customer-facing feature development. Aaron holds both an Sc.B. and Sc.M. in Computer Science from Brown University.

©2018, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com