Sep 23–26, 2019

Running multidisciplinary big data workloads in the cloud with CDP

James Morantus (Cloudera), Tony Huinker (Cloudera), Naren Koneru (Cloudera), Ramachandran Venkatesh (Cloudera), Gunther Hagleitner (Cloudera), Olli Draese (Cloudera)
9:00am–12:30pm Tuesday, September 24, 2019
Location: 1E 14

Who is this presentation for?

  • Data engineers, data scientists, BI engineers, analytic engineers, and those in IT

Level

Intermediate

Description

Organizations now run diverse, multidisciplinary, big data workloads that span data engineering, data warehousing, and data science applications. Many of these workloads operate on the same underlying data, and the workloads themselves can be transient or long running in nature.
There are many challenges with moving these workloads to the cloud. In this talk we start off with a technical deep dive into Cloudera Data Platform (CDP).

Topics include:

  • Architecture of CDP and considerations that went into key design decisions (e.g. bursting, security, and different workload lifecycles)
  • Options for data engineering, machine learning, and data warehousing, as well as the capabilities of each option, including autoscaling
  • Security, governance, and metadata best practices in the cloud (e.g. the basics, common pitfalls, and things often overlooked that you need to get right)
  • Delivering data science capabilities to a wide range of users without allocating individual clusters

In addition, you’ll see how to successfully set up and run a data pipeline with CDP that integrates both data engineering and data warehousing workflows. We’ll explore considerations and best practices in getting data pipelines running. Along the way you’ll also see how to share metadata across workloads in a big data architecture. Note: This session will not have a hands-on component, as CDP is in Preview.

Prerequisite knowledge

  • Familiarity with public cloud concepts
  • A basic understanding of big data workloads (data engineering and data warehousing)

Materials or downloads needed in advance

  • A WiFi-enabled laptop (If you want to use the CLI, you need to have Python 3.6 installed and have terminal access.)
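
To confirm ahead of time that a laptop meets the Python requirement for the CLI, a quick check along the following lines may help. This is only a sketch: it assumes "Python 3.6" means 3.6 or later, and the names `REQUIRED` and `meets_requirement` are made up for illustration and are not part of any CDP tooling.

```python
import sys

# Assumption: the listing's "Python 3.6" is treated as a minimum version.
REQUIRED = (3, 6)


def meets_requirement(version_info=sys.version_info, required=REQUIRED):
    """Return True if the (major, minor) version is at least `required`."""
    return tuple(version_info[:2]) >= tuple(required)


if __name__ == "__main__":
    found = "{}.{}".format(sys.version_info[0], sys.version_info[1])
    if meets_requirement():
        print("Python " + found + " found: OK for the CLI requirement")
    else:
        print("Python " + found + " found: version 3.6 or later is needed")
```

Run it with the same interpreter you plan to use from your terminal, since a laptop can have more than one Python installed.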

What you'll learn

  • Learn how to successfully run a data analytics pipeline in the cloud and integrate data engineering and data analytic workflows
  • Understand considerations and best practices for data analytics pipelines in the cloud
  • Explore approaches for sharing metadata across workloads in a big data PaaS

James Morantus

Cloudera

James Morantus is a Cloud Solutions/Customer Success Engineer at Cloudera. Previously, James was a Senior Solutions Architect with the Professional Services organization at Cloudera, delivering services both on-premises and in the public cloud.

Tony Huinker

Cloudera

Naren Koneru

Cloudera

Naren Koneru is an engineering manager at Cloudera and leads the Navigator development team. Prior to Cloudera, Naren was at Miti, building enterprise-wide metadata and governance solutions. Before joining Miti, Naren spent over seven years with the platform team at Informatica and was instrumental in making PowerCenter the leading data integration platform. He has a master’s in computer science from East Tennessee State University and a bachelor’s from Osmania University.

Ramachandran Venkatesh

Cloudera

Gunther Hagleitner

Cloudera

Olli Draese

Cloudera


Comments

Nancy Cely | Head of Information Architecture
09/23/2019 3:55am EDT

Yes, that's the subject I am interested in.
