Presented By O'Reilly and Cloudera
Make Data Work
March 13–14, 2017: Training
March 14–16, 2017: Tutorials & Conference
San Jose, CA

Running a Cloudera cluster in production on Azure

Paige Liu (Microsoft), John Zhuge (Netflix)
2:40pm–3:20pm Wednesday, March 15, 2017
Big data and the Cloud, Data engineering and architecture
Location: LL21 E/F
Level: Beginner
Secondary topics: Architecture, Cloud

Who is this presentation for?

  • Hadoop administrators and enterprise architects

Prerequisite knowledge

  • Basic knowledge of Hadoop and Cloudera

What you'll learn

  • Understand the main cloud resources (and their trade-offs) for Cloudera clusters on Azure
  • Explore the options to deploy, scale, and run Cloudera clusters on Azure
  • Learn how to leverage related Azure services with Cloudera to build enterprise-grade big data solutions


Paige Liu and John Zhuge explore the options and trade-offs to consider when building a Cloudera cluster on Microsoft Azure. They explain how to deploy and scale a Cloudera cluster on Azure and how to connect it with other Azure services to build enterprise-grade, end-to-end big data solutions.

Topics include:

  • How storage and VM choices can impact performance, throughput, and the cost of your cluster
  • When to use a deployment template versus Cloudera Director
  • How to add and remove nodes
  • How to stop the cluster when not in use to save cost
  • How to do backup and restore
  • What options exist for HA and DR
  • How to leverage Azure Active Directory for DNS, authentication, and single sign-on
  • How to ingest data with IoT Hub or Event Hubs
  • How to connect to Azure Machine Learning to do predictive analytics
  • How to visualize data in Power BI

Paige Liu


Paige Liu is a software developer with Microsoft. Paige has been involved in the development of a wide range of applications and services, from web applications to large-scale multitier distributed systems to hyper-scale search engine backends. While most of her experience is with Microsoft technology, Paige has also developed and released cross-platform solutions, including Java APM (application performance monitoring) and Linux/Unix monitoring systems. Recently, she has been focusing on cloud computing, specifically the Microsoft Azure cloud, helping enterprises develop new applications in the cloud or move their existing workloads to the cloud.

John Zhuge


John Zhuge is a software engineer at Netflix focusing on big data processing engines and storage systems for the cloud. As an active Apache Hadoop committer, he contributes to open source distributed systems in the Apache Hadoop ecosystem. Previously, John designed and implemented filesystems and protocols for storage systems. John holds seven US patents.