Presented By
O’Reilly + Cloudera
Make Data Work
29 April–2 May 2019
London, UK

Herding elephants: Seamless data access in a multicluster cloud

Pradeep Bhadani (Hotels.com), Elliot West (Hotels.com)
12:05–12:45 Thursday, 2 May 2019
Average rating: 4.17 (6 ratings)

Who is this presentation for?

  • Anyone involved in building data lakes or managing Hadoop clusters in the cloud

Level

Intermediate

Prerequisite knowledge

  • Familiarity with the Hive metastore

What you'll learn

  • Explore the advantages and disadvantages of multi-data lake solutions
  • Learn how to enable data sharing in decentralized data platforms
  • Gain insight into the virtual data lake, which provides a unified view of data platforms in large organizations

Description

Expedia Group is in the process of migrating its Hadoop infrastructure from a single organization-wide on-premises cluster to large numbers of smaller in-cloud clusters. It has also moved from a centralized operating model, where one team was responsible for the Hadoop platform, to a distributed approach, where infrastructure is owned and operated by the group’s different brands: Hotels.com, Expedia.com, HomeAway.com, etc. This segmentation of data platforms has allowed the company to realize greater agility, resource elasticity, and reduced costs. However, it has also caused architectural fragmentation, creating cloud-based data silos that impede the ability to explore, discover, and share data across the organization.

Pradeep Bhadani and Elliot West describe these technical challenges and the solutions that were developed to provide users with a virtual unified view of the company’s many data lakes. They then offer an overview of Apiary, an open source project that provides a standardized pattern for deploying and operating data lakes that support federated dataset sharing across accounts, regions, and clouds; a “bring your own tool” culture, supporting a broad range of data processing platforms in the Hadoop ecosystem; replication of datasets for disaster recovery; and data access security.

Pradeep Bhadani

Hotels.com

Pradeep Bhadani is a senior big data engineer at Hotels.com in London, where he builds and manages cloud infrastructure and core services like Apiary. Pradeep has worked in the big data space, building large-scale platforms, for the last seven years.

Elliot West

Hotels.com

Elliot West is a principal engineer at Hotels.com in London, where he designs tooling and platforms in the big data space. Previously, Elliot worked on Last.fm’s data team, developing services for managing large volumes of music metadata.