Expedia Group is migrating its Hadoop infrastructure from a single organization-wide on-premises cluster to a large number of smaller cloud-based clusters. It has also moved from a centralized operating model, where one team was responsible for the Hadoop platform, to a distributed approach, where infrastructure is owned and operated by the group’s different brands: Hotels.com, Expedia.com, HomeAway.com, etc. This segmentation of data platforms has allowed the company to realize greater agility, better resource elasticity, and lower costs. However, it has also caused architectural fragmentation, creating cloud-based data silos that impede the ability to explore, discover, and share data across the organization.
Pradeep Bhadani and Elliot West describe these technical challenges and the solutions that were developed to provide users with a virtual unified view of the company’s many data lakes. They then offer an overview of Apiary, an open source project that provides a standardized pattern for deploying and operating data lakes that support federated dataset sharing across accounts, regions, and clouds; a “bring your own tool” culture, supporting a broad range of data processing platforms in the Hadoop ecosystem; replication of datasets for disaster recovery; and data access security.
Elliot West is a principal engineer at Hotels.com in London, where he designs tooling and platforms in the big data space. Previously, Elliot worked on Last.fm’s data team, developing services for managing large volumes of music metadata.