Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

Spotify in the cloud: The next evolution of data at Spotify

Josh Baer (Spotify), Alison Gilles (Spotify)
5:25pm–6:05pm Wednesday, September 27, 2017
Big data and the Cloud, Data Engineering & Architecture
Location: 1A 15/16/17
Level: Intermediate
Secondary topics: Cloud, Media, Platform
Average rating: 4.00 (1 rating)

Who is this presentation for?

  • Data and backend engineers, organizational leaders, and Agile practitioners

Prerequisite knowledge

  • A basic understanding of data processing primitives (how batch processing differs from real-time processing), how cloud hosting differs from on-premises bare-metal hosting of data processing, and data processing offerings in the cloud or in the Apache Hadoop ecosystem

What you'll learn

  • Learn how Spotify has transitioned to the cloud and how these decisions have affected its data processing landscape
  • Explore the tools that make up Spotify's data processing landscape
  • Discover organizational challenges encountered in the cloud transition and lessons learned

Description

In early 2016, Spotify decided that it didn’t want to be in the data center business. The future was the cloud. Josh Baer and Alison Gilles explain what it took to move Spotify to the cloud, covering Spotify’s technology choices, challenges faced, and the lessons Spotify learned along the way.

Before migrating to Google Cloud Platform (GCP), data processing at Spotify was mainly done in Hadoop with products in the larger Apache Hadoop ecosystem: Hive for ad hoc analysis, MapReduce for daily batch jobs, Storm for real-time processing, and Spark for machine learning. Josh and Alison describe how Spotify’s data processing platform has evolved through the cloud migration to incorporate GCP offerings like BigQuery, Cloud Dataflow, Cloud Pub/Sub, and TensorFlow.

Josh and Alison also explore some of the organizational changes and culture shifts that the cloud migration has brought: training highly skilled engineers, who are used to solving their own problems, to solve problems alongside a provider; leveraging Spotify’s relationship with Google as a vendor; and leading an engineering organization through a transition to focus higher up the stack. They also cover some of the beneficial and not-so-beneficial changes the company has made during the still-in-progress migration.

Josh Baer

Spotify

Josh Baer is a data infrastructure product lead at Spotify, where he is leading the data processing track of Spotify’s migration to Google Cloud Platform. During his time at Spotify, Josh has worked on growing Spotify’s Hadoop footprint from 180 machines to 2,000, enabling everyday real-time processing and providing infrastructure for advanced machine learning tasks.

Alison Gilles

Spotify

Alison Gilles is director of engineering for data infrastructure at Spotify, where she coaches and leads teams in backend services and data infrastructure. Previously, she led engineering teams at nonprofit organizations in education and corporate social responsibility.

Comments

Josh Baer | TECHNICAL PRODUCT OWNER
09/29/2017 9:01am EDT

Slides from the talk: https://www.slideshare.net/JoshBaer/spotify-in-the-cloud-an-evolution-of-data-infrastructure-strata-nyc