Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

Spotify in the cloud: The next evolution of data at Spotify

Josh Baer (Spotify), Alison Gilles (Spotify)
5:25pm–6:05pm Wednesday, September 27, 2017
Big data and the Cloud, Data Engineering & Architecture
Location: 1A 15/16/17 Level: Intermediate
Secondary topics:  Cloud, Media, Platform
Average rating: 4.00 (1 rating)

Who is this presentation for?

  • Data and backend engineers, organizational leaders, and Agile practitioners

Prerequisite knowledge

  • A basic understanding of data processing primitives (how batch processing differs from real-time processing), how cloud hosting differs from on-premises bare-metal hosting of data processing, and data processing offerings in the cloud or in the Apache Hadoop ecosystem

What you'll learn

  • Learn how Spotify has transitioned to the cloud and how these decisions have affected its data processing landscape
  • Explore the tools that comprise Spotify's data processing landscape
  • Discover organizational challenges encountered in the cloud transition and lessons learned


In early 2016, Spotify decided that it didn’t want to be in the data center business. The future was the cloud. Josh Baer and Alison Gilles explain what it took to move Spotify to the cloud, covering Spotify’s technology choices, challenges faced, and the lessons Spotify learned along the way.

Before migrating to Google Cloud Platform (GCP), data processing at Spotify was mainly done in Hadoop with products in the larger Apache Hadoop ecosystem: Hive for ad hoc analysis, MapReduce for daily batch jobs, Storm for real-time processing, and Spark for machine learning. Josh and Alison describe how Spotify’s data processing platform has evolved through the cloud migration to incorporate GCP offerings like BigQuery, Cloud Dataflow, Cloud Pub/Sub, and TensorFlow.

Josh and Alison also explore some of the organizational changes and culture shifts that the cloud migration has brought: training highly skilled engineers, who are used to solving their own problems, to solve problems alongside a provider; leveraging Spotify’s relationship with Google as a vendor; and leading an engineering organization through a transition to focus higher up the stack. They also cover some of the beneficial and not-so-beneficial changes the company has made during the still-in-progress migration.

Josh Baer


Josh Baer is the machine learning platform lead at Spotify, building out the tools, processing, and infrastructure for robust ML experiences; enabling teams to leverage ML and AI sustainably in their products, research, and services; and providing a cohesive experience. Previously, Josh led the Hadoop and stream processing teams.

Alison Gilles


Alison Gilles is director of engineering for data infrastructure at Spotify, where she coaches and leads teams in backend services and data infrastructure. Previously, she led engineering teams at nonprofit organizations in education and corporate social responsibility.



Josh Baer, 09/29/2017 9:01am EDT

Slides from the talk: