Difficulty Level: 300 - Advanced
TL;DR
- Spotify processes 1 trillion+ events per day through 38,000+ active data pipelines — every play, skip, and save is a signal that feeds back into every product decision
- Discover Weekly generated 100 billion+ streams in its first 10 years using three ML layers: collaborative filtering, NLP, and audio CNNs — now augmented with LLMs via custom Semantic IDs
- Their A/B testing culture runs tens of thousands of experiments/year across 300+ teams, including 520 simultaneous experiments on a single screen — and they measure learning rate (64%), not just win rate (12%)
- Backstage, born as Spotify's internal developer portal, catalogs 2,000+ services and 4,000 data pipelines — and is now used by 3,000+ companies as the CNCF standard
- The real lesson isn't any single tool: it's the tight coupling between organizational design (squads own their services) and technical design (services are independently deployable)
The 1 Trillion Events Question
Spotify hit 713 million monthly active users in Q3 2025. That number looks impressive in a press release and terrifying in a system design meeting.
Scale alone doesn't explain Spotify's success. What matters is that every one of those events — every play, every skip, every playlist add at 2am — feeds directly into product decisions. Not after a quarterly review. In near real-time.
Most companies collect data and build dashboards. Spotify built a closed loop: user behavior shapes the product, the product generates more behavior, and the cycle compounds over 20 years of iteration. In 2024, Spotify posted its first annual profit: €1.1B on €15.6B in revenue. The closed loop is working.
After years of building data systems for enterprise clients and teaching these patterns at university, I've found that the most common mistake teams make is copying Spotify's tools rather than their discipline. In this article I'll break down the actual mechanisms behind their data pipeline, recommendation engine, experimentation culture, and developer platform — and tell you which patterns you can realistically steal.
Prerequisites
- Familiarity with stream processing concepts (Kafka, Pub/Sub, or similar)
- Basic understanding of microservices architecture (service decomposition, database-per-service)
- Experience with A/B testing fundamentals
- Some exposure to ML recommendation systems (collaborative filtering concepts)
What You'll Learn
- How Spotify's event pipeline evolved from self-managed Kafka to GCP Pub/Sub at 3 million events/second
- Why Discover Weekly uses three separate ML layers and what each one contributes
- How their A/B testing culture measures 64% learning rate instead of just win rate
- What Backstage is and why 3,000+ companies adopted it after Spotify open-sourced it
- Which Spotify patterns scale down to your team — and which ones don't
The Numbers Behind 713 Million Users
The scale numbers aren't just impressive — they explain every architectural decision.
| Metric | Value | Context |
|---|---|---|
| Monthly Active Users | 713M (Q3 2025) | Up from 600M in mid-2024 |
| Premium Subscribers | 281M | ~39% of MAU |
| Annual Revenue | €15.6B (2024) | First profitable year |
| Music Catalog | 100M+ tracks | Grows ~60K tracks/day |
| Podcasts | ~7M titles | Second only to Apple |
| Events per day | 1 trillion+ | 1,800+ event types |
| Active data pipelines | 38,000+ | Hourly + daily scheduled |
| Production components | Thousands | 80%+ fleet-managed |
| A/B experiments/year | Tens of thousands | 300+ teams running tests |
| Discover Weekly streams (10yr) | 100 billion+ | 56M new discoveries/week |
This scale didn't emerge from a grand architectural vision. It's the result of 20+ years of small, data-driven decisions — each one measured, validated, and shipped incrementally.
From Monolith to Thousands of Microservices
Spotify started as a monolithic Python application in 2006. By 2010, the codebase had grown to the point where no single team could understand all of it, and deployments required coordinating across multiple squads.
The migration to microservices wasn't a big-bang rewrite. It was driven by a single organizational principle: each squad should be able to deploy independently, without coordinating with other teams. If a team needed another team's sign-off to ship, something was wrong — either in the service design or the org structure.
The Database-Per-Service Pattern
Each Spotify microservice owns its own data store, chosen for the access patterns of that service:
- Cassandra + BigTable: High-speed key-value lookups (user state, session data, real-time features)
- PostgreSQL: Transactional data (payments, account management)
- Google Cloud Storage: Large objects (audio files, model artifacts)
- BigQuery: Analytical queries and data pipelines
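To make the boundary concrete, here is a minimal sketch of the database-per-service pattern. SQLite stands in for Cassandra or PostgreSQL, and the service names and schemas are illustrative, not Spotify's actual services:

```python
import sqlite3

# Each "service" owns a private store; no service queries another's tables
# directly. SQLite stands in for Cassandra/PostgreSQL -- the boundary is the point.
class PlaybackService:
    def __init__(self):
        self.db = sqlite3.connect(":memory:")  # private to this service
        self.db.execute("CREATE TABLE sessions (user_id TEXT, track_id TEXT)")

    def record_play(self, user_id: str, track_id: str) -> None:
        self.db.execute("INSERT INTO sessions VALUES (?, ?)", (user_id, track_id))

    def plays_for(self, user_id: str) -> list[str]:
        rows = self.db.execute(
            "SELECT track_id FROM sessions WHERE user_id = ?", (user_id,)
        ).fetchall()
        return [r[0] for r in rows]

class BillingService:
    def __init__(self):
        self.db = sqlite3.connect(":memory:")  # separate store, separate schema
        self.db.execute("CREATE TABLE invoices (user_id TEXT, amount_eur REAL)")

    def charge(self, user_id: str, amount: float) -> None:
        self.db.execute("INSERT INTO invoices VALUES (?, ?)", (user_id, amount))
```

If BillingService ever needs playback data, it calls PlaybackService's API rather than reading its tables — that separation is what keeps each service independently deployable.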
By 2023, the number of distinct production components had grown to "thousands" — enough that Spotify needed a new abstraction to manage them: Fleet Management.
Fleet Management: Treating Services as a Fleet
The key insight behind Fleet Management is that individual service owners are blind to fleet-wide patterns. If 300 teams each manage their own dependencies, you get 300 different versions of Log4j in production. You can't patch a critical vulnerability in 9 hours by asking each team to update manually.
Fleet Management flips the model: infrastructure defaults to secure and up-to-date, and teams opt out for exceptions (with documented justification).
The results are concrete:
- 300,000+ automated changes merged across the fleet in 3 years
- 7,500 automated changes/week with 75% auto-merged without human review
- Log4j vulnerability: patched to 80% of backend services in 9 hours
- Framework updates: reach 70% of fleet in under 7 days (previously ~200 days)
- 95% of Spotify developers report Fleet Management improved software quality
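The fan-out logic behind those automated changes can be sketched in a few lines. This is a hypothetical simplification — real fleet tooling also builds, tests, opens PRs, and auto-merges — but it shows the core decision: compare every service's pinned versions against a fleet-wide baseline:

```python
# Hypothetical sketch: given a fleet manifest, decide which services need an
# automated dependency-bump change. Names and versions are illustrative.
TARGET_VERSIONS = {"log4j-core": "2.17.1"}  # fleet-wide secure baseline

def services_needing_bump(fleet: dict) -> list[str]:
    """fleet maps service name -> {dependency: pinned version}."""
    stale = []
    for service, deps in fleet.items():
        for dep, target in TARGET_VERSIONS.items():
            if dep in deps and deps[dep] != target:
                stale.append(service)
                break
    return sorted(stale)

fleet = {
    "playlist-api": {"log4j-core": "2.14.0"},
    "search-api": {"log4j-core": "2.17.1"},
    "payments": {"log4j-core": "2.11.2"},
}
print(services_needing_bump(fleet))  # ['payments', 'playlist-api']
```

The point of centralizing this logic is that one baseline change fans out to every stale service at once, instead of 300 teams each discovering the CVE on their own schedule.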
The Data Pipeline: How Every Play Becomes a Signal
Every user interaction at Spotify — a play, a skip, a search, a playlist add — generates an event. Those events are the raw material for every recommendation, every A/B test result, every product decision.
The flow runs from client SDKs emitting events, through Pub/Sub ingestion, into Scio pipelines that land the data in BigQuery.
The Migration from Kafka to GCP Pub/Sub
In 2016-2017, Spotify migrated their event delivery system from self-managed Kafka clusters to Google Cloud Pub/Sub. This wasn't a trivial decision — Kafka was working. But managing Kafka at Spotify's scale required significant operational overhead that distracted from product engineering.
The results after migration:
- Peak throughput scaled from 800,000 to 3,000,000 events/second
- Half a trillion daily ingested events (70 TB compressed)
- Pub/Sub handles 1 trillion requests/day
- BigQuery runs 10 million+ queries and scheduled jobs/month
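At the front of that pipeline, every interaction is wrapped in an event envelope before publishing. The field names below are illustrative, not Spotify's actual schema; with a real Pub/Sub client, the returned bytes would be the `data` argument to a publish call:

```python
import json
import time
import uuid

def make_event(event_type: str, user_id: str, payload: dict) -> bytes:
    """Serialize an interaction event for publishing.

    Hypothetical envelope: field names are illustrative, not Spotify's schema.
    """
    envelope = {
        "event_id": str(uuid.uuid4()),        # dedupe key downstream
        "event_type": event_type,             # one of 1,800+ types, e.g. "track_skip"
        "user_id": user_id,
        "emitted_at_ms": int(time.time() * 1000),
        "payload": payload,
    }
    return json.dumps(envelope).encode("utf-8")

msg = make_event("track_skip", "user-42", {"track_id": "t-99", "position_ms": 31000})
decoded = json.loads(msg)
```

A stable envelope like this is what lets 1,800+ event types share one delivery system: the transport only cares about the outer fields, while each consumer interprets its own payload.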
Scio: Spotify's Open-Source Apache Beam API
Spotify developed Scio, a Scala API for Apache Beam, to process billions of events. It handles both batch and streaming workloads, running on either Dataflow (managed) or Flink (lower-latency) depending on requirements.
Every data endpoint in the platform has:
- Retention policies: data deleted after defined period
- Access controls: squad-level permissions
- Lineage tracking: full trace from source event to derived dataset
- Quality checks: automated alerts for lateness, failures, anomalies
The 38,000+ active pipelines are orchestrated, monitored, and surfaced through Backstage — so any squad can inspect the health of their data at any time.
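A lateness check of the kind listed above can be sketched in a few lines. This is a hypothetical simplification — the real quality checks also cover failures and anomalies — but it shows the basic SLO logic for an hourly pipeline:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical sketch of a pipeline lateness check: a scheduled pipeline is
# flagged if its latest successful run is older than schedule + grace allows.
def is_late(last_success: datetime, schedule_hours: int, grace_hours: int = 1) -> bool:
    deadline = last_success + timedelta(hours=schedule_hours + grace_hours)
    return datetime.now(timezone.utc) > deadline

fresh = datetime.now(timezone.utc) - timedelta(minutes=30)
stale = datetime.now(timezone.utc) - timedelta(hours=5)
print(is_late(fresh, schedule_hours=1))  # False
print(is_late(stale, schedule_hours=1))  # True
```

Run against all 38,000+ pipelines, checks like this are what turn "is my data healthy?" from a Slack question into a dashboard lookup.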
Recommendations at Scale: Discover Weekly Deconstructed
Discover Weekly launched in July 2015 with a simple premise: every Monday morning, 30 personalized songs you've never heard before. In 10 years, it generated 100 billion streams and 56 million new artist discoveries every week.
That impact comes from a three-layer ML architecture, each layer catching different signals:
Layer 1: Collaborative Filtering
Collaborative filtering answers the question: who else listens to what you listen to, and what else do they listen to?
Spotify's approach uses Logistic Matrix Factorization (LMF) on implicit feedback — not explicit star ratings, but behavioral signals:
```python
# Simplified: how Spotify weights implicit feedback signals
# Real implementation uses distributed matrix factorization at scale
SIGNAL_WEIGHTS = {
    "stream_complete": 1.0,    # Listened to 80%+ of song
    "save_to_library": 2.5,    # Strong positive signal
    "add_to_playlist": 2.0,    # Strong positive signal
    "stream_partial": 0.5,     # Weak positive signal
    "skip_after_30s": -0.8,    # Negative signal
    "skip_immediately": -1.5,  # Strong negative signal
}

def compute_interaction_score(events: list[dict]) -> float:
    """
    Compute a weighted interaction score for a user-track pair.
    Used as input to the matrix factorization model.
    """
    score = 0.0
    for event in events:
        signal_type = event["type"]
        weight = SIGNAL_WEIGHTS.get(signal_type, 0.0)
        score += weight
    return max(0.0, score)  # Clamp to non-negative for LMF

# The factorization produces: user_vector @ item_vector = predicted_preference
# Trained via ALS (Alternating Least Squares) on GCP with billions of interactions
```
The training runs on Hendrix, Spotify's ML platform (named after Jimi Hendrix). Hendrix uses Ray for distributed training on GCP, serves 600+ ML practitioners, and handles the full lifecycle from prototype to production.
Layer 2: NLP Analysis
NLP fills in gaps where behavioral data is sparse — for new artists, for niche genres, for tracks uploaded last week.
Spotify runs web crawlers across music blogs, review sites, and social platforms to extract how people describe songs and artists. The output: vector embeddings where songs described with similar language cluster together.
A song described as "dreamy, lo-fi, bedroom pop" clusters with other songs sharing those descriptors — even if no user has yet listened to both.
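The clustering intuition can be sketched with bag-of-descriptor vectors and cosine similarity. Production systems use learned embeddings rather than raw word counts, and the tracks below are invented for illustration:

```python
import math
from collections import Counter

# Toy sketch: represent tracks by bag-of-descriptor vectors and compare with
# cosine similarity. Real NLP pipelines use learned embeddings, not raw counts.
def descriptor_vector(descriptions: list[str]) -> Counter:
    words = []
    for d in descriptions:
        words.extend(d.lower().replace(",", " ").split())
    return Counter(words)

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

track_a = descriptor_vector(["dreamy lo-fi bedroom pop", "dreamy hazy vocals"])
track_b = descriptor_vector(["lo-fi bedroom pop, hazy"])
track_c = descriptor_vector(["aggressive industrial techno"])
print(cosine(track_a, track_b) > cosine(track_a, track_c))  # True
```

The key property: track_a and track_b score as similar without a single shared listener, which is exactly the gap this layer fills for new and niche music.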
Layer 3: Audio CNNs
For truly new content — songs uploaded with no listening history and no web presence — audio analysis is the only signal available.
Convolutional neural networks analyze spectrograms (visual representations of audio). The model learns to detect: tempo, energy, instrumentation, tonality, rhythm patterns. Songs with similar audio characteristics cluster together regardless of metadata.
The LLM Layer (2024-2025)
In 2024, Spotify added a fourth layer: LLMs for contextual recommendations and the AI DJ feature.
The challenge: LLMs don't know Spotify's catalog of 100M tracks. The solution was Semantic IDs — compact token identifiers derived from collaborative-filtering embeddings, generated via RQ-KMeans. The LLM learns to treat these IDs as vocabulary tokens, effectively learning to "speak Spotify."
Outcomes from live experiments:
- 4% increase in listening time from preference-tuned recommendations
- 14% improvement from Llama fine-tuned on Spotify's domain vs. vanilla Llama
- 70% reduction in tool errors for the AI DJ orchestration system
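The idea behind Semantic IDs — residual quantization — can be sketched compactly. A real RQ-KMeans system learns each level's centroids with k-means over collaborative-filtering embeddings; the two-level codebook below is hand-picked purely for illustration:

```python
# Toy residual quantization: map an embedding to a short sequence of token IDs
# by repeatedly picking the nearest centroid and quantizing the residual.
def nearest(vec, centroids) -> int:
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centroids)), key=lambda i: dist(vec, centroids[i]))

def semantic_id(embedding, codebooks) -> tuple:
    tokens, residual = [], list(embedding)
    for centroids in codebooks:  # one codebook per quantization level
        idx = nearest(residual, centroids)
        tokens.append(idx)
        residual = [r - c for r, c in zip(residual, centroids[idx])]
    return tuple(tokens)  # coarse-to-fine token sequence

codebooks = [
    [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],   # coarse level
    [(0.0, 0.0), (0.1, 0.1), (-0.1, 0.1)],  # fine level, applied to the residual
]
print(semantic_id((0.05, 0.95), codebooks))  # (2, 0)
```

Because every track reduces to a few discrete tokens from small vocabularies, the LLM can treat them like ordinary words — which is what "learning to speak Spotify" means in practice.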
A/B Testing Culture: How Spotify Ships Without Breaking Things
Most companies say they have an "experimentation culture." Spotify has metrics to back it up.
300+ teams run experiments. The mobile home screen alone hosted 520 experiments in one year across 58 simultaneous teams. Total experiments run: tens of thousands per year.
The architecture behind this starts with their coordination engine, which manages mutual exclusion between experiments. When 58 teams are simultaneously testing changes to the same screen, you need a system that prevents two experiments from conflicting — and that randomly reshuffles user assignments between experiment runs (the "salt machine").
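The reshuffling mechanism is typically a salted hash. This is a generic sketch of the technique (function names are illustrative, not Spotify's API): a user lands deterministically in a bucket for a given salt, and rotating the salt between experiments produces an independent reshuffle so carryover effects from a previous test can't bias the next one:

```python
import hashlib

# Sketch of salted experiment assignment. Same salt -> stable assignment;
# a new salt -> an independent reshuffle of every user's bucket.
def assign_bucket(user_id: str, salt: str, num_buckets: int = 100) -> int:
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % num_buckets

b1 = assign_bucket("user-42", salt="home-screen-exp-7")
b2 = assign_bucket("user-42", salt="home-screen-exp-7")
b3 = assign_bucket("user-42", salt="home-screen-exp-8")  # likely a different bucket
assert b1 == b2  # deterministic within one experiment
```

Mutual exclusion then becomes a bucket-allocation problem: the coordination engine hands disjoint bucket ranges to experiments that must not overlap on the same surface.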
ABBA to Confidence: Three Generations of Experimentation
Spotify's experimentation platform evolved through three generations:
| Generation | Era | Capability |
|---|---|---|
| ABBA | Early 2010s | Feature flags + basic metrics |
| Experimentation Platform (EP) | 2015-2023 | Full orchestration, metrics catalog, coordination |
| Confidence | 2023+ | Commercial product, Backstage plugin, APIs |
The Metric That Changed Everything
The most important shift in Spotify's experimentation culture wasn't a new platform — it was a new metric: learning rate.
Win rate (the conventional metric) measures what percentage of experiments "succeed." At Spotify, that's ~12%.
Learning rate measures what percentage of experiments produce decision-ready insights — whether the answer is yes, no, or "we need to test something different." That's 64%.
Win rate: 12% (the experiment confirmed our hypothesis)
Learning rate: 64% (the experiment gave us actionable information)
This reframe matters enormously for culture. A team that runs 100 experiments and "wins" 12 shouldn't feel like they failed 88% of the time. Every "failed" experiment that disproves a hypothesis saved months of building the wrong thing.
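The arithmetic behind the two metrics is simple enough to show directly. The outcome labels below are illustrative, not Spotify's internal taxonomy; the counts are chosen to reproduce the 12% / 64% split:

```python
# Classify 100 hypothetical experiment outcomes, then compute both metrics.
outcomes = (
    ["shipped_win"] * 12        # hypothesis confirmed, change shipped
    + ["clear_negative"] * 30   # hypothesis disproved -- still a learning
    + ["redirected"] * 22       # inconclusive, but pointed to a better test
    + ["no_signal"] * 36        # noisy, no decision possible
)

DECISION_READY = {"shipped_win", "clear_negative", "redirected"}

win_rate = outcomes.count("shipped_win") / len(outcomes)
learning_rate = sum(o in DECISION_READY for o in outcomes) / len(outcomes)
print(f"win rate: {win_rate:.0%}, learning rate: {learning_rate:.0%}")
# win rate: 12%, learning rate: 64%
```

Note what the learning-rate denominator punishes: not losing experiments, but underpowered or badly instrumented ones that produce no decision at all.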
Using Confidence for Feature Flags
Spotify open-sourced and commercialized Confidence in August 2023. It's available as a managed service, a Backstage plugin, or via API. Here's what a basic feature flag + A/B test looks like:
```python
from spotify_confidence import Confidence

# Initialize with your project credentials
client = Confidence(client_secret="your-client-secret")

# Resolve a feature flag for a specific user
flag_value = client.resolve_boolean_flag(
    flag="new-home-layout",
    default_value=False,
    evaluation_context={
        "targeting_key": user_id,
        "country": user_country,
        "platform": "ios",
    },
)

if flag_value:
    render_new_home_layout()
else:
    render_legacy_home_layout()

# Track events for analysis
client.track(
    "home-layout-engaged",
    {"user_id": user_id, "session_duration_s": session_seconds},
)
```
The Confidence platform handles user assignment, experiment coordination, statistical analysis, and validity checks automatically. Squads see results in real time without writing SQL.
Backstage: The Developer Portal That Escaped Spotify
By 2019, Spotify had a problem that no amount of engineering talent could solve manually: 280+ teams managing thousands of services, datasets, APIs, and pipelines — with no shared understanding of what existed or who owned it.
The answer was an internal project called "System Z." In March 2020, Spotify open-sourced it as Backstage.
What Backstage Manages at Spotify Today
| Resource Type | Count |
|---|---|
| Backend Services | 2,000+ |
| Websites | 300 |
| Data Pipelines | 4,000 |
| Mobile Features | 200 |
The Software Catalog is the source of truth. Every component has a catalog-info.yaml file in its repo:
```yaml
# catalog-info.yaml
# Every Spotify service has one of these in its repo root
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: discover-weekly-generator
  description: "Weekly batch job generating personalized Discover Weekly playlists"
  annotations:
    github.com/project-slug: spotify/discover-weekly
    backstage.io/techdocs-ref: dir:.
    pagerduty.com/service-id: P2XYZAB
    datadog.com/service-name: discover-weekly-generator
  tags:
    - ml
    - recommendations
    - batch
spec:
  type: service
  lifecycle: production
  owner: recommendations-squad
  system: recommendation-platform
  dependsOn:
    - resource:default/user-feature-store
    - resource:default/track-embedding-store
    - component:default/hendrix-ml-platform
```
The Scaffolder generates new services from golden path templates — templates that include security scanning, observability hooks, CI/CD pipelines, and Backstage registration by default. The "right way" is the easy way.
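For a sense of what a golden path template looks like, here is a trimmed sketch in Backstage's Scaffolder format. The actions `fetch:template`, `publish:github`, and `catalog:register` are standard Scaffolder actions, but every value below (names, skeleton path, org) is hypothetical:

```yaml
# Illustrative golden-path template (trimmed); all values are hypothetical
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: python-service-golden-path
  title: Python Service (Golden Path)
spec:
  owner: platform-team
  type: service
  parameters:
    - title: Service details
      required: [name, owner]
      properties:
        name:
          type: string
        owner:
          type: string
  steps:
    - id: fetch
      action: fetch:template      # skeleton ships with CI/CD, scanning, observability
      input:
        url: ./skeleton
        values:
          name: ${{ parameters.name }}
    - id: publish
      action: publish:github
      input:
        repoUrl: github.com?owner=my-org&repo=${{ parameters.name }}
    - id: register
      action: catalog:register    # the new service appears in the catalog on day one
      input:
        repoContentsUrl: ${{ steps.publish.output.repoContentsUrl }}
        catalogInfoPath: /catalog-info.yaml
```

Because the skeleton bundles the security and observability defaults, a squad gets a compliant, catalog-registered service from a form submission — the "right way" with zero extra steps.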
Outside Spotify
Five years after open-sourcing, Backstage has:
- 3,400+ adopting companies (Expedia, American Airlines, Zalando, Netflix, Twilio, Wayfair)
- 1,600+ open-source contributors
- Donated to the CNCF — now the standard for internal developer portals
- Evolved into Spotify Portal (enterprise SaaS, GA October 2025)
The Squad Model: What Actually Works
The "Spotify Model" — Squads, Tribes, Chapters, Guilds — is the most imitated and most misunderstood organizational pattern in tech.
Here's what the original 2012 whitepaper actually said:
| Unit | Size | Purpose |
|---|---|---|
| Squad | 6-12 people | Full ownership: design, build, test, release, operate |
| Tribe | 40-150 people | Coordination across squads in same product area |
| Chapter | 6-15 specialists | Craft community within a tribe (e.g., all iOS engineers) |
| Guild | Any size | Voluntary community of interest across the company |
The key principle: "Loosely coupled but tightly aligned." Squads move fast independently, but all move in the same strategic direction.
But here's what Henrik Kniberg himself says now: "Don't copy the Spotify model. That's the opposite of what we intended."
Spotify no longer follows the original model exactly — it evolved constantly. The org chart was always secondary to the autonomy principle: if a squad can't deploy independently, something is wrong in the service design or the org design. Fix whichever is broken.
The technical manifestation of squad autonomy is Conway's Law in reverse: design your organization first, and your service architecture will follow. Spotify's thousands of independently deployable microservices exist because thousands of squads have full ownership of them.
What to Steal (and What to Leave Behind)
Here's what's actually worth taking from Spotify's playbook — and what requires Spotify-level scale to justify:
| Pattern | Steal It? | Minimum Scale | Effort |
|---|---|---|---|
| Software Catalog (Backstage) | Yes | 10+ teams | Low — free, CNCF standard |
| Golden path templates (Scaffolder) | Yes | 5+ teams | Medium — template once, scale forever |
| 64% learning rate metric | Yes | Any scale | Low — just change what you measure |
| Feature flags + gradual rollouts | Yes | Any scale | Low — Confidence or LaunchDarkly |
| Fleet automation for dependencies | Yes | 50+ services | Medium — Dependabot + custom automation |
| Squad autonomy principle | Yes (carefully) | 3+ teams | High — org change, not tech change |
| 3-layer recommendation engine | Adapted | 10K+ users | High — need data volume to work |
| GCP Pub/Sub at 3M events/sec | No (yet) | 100M+ events/day | Infrastructure complexity not worth it early |
| Hendrix ML platform | No | 100+ ML practitioners | Overkill; use SageMaker/Vertex AI instead |
The three questions worth asking your team right now:
Can each team deploy independently, without coordinating with other teams? If no, fix the service design or the team structure — but fix it.
Are you measuring learning rate or just win rate? Every experiment that disproves a bad idea is a win. Build a culture that treats it that way.
Does your internal developer portal make the right thing the easy thing? If developers skip security scanning because setting it up is hard, the problem isn't the developers.
Conclusion
Spotify's data-driven architecture didn't emerge from a whiteboard session or a consulting engagement. It emerged from 20 years of building autonomy into every layer of the organization and letting that autonomy produce the architecture.
The event pipeline processes 1 trillion events a day not because Spotify chose GCP Pub/Sub, but because 300+ squads each own their data and ship their pipelines without waiting for a central team.
Discover Weekly recommends music that feels personal not because of any single ML breakthrough, but because a recommendations squad owned that problem for 10 years and had the freedom to experiment every Monday.
Backstage manages 4,000 data pipelines and 2,000 services not because it's technically clever, but because the alternative (no catalog) gets exponentially more painful as you grow.
The tools are available to any company. Most of them are open source or commercially available today. The discipline is what differentiates Spotify — and that part you have to build yourself.
Resources
Official Documentation
- Spotify Engineering Blog — primary source for all technical patterns described here
- Spotify Research — 200+ ML and recommendation papers
- Backstage.io — open source, free, CNCF graduated
- Confidence — Spotify's A/B testing platform, now commercial
Books
- Building Microservices by Sam Newman (O'Reilly, 2nd ed. 2021) — covers squad/service alignment
- Designing Data-Intensive Applications by Martin Kleppmann (O'Reilly, 2017) — event streaming fundamentals
Key Engineering Blog Posts
- Fleet Management at Spotify Part 1
- Data Platform Explained Part II
- Coming Soon: Confidence
- Unleashing ML Innovation with Ray
- Celebrating Five Years of Backstage
Research Papers
- Semantic IDs for Generative Search and Recommendation (NeurIPS 2025)
- Users' Interests are Multi-faceted: Recommendation Models Should Be Too (WSDM 2023)
- Optimizing for the Long-Term Without Delay
Original Squad Model Reference
- Scaling Agile @ Spotify by Henrik Kniberg & Anders Ivarsson (2012)
Did you find this article helpful? Follow me for more content on system design, data engineering, and cloud architecture!