Krunal Kanojiya

How Spotify Splits Its Recommendation Systems (And Why It Matters)

As more companies use AI to make decisions, they face a tricky problem. The same system is asked to do two different jobs at once. This causes issues, especially for big platforms like Spotify.

Spotify learned this the hard way. They built systems to recommend music and podcasts. But those systems had to handle two tasks:

  1. Serve users in real time - Fast, stable, reliable
  2. Run experiments to improve - Flexible, careful, willing to fail

These jobs don't mix well. So Spotify split them apart.

Why Two Jobs Need Two Systems

Your recommendation engine needs to be fast. When a user opens the app, they expect results now. Any delay or crash hurts the experience.

Your experiment system needs to be thorough. It tests ideas, compares results, and learns from failures. Speed matters less than getting the right answer.

When you mix these systems, both suffer:

  • Experiments slow down your app
  • Production bugs mess up your test data
  • Changes become risky
  • Teams step on each other's work

Spotify kept hitting these problems. So they made a choice: separate the systems completely.

What This Split Looks Like

Personalization systems handle live requests. They focus on:

  • Low latency
  • High uptime
  • Stable performance
  • Serving millions of users

Experiment systems run tests offline. They focus on:

  • Accurate measurements
  • Clear tracking
  • Safe testing
  • Good data quality

When you separate them, each system gets better at its job. Experiments can change often without breaking production. Production stays stable while new ideas get tested.

If an experiment fails, users never see it. If production has an issue, your experiment data stays clean.
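
The isolation described above can be sketched in a few lines. This is a hypothetical illustration, not Spotify's actual architecture: the names `ServingStore` and `run_offline_experiment` are invented. The point is that the serving path only reads models that already passed evaluation, while experiment code replays logged traffic offline, where a crash is contained.

```python
# Hypothetical sketch: serving reads only frozen, promoted models;
# experiments replay logged traffic offline. All names are illustrative.

class ServingStore:
    """Holds only models that passed evaluation; read-only at request time."""
    def __init__(self):
        self._live = {"recs-v1": lambda user_id: ["track-a", "track-b"]}

    def recommend(self, model_name, user_id):
        # Production path: no experiment code runs here.
        return self._live[model_name](user_id)

def run_offline_experiment(candidate, logged_requests):
    """Experiment path: replays logged traffic offline.
    A failing candidate never reaches a user."""
    results = []
    for user_id in logged_requests:
        try:
            results.append((user_id, candidate(user_id)))
        except Exception:
            results.append((user_id, None))  # failed idea, safely contained
    return results

store = ServingStore()
print(store.recommend("recs-v1", "user-42"))
```

Note that the two paths share no mutable state: an experiment can throw exceptions all day without touching what `recommend` returns.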

How Models Move to Production

Spotify doesn't push models straight to users. First, they go through an evaluation path:

  1. Build a model in the experiment system
  2. Test it against current models
  3. Check the results carefully
  4. Debate whether it actually helps
  5. Only then move it to production

This matters more as AI gets harder to understand. Small changes can cause big effects. Often you only find problems after users notice them.

The split gives teams time to think. They can ask hard questions:

  • Did this change actually help?
  • Did it hurt something we didn't measure?
  • Are we sure we understand what happened?

The Real Challenge: Coordination, Not Models

The hard part isn't building better models. It's getting people to work together.

Splitting systems forces teams to agree on:

  • How data moves between systems
  • Who owns what
  • What tools everyone uses
  • How to track and review changes

This takes work. It adds process. It slows things down in places where speed used to feel important.

But that slowdown helps. It creates space to ask if you're building the right thing. It lets teams test ideas without committing to them. It gives clearer signals about what's safe to ship.

Why This Matters More Now

As AI takes on more work, you need better ways to check if it's doing the right thing. Models are black boxes. Small changes can have wide effects.

By keeping experiments separate, Spotify can:

  • Catch issues before users see them
  • Keep a clear record of decisions
  • Roll back changes easily
  • Build trust in what they ship

This isn't just about having a staging environment. It's about building systems that support disagreement, measurement, and learning.

What You Can Take Away

You probably don't run Spotify-scale systems. But the lesson still applies.

Many teams run experiments inside production because it feels simpler. At first, it is. Over time, that simplicity breaks down:

  • Changes get harder to explain
  • Rollbacks become scary
  • You lose confidence in results

Separating experiments from serving takes upfront work. But it pays off:

  • You get clearer answers
  • Teams make better decisions
  • Risk goes down
  • Trust goes up

As AI systems take on more responsibility, these things matter. The infrastructure you build shapes how your team behaves. When systems favor learning over speed, people make better long-term choices.

That might be the most practical lesson here.
