Nikhil Raj

Fan-Out in Social Media Feeds Explained from First Principles

A First-Principles Walkthrough for Software Engineers


If you’ve ever opened a social media app and scrolled through a feed, you’ve interacted with one of the most aggressively optimized systems in modern software. It looks harmless. Just a list of posts, ordered by time or relevance. Nothing fancy.

Under the hood, it’s a small war against latency, scale, skew, and human behavior.

This post is about fan-out, one of the core ideas behind how social media feeds are built. We’ll approach it from first principles, assuming you’re a software engineer but not assuming you’ve designed systems for millions or billions of users. No buzzwords first. No architecture flexing. Just careful reasoning.


Start with the simplest possible question

Before talking about fan-out, we need to agree on what problem we’re actually solving.

A social media feed, stripped of branding and algorithms, boils down to this:

When someone creates a post, other people should be able to see it.

That’s the entire feature.

Everything else (likes, ranking, recommendations, notifications) exists because that simple requirement becomes brutally expensive at scale.

Now translate that sentence into system language:

A single write operation must result in many read-visible outcomes.

That transformation from one write to many reads is what we call fan-out.

Fan-out is not a feature. It’s a consequence.


First principles: what reality forces on us

Before choosing designs, we need to understand the shape of the problem space. These are not opinions. These are constraints imposed by how people and computers behave.

Reads massively outnumber writes

In any social system, users read far more than they write. One post might be read thousands, millions, or even hundreds of millions of times. But it is written exactly once.

This asymmetry matters. A design that makes reads expensive suffers constantly; a design that makes writes expensive suffers only occasionally.

At scale, systems almost always choose to hurt occasionally rather than constantly.


Social graphs are uneven in a violent way

Most users have a small number of followers. A few have an enormous number. There is no smooth curve here. It’s a cliff.

This means the cost of “show this post to followers” is usually small, but sometimes astronomically large. If your design does not explicitly account for that, it will work beautifully in tests and collapse in production.


Latency is a user emotion, not a metric

From a human perspective, the difference between 100 milliseconds and 300 milliseconds feels negligible. The difference between 300 milliseconds and 1 second feels broken.

Feeds must feel instant. That means feed reads must be fast not on average, but almost always. Tail latency matters more than mean latency.


Storage is cheaper than computation and network hops

Disks are cheap. CPU time and network calls are not. Every additional query, every additional service hop, every additional merge step shows up as latency.

This pushes real systems toward precomputation and caching, even if that means duplicating data.

Hold onto these constraints. Fan-out exists because of them.


The most obvious solution (and why it fails)

The simplest way to build a feed is to compute it when someone asks for it. This approach is usually called fan-out on read.

Here’s how it works conceptually.

When a user opens their feed, the system looks up who they follow. Then it fetches recent posts from each of those users. Then it merges all those posts together, sorts them, ranks them, and returns the result.

Nothing is precomputed. Everything happens on demand.
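
As a rough sketch in Python (the helpers get_following, get_recent_posts, and rank are hypothetical stand-ins for whatever your data layer provides), fan-out on read looks something like this:

```python
# A rough sketch of fan-out on read. get_following, get_recent_posts,
# and rank are hypothetical stand-ins for your data layer.

def build_feed_on_read(user_id: str, limit: int = 50) -> list[dict]:
    followed = get_following(user_id)            # who does this user follow?

    candidates: list[dict] = []
    for author_id in followed:                   # one fetch per followed user:
        candidates.extend(get_recent_posts(author_id, limit))  # cost grows with follows

    candidates.sort(key=lambda p: p["created_at"], reverse=True)  # merge + sort
    return rank(candidates)[:limit]              # rank on demand, return the page
```

Notice that the loop runs once per followed user. That loop is exactly where the trouble starts.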

Why everyone starts here

This approach feels clean. It avoids duplication. Writes are trivial. Storage usage is low. The logic maps directly to the product definition: “show me posts from people I follow.”

If you’re building a prototype or an early-stage product, this often works fine. Which is why it’s so tempting.

Why it collapses at scale

The cost of reading a feed becomes proportional to the number of people you follow. That means a single feed request might fan out into dozens or hundreds of database queries.

Cache misses amplify the problem. Ranking logic adds computation. Network latency stacks up.

The result is unpredictable latency. Some feed requests are fast. Some are painfully slow. And once you add real traffic, the system starts thrashing.


This violates a fundamental rule of scalable systems: read paths must have bounded cost.

Fan-out on read makes reads expensive at the exact moment users care most about performance.


Turning the problem inside out

To fix this, engineers flip the problem.

Instead of computing feeds when users read, compute feeds when users write.

This approach is called fan-out on write.

When a user creates a post, the system looks up their followers and inserts a reference to that post into each follower’s feed ahead of time. When a follower later opens their feed, the system simply reads from a pre-built list.
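
A minimal sketch, again with hypothetical helpers (get_followers and a per-user feed_store, which could be a Redis list or a wide-column table), might look like this:

```python
# A minimal sketch of fan-out on write. get_followers and feed_store are
# hypothetical; feed_store could be a Redis list or a wide-column table.

def on_post_write(author_id: str, post_id: str, created_at: float) -> None:
    # Pay the cost once, at write time: push a reference to the new post
    # into every follower's precomputed feed.
    for follower_id in get_followers(author_id):
        feed_store.push(follower_id, {"post_id": post_id, "created_at": created_at})

def read_feed(user_id: str, limit: int = 50) -> list[dict]:
    # The read path is now a single bounded lookup against a pre-built list.
    return feed_store.read(user_id, limit)
```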


Why this works so well

Feed reads become simple lookups. No merging. No heavy computation. Latency becomes predictable and low.

This aligns perfectly with the earlier constraint that reads dominate writes. You pay a cost once and benefit many times.

If all users had similar follower counts, this would be the end of the story.

Unfortunately, humans ruin everything.


The celebrity problem

Fan-out on write has a hidden assumption: that the cost of a write is manageable.

For most users, it is. If someone has 200 followers, inserting 200 feed entries is not a big deal.

But some users have millions of followers.

For them, a single post triggers millions of writes, possibly across shards, possibly replicated, possibly queued. Even asynchronously, this creates massive pressure on the system.

This violates another core rule of system design: no single request should have unbounded cost.

So pure fan-out on write fails, just in a different way.


The solution real systems converge on

Large social systems do not pick one approach. They combine them.

This is called a hybrid fan-out model.

The idea is simple but powerful. Treat users differently based on how dangerous they are to your infrastructure.

For users with a manageable number of followers, the system fans out on write. Their posts are pushed directly into follower feeds.

For users with extremely large follower counts, the system does not push their posts. Instead, those posts are pulled into feeds at read time.

The Hybrid Fan-out Model


This caps the worst-case write cost while preserving fast reads for the majority of users.
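
In code, the split can be as simple as a follower-count threshold. The threshold value and the helpers below are illustrative (they build on the earlier sketches), not numbers from any real system:

```python
# An illustrative sketch of hybrid fan-out. FANOUT_THRESHOLD and the helper
# names are assumptions for this example, reusing the earlier sketches.

FANOUT_THRESHOLD = 10_000

def on_post_created(author_id: str, post_id: str, created_at: float) -> None:
    if follower_count(author_id) <= FANOUT_THRESHOLD:
        on_post_write(author_id, post_id, created_at)   # push path (fan-out on write)
    # Above the threshold: do nothing here; these posts are pulled at read time.

def read_feed_hybrid(user_id: str, limit: int = 50) -> list[dict]:
    entries = feed_store.read(user_id, limit)           # precomputed feed for most authors
    for celeb_id in get_followed_high_fanout_users(user_id):
        entries.extend(get_recent_posts(celeb_id, limit))  # pull path for large accounts
    entries.sort(key=lambda e: e["created_at"], reverse=True)
    return rank(entries)[:limit]
```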

Is it elegant? Not really.
Does it survive real traffic? Yes.

That’s the trade-off that matters.


Rethinking what a feed actually is

At this point, it helps to correct a common misconception.

A feed is not a query.

A feed is a materialized view.

Think of it as a precomputed, per-user index of content references. Not the content itself, just pointers to it.

A typical feed entry contains a user ID, a post ID, a timestamp, and metadata used for ranking. The post content lives elsewhere.

This separation allows systems to rebuild feeds, re-rank content, and handle deletions without rewriting everything.
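
Concretely, a feed entry can be modeled as a lightweight reference. The field names here are illustrative, not any particular system's schema:

```python
# A feed entry as a lightweight reference to content stored elsewhere.
from dataclasses import dataclass

@dataclass
class FeedEntry:
    user_id: str       # whose feed this entry lives in
    post_id: str       # pointer to the post, which is stored separately
    created_at: float  # timestamp used for chronological ordering
    rank_score: float  # metadata consumed by the ranking stage
```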

Once you see feeds as cached indexes instead of dynamic queries, fan-out becomes a maintenance problem rather than a computation problem.


Event-driven fan-out is mandatory

One last principle: never do fan-out synchronously.

When a post is created, the system should emit an event. Background workers consume that event and update feeds asynchronously.

This allows retries, backpressure, and recovery. It also accepts an important truth: social media is eventually consistent by nature.

Event-Driven Fan-out

No user can tell whether a post appeared in their feed after 200 milliseconds or after one second. Your infrastructure definitely can.
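
A sketch of that shape, assuming a generic message queue (queue.publish and queue.consume stand in for Kafka, SQS, or whatever broker you run, and posts_db for your post store):

```python
# A sketch of event-driven fan-out. posts_db and queue are hypothetical
# stand-ins for your post store and message broker.

def handle_create_post(author_id: str, post_id: str, created_at: float) -> None:
    # The request path persists the post and emits an event; it never blocks
    # on updating follower feeds.
    posts_db.insert(post_id=post_id, author_id=author_id, created_at=created_at)
    queue.publish("post_created", {
        "author_id": author_id, "post_id": post_id, "created_at": created_at,
    })

def fan_out_worker() -> None:
    # Background workers consume events and update feeds asynchronously.
    # Retries and backpressure live here, not in the request path.
    for event in queue.consume("post_created"):
        on_post_created(event["author_id"], event["post_id"], event["created_at"])
```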


The mindset that actually matters

If you’re designing a feed system, the specific database or queue technology matters less than the questions you ask.

What is the worst-case fan-out size?
Is the cost of a request bounded?
What happens when workers lag?
Can feeds be rebuilt?
How do deletions and privacy changes propagate?

Fan-out is not about cleverness. It’s about respecting constraints.


A final analogy that actually holds

Fan-out on read is cooking every time you’re hungry.
Fan-out on write is meal prep.

Cooking sounds flexible until you’re serving millions of people at once.

At scale, preparation wins.


The takeaway

Fan-out is not an optimization. It is the unavoidable result of trying to turn one action into many experiences under extreme skew.

Once you understand that feeds are precomputed, per-user views maintained under constant pressure, the rest of social media system design starts to make uncomfortable sense.

And yes, it only gets harder from here.
