Gabriel Anhaia

Posted on Jun 13

Design a News Feed: Fan-Out, Ranking, and the Celebrity Problem

#systemdesign #architecture #scalability #interview

Book: System Design Pocket Guide: Interviews — 15 Real System Designs, Step by Step
Also by me: Thinking in Go (2-book series) — Complete Guide to Go Programming + Hexagonal Architecture in Go
My project: Hermes IDE | GitHub — an IDE for developers who ship with Claude Code and other AI coding tools
Me: xgabriel.com | GitHub

Ninety million list writes for one post, and the fan-out queue just backed up ten minutes deep. That's where the news feed question goes the moment you reach for the obvious design. You've drawn the clean boxes already: user service, post service, a fan-out worker, a Redis feed cache per user. Then the interviewer asks the one thing that breaks it: "A celebrity with 90 million followers posts. What happens?"

That's the moment the question turns from "can you cache a feed" into "do you understand the load." The news feed is the most-asked system design question because it has one clean trap that splits the people who memorized an answer from the people who understood it.

The two questions before any box

Ask these out loud before you draw anything.

1. What's the follower distribution? It's power-law. Most accounts have a few hundred followers. A tiny fraction have tens of millions. The mean follower count tells you nothing. The shape of the tail is the whole problem.

2. What's the read-to-write ratio? Feeds are read-heavy by a wide margin. A user posts a handful of times a day and refreshes the feed dozens of times. That ratio is what makes precomputing the feed worth the write cost, until the celebrity breaks the math.

Those two answers point straight at fan-out, which is the real subject of the question.

Fan-out on write: precompute every feed

On write (push) means: when someone posts, you immediately write that post into every follower's feed list. The feed is precomputed and sitting in cache. Reads are trivial.

def on_post_created(post_id, author_id):
    followers = get_followers(author_id)
    pipe = redis.pipeline()
    for follower_id in followers:
        key = f"feed:{follower_id}"
        pipe.lpush(key, post_id)
        pipe.ltrim(key, 0, 799)  # keep newest 800
    pipe.execute()

The read side is a single list fetch:

def get_feed(user_id, start=0, count=20):
    key = f"feed:{user_id}"
    return redis.lrange(key, start, start + count - 1)

This is fast where it counts. The read path, which runs far more often than the write path, is one cache hit. For the 99% of accounts with normal follower counts, fan-out on write is the right answer and you should say so.

Then it falls apart at the tail. A post from an account with 90M followers means 90M list writes for one action. The fan-out worker queue backs up. Followers at the front of the queue see the post in a second; followers at the back see it ten minutes later. You've spent enormous write amplification to deliver one post, and you did it for a user who is probably asleep and won't open the app for hours.

Fan-out on read: compute the feed at request time

On read (pull) means: store nothing precomputed. When a user opens the app, look up who they follow, pull recent posts from each of those authors, merge, sort, return.

def get_feed_pull(user_id, count=20):
    followees = get_followees(user_id)
    posts = []
    for author_id in followees:
        posts.extend(
            get_recent_posts(author_id, limit=count)
        )
    posts.sort(key=lambda p: p.created_at, reverse=True)
    return posts[:count]

This flips the cost. Writes are cheap, posting touches nothing but the author's own timeline. But reads are expensive, and reads are the common case. A user following 2,000 accounts triggers 2,000 lookups on every feed refresh. Multiply by every active user refreshing constantly and the read path melts.

Pull wins for one specific shape: the celebrity post nobody asked to precompute. You don't fan a 90M-follower post out to 90M lists. You leave it in the author's timeline and pull it on read, once per follower who actually opens the app.

The hybrid: push for the many, pull for the few

Neither pure model survives a real social graph. The answer the interviewer wants is the hybrid, and the rule that drives it.

Pick a follower threshold. Call it 100,000.

Accounts under the threshold: fan-out on write. Their posts get pushed into follower feeds, because the write cost is bounded and the read stays a single cache hit.
Accounts over the threshold (celebrities): fan-out on read. Their posts stay in their own timeline and are merged in when a follower loads the feed.

The feed read becomes two sources merged: the precomputed list from normal-account pushes, plus a live pull of recent posts from the handful of celebrities this user follows.

def get_feed_hybrid(user_id, count=20):
    pushed = redis.lrange(
        f"feed:{user_id}", 0, count - 1
    )
    celeb_ids = get_celebrity_followees(user_id)
    pulled = []
    for cid in celeb_ids:
        pulled.extend(
            get_recent_posts(cid, limit=count)
        )
    merged = merge_by_recency(pushed, pulled)
    return merged[:count]

The number of celebrities any one user follows is small, usually under a few dozen, so the pull side stays cheap. The push side stays bounded because no account over the threshold fans out. That's the trick: each model handles the part of the distribution where its costs stay flat.

Say the threshold out loud and say that it's tunable. The interviewer may push on edge cases. An account that crosses the threshold mid-life needs a one-time backfill decision: do you fan out their existing followers once, or flip them to pull-only going forward? Pull-only is simpler and the honest answer.

Ranking: the feed isn't reverse-chronological anymore

Once the feed is assembled, you have to order it. Pure reverse-chronological is the easy answer and a real product choice, but most large feeds rank. The interviewer often asks what signals you'd score on.

Common ranking signals, scored per candidate post:

Recency. Decays over hours. A post from this morning outranks one from yesterday.
Affinity. How much this user interacts with this author. Past likes, replies, profile visits.
Engagement velocity. How fast the post is gaining likes and replies right now.
Content type match. Does the user tend to engage with video, links, text.
Author quality. Spam and low-quality penalties.

A simple scoring pass looks like this:

def score(post, user, now):
    age_hours = (now - post.created_at) / 3600
    recency = 0.5 ** (age_hours / 12)  # 12h half-life
    affinity = user.affinity.get(post.author_id, 0.1)
    velocity = post.recent_likes / max(age_hours, 0.5)
    return (
        0.4 * recency
        + 0.4 * affinity
        + 0.2 * normalize(velocity)
    )

You don't rank the whole feed. You assemble a candidate set, a few hundred posts from the hybrid step, then score and sort just those. Scoring runs at read time on a bounded set, which keeps it affordable. Mention that bound; ranking the entire history would be the naive trap.

Pagination: don't use OFFSET

The interviewer will ask how a user scrolls past the first page. The wrong answer is offset-based paging.

Offset paging (LIMIT 20 OFFSET 40) breaks on a feed because the feed mutates between requests. New posts arrive at the top while the user scrolls, so page two repeats items from page one or skips items entirely. It also gets slower the deeper you scroll, the database walks and discards every skipped row.

Use cursor-based pagination. The cursor is an opaque pointer to the last item the client saw, usually a timestamp plus a tiebreaker ID.

def get_feed_page(user_id, cursor=None, count=20):
    if cursor is None:
        rows = feed_after(user_id, ts=now(), limit=count)
    else:
        ts, last_id = decode_cursor(cursor)
        rows = feed_before(user_id, ts, last_id, count)
    next_cursor = (
        encode_cursor(rows[-1]) if rows else None
    )
    return {"items": rows, "next_cursor": next_cursor}

The cursor encodes "give me items older than this exact point." New posts arriving at the top don't shift the page boundary, so the user never sees a duplicate or a gap. It also stays fast at any scroll depth because the query seeks straight to the cursor position instead of counting from the start.

For a ranked feed, the cursor carries the rank-page boundary instead of a raw timestamp, and you cache the assembled candidate set per session so scrolling doesn't re-rank from scratch on every page.

The 90-second answer

When they say "design a news feed," say this:

"Two questions first. Follower distribution? I'll assume power-law, most accounts small, a few enormous. Read-to-write ratio? Read-heavy, which is why precomputing feeds is worth it.

Base design is fan-out on write. When a normal account posts, a worker pushes the post ID into each follower's cached feed list. Reads are then a single cache fetch, which is the common path.

That breaks for celebrities. A 90M-follower post can't fan out to 90M lists. So I go hybrid: accounts under a threshold, say 100K followers, use fan-out on write. Accounts over it stay pull-only, their posts live in their own timeline and get merged in at read time. Any user follows only a few celebrities, so the pull side stays cheap.

Read path: fetch the precomputed list, pull recent posts from the few celebrity accounts the user follows, merge by recency. If the product ranks, I assemble a candidate set of a few hundred and score on recency, affinity, and engagement velocity, then sort just that set.

Pagination is cursor-based, never offset, because the feed mutates while the user scrolls. The cursor points at the last item seen so pages don't duplicate or skip.

What it does well: bounded write cost and a one-hit read for the bulk of users, without melting on viral posts. What it doesn't do: strict ordering guarantees across the push and pull sources, which I'd accept for a social feed and reject for anything ordered like a ledger."

That's 90 seconds. It hits both fan-out models, the hybrid threshold, ranking, pagination, and the limits. The interviewer can now push on any branch and you have something coherent to defend.

Follow-ups that catch people

Two that burn candidates more than the rest:

"A celebrity posts, then deletes it 30 seconds later. What happens?" With pull, deletion is trivial, the post is gone from the source timeline and never appears. With push, you'd have to fan out a delete to every feed you wrote to. The hybrid makes celebrity deletes cheap and normal-account deletes a bounded fan-out, which is one more point for the threshold.
"How big is one user's cached feed?" They want an estimate. Store post IDs, not post bodies, so each entry is 8 bytes. Cap at 800 entries, that's ~6.4KB per feed plus list overhead, call it ~10KB. For 100M active feeds that's ~1TB of cache, sharded across a Redis fleet. Knowing you store IDs and hydrate post bodies separately is the senior signal here, never store full posts in the feed list.

What's the worst feed you've seen fall over in production? Drop it in the comments.

If this was useful

The fan-out threshold, the hybrid read path, and the cursor trick are the kind of decisions that separate a "meets bar" loop from a hire. The System Design Pocket Guide: Interviews walks through 15 of these end to end, including the feed, with the same write-path, read-path, and failure-mode lens used here.