Twitter's Fanout Strategy at Scale: The Trade-Off Most Designs Miss


Imagine a top-followed account posts a tweet. Call it 100M followers, roughly the order of the largest accounts on X today. If you fanned that tweet out to every follower's home timeline at write time, you'd execute 100M list-prepends in the few seconds the post takes to render, and your write tier would saturate.

So nobody does that. Twitter's home timeline at hundreds of millions of users is built on a hybrid: fanout-on-write for normal users, fanout-on-read for celebrities, with a merge step at the edge that hides the seam. Most candidates draw the push-only diagram, the interviewer says "celebrity," and the whiteboard collapses. The hybrid is the answer. The interesting part is the boundary: when does a user become a celebrity, what happens at the cold start, and what does the storage look like underneath.

This post designs the home timeline at scale. Push vs pull, why neither alone works, the celebrity and cold-start problems, the hybrid decision logic, and the storage shape that makes it run.

Push: fanout-on-write

The naive design is push. When user A posts, you immediately write the tweet ID to the home timeline cache of every follower of A. Reads become trivial; the timeline is already materialized, and the read endpoint paginates a list.

A tweets ──> for follower in followers(A):
                 LPUSH timeline:home:{follower} tweet_id
                 LTRIM timeline:home:{follower} 0 799

Reads cost one Redis LRANGE. Writes cost one Redis call per follower. The math is manageable at the average: if A has 200 followers, every tweet triggers 200 writes. Across the platform that's hundreds of thousands of cache writes per second, and the cache absorbs them.

The math gets ugly at the tail. A user with 100M followers pushing a tweet means 100M cache writes. Even at Twitter's level of Redis tuning, that's a multi-minute job. During those minutes, every other tweet from anyone else queues behind it. The write tier would saturate.
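
To put numbers on it, a back-of-envelope sketch; the throughput figure is an assumption for illustration, not Twitter's measured capacity:

# Rough fanout-time estimate; the writes/sec figure is an assumed budget.
FANOUT_WRITES_PER_SEC = 500_000

def fanout_seconds(follower_count, writes_per_sec=FANOUT_WRITES_PER_SEC):
    return follower_count / writes_per_sec

print(fanout_seconds(200))          # 0.0004s -- the average user is free
print(fanout_seconds(100_000_000))  # 200.0s  -- one celebrity tweet monopolizes the tier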

This is the celebrity problem, and pure push has no answer for it.

Pull: fanout-on-read

The opposite extreme is pull. When a user opens the app, you fetch the recent tweets from each user they follow, merge them by timestamp, return the top N. No materialization at write time; the timeline is computed on demand.

A reads ──> for followee in followees(A):
                 fetch recent tweets from user:{followee}
            merge sorted by timestamp
            return top 800

Writes are now cheap, one record per tweet. Celebrity tweets cost the same as everyone else's.

Reads, though, are catastrophic. A typical user follows 200 accounts. Every home timeline render fans out to 200 backend reads, plus a merge sort. At 500K timeline reads per second across the platform, that's 100M backend reads per second. The underlying stores (historically FlockDB on sharded MySQL for the graph, later Manhattan-backed services) cannot sustain it. P99 timeline latency goes to seconds.

There's a second problem nobody draws on the whiteboard: pull doesn't cache well. Every user's timeline is unique by definition, so you can't share computation across users. You're paying for the merge on every render.
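
A minimal sketch of that per-render merge, assuming each followee's recent tweets arrive as (timestamp, tweet_id) pairs sorted newest-first; fetch_recent is a stand-in for whatever backend call serves them:

import heapq
from itertools import islice

def render_home_timeline(followee_ids, fetch_recent, limit=800):
    # One backend fetch per followee, on every render; none of this work
    # can be shared across users, because every timeline is unique.
    per_followee = [fetch_recent(fid) for fid in followee_ids]
    # k-way merge by timestamp, newest first.
    merged = heapq.merge(*per_followee, key=lambda t: t[0], reverse=True)
    return list(islice(merged, limit))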

Why hybrid is the only thing that ships

Push is fast to read, brutal at the tail of the follower distribution. Pull is fast to write, brutal on every read. Neither alone scales to hundreds of millions of DAU.

The hybrid is exactly what it sounds like. Most users push, since their follower count is small enough that fanout-on-write is cheap. Celebrities pull, since their follower count is large enough that fanout-on-write would saturate the cache, but any given reader follows few enough of them that fanout-on-read stays bounded.

At read time, you do two things in parallel and merge: fetch the user's pre-materialized home timeline (push), and fetch recent tweets from the celebrities they follow (pull). The merge is sorted by timestamp, top 800 returned, done.

This is what Twitter, Instagram, and Facebook all converged on. Raffi Krikorian's Timelines at Scale talk on InfoQ walks through Twitter's version: Redis for the materialized timeline, plus a separate fetch path for celebrity tweets, merged at the edge.

The interesting questions are at the boundary.

Where's the line between push and pull?

You need a threshold. Pick a follower count above which a user becomes "pull" and below which they stay "push." The number is a budget question.

If you can absorb fanout writes at 10K writes/sec sustained, and your average tweet rate is 10K tweets/sec, then your fanout budget is 1 follower per tweet on average, which is useless. Realistically you provision the cache tier for around 200x that, a few million cache writes per second, which buys an average fanout budget of roughly 200 followers per tweet, so the average user is fine.
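
The same math in sketch form, with placeholder numbers:

# Illustrative budget math; the capacity figures are placeholders.
avg_tweets_per_sec = 10_000
cache_writes_per_sec = 200 * avg_tweets_per_sec   # provisioned ~200x the tweet rate
fanout_budget = cache_writes_per_sec / avg_tweets_per_sec
print(fanout_budget)  # 200.0 followers per tweet, on average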

The threshold is commonly cited as around 10K followers in public discussions of the design, including Krikorian's InfoQ talk and the HighScalability summary of Twitter's timeline architecture. Below 10K, fanout-on-write costs at most 10K cache ops per tweet, a budget the cache tier can absorb, so the user stays on the push path. Above 10K, the user goes on the celebrity list and their tweets bypass the fanout step.

The clever part is that the threshold is a config, not an architecture. You can move it at runtime. If a user's follower count crosses the line, you flip a flag and tweets from them stop being fanned out. Their old tweets are still in followers' materialized timelines from before the flip, which is fine; they age out naturally.
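
In code, "a config, not an architecture" looks something like this; the config client and key name are hypothetical:

# Hypothetical runtime-config lookup: moving the line is a config push, not a deploy.
def is_celebrity(author_id, graph, config):
    threshold = config.get_int("fanout.celebrity_follower_threshold", default=10_000)
    return graph.count_followers(author_id) > threshold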

The cold-start problem nobody draws

There's a gap in the hybrid that most designs skip. When a user joins, or hasn't logged in for a month, their home timeline cache is cold: the push tier doesn't keep writing to a key nobody reads, or the cache evicted it long ago.

If they open the app and you do the standard hybrid read (push timeline plus celebrity pull), the push timeline is empty. They see only the celebrity tweets. That's a broken experience.

The fix is to bootstrap the timeline on cold-read. When the home timeline cache is empty for a user, you fall back to a one-time fanout-on-read across their entire followee set, populate the cache, and return the result. The first-render latency is bad (think 1-2 seconds instead of 100-200 ms), but you only pay it once. After that, the push tier keeps the cache warm.

The other piece is what happens when a user follows a new account. Do you backfill their home timeline with that account's recent tweets? The standard answer is yes: a small backfill job pulls the last few hours of tweets and merges them into the new follower's timeline. Without backfill, the user follows someone and sees nothing from them until that account next posts.
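
A sketch of that backfill, reusing the same hypothetical service handles as the pseudocode below; the six-hour window and the since_hours parameter are assumptions:

# Hypothetical backfill on follow; window size and API shape are illustrative.
BACKFILL_WINDOW_HOURS = 6

def on_follow(follower_id, followee_id, services):
    recent_ids = services.tweet_store.recent_from(
        [followee_id], since_hours=BACKFILL_WINDOW_HOURS
    )
    # Prepend oldest-first so the followee's newest tweet lands nearest the head;
    # a real implementation would merge by timestamp rather than blindly prepend.
    for tweet_id in reversed(recent_ids):
        services.timeline_cache.lpush(f"timeline:home:{follower_id}", tweet_id)
    services.timeline_cache.ltrim(f"timeline:home:{follower_id}", 0, 799)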

The fanout decision logic, in pseudocode

The write and read paths; not production code, but the shape is right:

CELEBRITY_FOLLOWER_THRESHOLD = 10_000
TIMELINE_CACHE_LIMIT = 800

def on_tweet_posted(tweet, author, services):
    services.tweet_store.put(tweet)
    services.celebrity_tweets.add_if_celeb(author, tweet)

    follower_count = services.graph.count_followers(author.id)
    if follower_count > CELEBRITY_FOLLOWER_THRESHOLD:
        return  # celeb path: pull at read time

    fanout_to_followers(tweet, author, services)


def fanout_to_followers(tweet, author, services):
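    # Sequential here for clarity; production shards this loop across fanout workers.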
    for follower_id in services.graph.iter_followers(author.id):
        services.timeline_cache.lpush(
            f"timeline:home:{follower_id}", tweet.id
        )
        services.timeline_cache.ltrim(
            f"timeline:home:{follower_id}",
            0,
            TIMELINE_CACHE_LIMIT - 1,
        )


def get_home_timeline(user_id, services, limit=50):
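    # Hybrid read: merge the materialized push timeline with a live celebrity pull.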
    cached = services.timeline_cache.lrange(
        f"timeline:home:{user_id}", 0, limit * 2 - 1
    )
    if not cached:
        cached = bootstrap_cold_timeline(user_id, services)

    celeb_followees = services.graph.celeb_followees(user_id)
    celeb_tweets = services.celebrity_tweets.recent_for(
        celeb_followees, limit=limit
    )

    push_tweets = services.tweet_store.batch_get(cached)
    merged = sorted(
        push_tweets + celeb_tweets,
        key=lambda t: t.created_at,
        reverse=True,
    )
    return merged[:limit]


def bootstrap_cold_timeline(user_id, services):
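    # Cold start: a one-time fanout-on-read across the full followee set, cached for next time.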
    followees = services.graph.followees(user_id)
    tweet_ids = services.tweet_store.recent_from(
        followees, limit=TIMELINE_CACHE_LIMIT
    )
    if tweet_ids:
        services.timeline_cache.rpush(
            f"timeline:home:{user_id}", *tweet_ids
        )
    return tweet_ids

A few decisions buried in this:

The fanout loop is sequential in the sketch but distributed in production. Twitter shards the follower list across many fanout workers, and each worker handles a slice. With 10K followers, 100 workers, and a Redis cluster, the whole fanout finishes in tens of milliseconds.
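
A sketch of that sharding; the queue and task plumbing are made up, the point is that each worker owns a fixed-size slice of the follower list:

# Hypothetical sharded fanout; the fanout_queue interface is illustrative.
FANOUT_SHARD_SIZE = 1_000  # followers per task

def enqueue_fanout(tweet_id, follower_ids, fanout_queue):
    # Each slice becomes one task that a worker turns into LPUSH/LTRIM calls.
    for start in range(0, len(follower_ids), FANOUT_SHARD_SIZE):
        shard = follower_ids[start:start + FANOUT_SHARD_SIZE]
        fanout_queue.enqueue("fanout_task", tweet_id=tweet_id, follower_ids=shard)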

The celebrity tweets are fetched at read time but they're cached. A separate cache holds "tweets from user X in the last hour" keyed by user ID, populated on celebrity post and expired after a window. That's why pull-on-read scales: most reads hit a small set of cached celebrity timelines, not the database.
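
A sketch of that cache on the write and read sides, using a redis-py-style client; the key naming and one-hour TTL are assumptions:

# Hypothetical celebrity-tweet cache; key names and TTL are illustrative.
CELEB_TWEET_TTL_SECONDS = 3600

def cache_celebrity_tweet(redis_client, author_id, tweet_id):
    key = f"celebrity:recent:{author_id}"
    redis_client.lpush(key, tweet_id)
    redis_client.ltrim(key, 0, 99)                     # keep only the latest ~100
    redis_client.expire(key, CELEB_TWEET_TTL_SECONDS)  # re-arm the window on each post

def recent_celebrity_tweets(redis_client, celeb_ids):
    # Read side: timeline renders hit this small set of hot keys, not the tweet store.
    return {cid: redis_client.lrange(f"celebrity:recent:{cid}", 0, -1) for cid in celeb_ids}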

The merge happens at the timeline service, not the client. Doing it server-side lets you apply ranking, ad insertion, and personalization in one place.

Storage shape

Three stores carry this. None of them is interchangeable.

Tweet store. The source of truth for tweet content. Twitter built Manhattan, its in-house distributed key-value store, sharded by tweet ID. One row per tweet, immutable after post. Writes are simple. Reads are batch-get-by-ID, which is what the timeline merge needs.

Social graph. Followers, followees, blocks. Twitter built FlockDB on MySQL for this in the early 2010s; the access pattern is "give me the follower list of user X," which is a range scan on (followee_id, follower_id) sorted by recency. The store has since been deprecated and graph workloads moved to Manhattan-backed services, but the access pattern is unchanged. A graph database isn't necessary; sharded MySQL with the right index handles it at scale.

Timeline cache. A forked Redis cluster tuned for hybrid list and B-tree access, described in HighScalability's writeup of Twitter's Redis usage. It holds the materialized home timeline, capped at ~800 tweet IDs per user, replicated 3x for fault tolerance. The list structure makes LPUSH plus LTRIM cheap, exactly what the fanout path needs.

The split is not accidental. The tweet store optimizes for write durability and batch read by ID. The graph optimizes for adjacency-list traversal. The timeline cache optimizes for ordered list operations. Trying to do all three in one database gets you bad performance on at least two of the three access patterns. I've watched candidates trip into this when they say "let's just put everything in DynamoDB."

What the design actually trades

The hybrid is not free. Compared to pure push, you pay extra read complexity (the merge step) and the cold-start latency. Compared to pure pull, you pay extra write throughput and the materialized cache footprint.

What you get is a system where the write load is bounded: celebrity tweets don't fan out, and normal tweets do, but their fanout is small. The read load is also bounded; most reads hit the materialized timeline, with only a small celebrity fetch on top.

The operational coupling is what most candidates miss, not the algorithm itself. You now run two paths in production, with two failure modes (cache stampede on hot celebrity, fanout queue backlog under viral load), and you need to monitor both. A pure push or pure pull system has one failure mode that you watch obsessively. A hybrid has two, and they correlate at exactly the worst times: celebrity posts during a cache eviction storm.

That's why the hybrid only makes sense above a certain scale. Below 10M users, pure push is fine and you ship it on a quarter of the engineering budget. Above 100M users, pure push burns down. Twitter's hybrid is what carries you across and beyond that line, and it's worth understanding because the same shape (push for the small case, pull for the large, merge at the edge) shows up in news feeds, group chat, search-result ranking, and notifications.

If this was useful

The Twitter timeline is one of the 15 designs walked through in System Design Pocket Guide: Interviews, with the same shape as this post: capacity math first, the hybrid decision logic, the cold start, and the boundary cases interviewers chase. The storage decisions across Manhattan, MySQL, and Redis are the kind of pick covered in Database Playbook, which is the book to grab when "what database should I pick" stops being a buzzword question and starts being a real one.

