Year: 2010–2015 · Crisis: "How do we make the timeline load this fast?"
The Problem: Lady Gaga and 50 Million Followers
Here's the technical challenge Twitter faced in the 2010s:
- When a user tweets, that tweet should appear in all their followers' timelines
- An average user has 200 followers
- But Lady Gaga has 50 million followers
- If Lady Gaga tweets, 50 million timelines need to be updated
If Twitter created a separate database row for each follower:
-- Naive approach: One row per follower
INSERT INTO timeline (user_id, tweet_id, author_id, created_at)
SELECT follower_id, 12345, 'ladygaga', NOW()
FROM followers
WHERE followee_id = 'ladygaga';
-- 50 million INSERTs — disaster!
This approach is impossible. 50 million INSERTs take minutes and lock the database.
Architectural Decision: Fan-out-on-Write vs. Fan-out-on-Read
Twitter developed two different strategies and used them as a hybrid.
Strategy 1: Fan-out-on-Write
When a user tweets, the tweet is written to all followers' timelines at that moment.
USER A tweeted
│
▼
┌─────────────────┐
│ Timeline Service│
│ (at write time) │
└────────┬────────┘
│
┌────┴────┬────────┬────────┬─────────┐
▼ ▼ ▼ ▼ ▼
Follower1 Follower2 Follower3 ... FollowerN
timeline timeline timeline timeline
Pros:
- Reads (viewing the timeline) are very fast — just fetch the user's timeline
- Simple architecture
Cons:
- Writing is extremely slow for users with many followers
- Storage explosion: Each tweet is stored N times
Strategy 2: Fan-out-on-Read
When a user opens their timeline, tweets from the people they follow are merged at that moment.
USER opened their timeline
│
▼
┌──────────────────┐
│ Timeline Service │
│ (at read time) │
└────────┬─────────┘
│
┌────┴────┬────────┬────────┐
▼ ▼ ▼ ▼
Author1 Author2 Author3 Author4
tweets tweets tweets tweets
│
└──> MERGE ──> Show to user
Pros:
- Writing is very fast — just store the tweet
- Storage efficient — Each tweet is stored once
Cons:
- Reading is very slow — N queries per timeline view
- The merge operation is expensive
Twitter's Hybrid Solution
Twitter split users into two categories:
- Normal users (< 10,000 followers): Fan-out-on-Write
- Celebrity users (> 10,000 followers): Fan-out-on-Read
# Twitter's hybrid approach (pseudo-code)
def post_tweet(user, tweet_text):
tweet = create_tweet(user, tweet_text)
if user.follower_count < 10_000:
# Normal user: Fan-out-on-Write
followers = get_followers(user.id)
for follower in followers:
redis.zadd(
f"timeline:{follower.id}",
tweet.timestamp,
tweet.id
)
else:
# Celebrity: Only store their own tweet
# Will be merged when followers open their timeline
redis.zadd(f"user_tweets:{user.id}", tweet.timestamp, tweet.id)
def get_timeline(user_id):
# 1. Get the user's pre-computed timeline
timeline = redis.zrevrange(f"timeline:{user_id}", 0, 100)
# 2. Add tweets from followed celebrities
for celeb in get_followed_celebrities(user_id):
celeb_tweets = redis.zrevrange(
f"user_tweets:{celeb.id}", 0, 10
)
timeline.extend(celeb_tweets)
# 3. Sort by time
return sorted(timeline, key=lambda t: t.timestamp, reverse=True)[:100]
Manhattan: Twitter's Own Database
In 2014, Twitter migrated from MySQL to their own Manhattan database. Manhattan is a distributed key-value store designed for Twitter's specific needs:
- Multi-datacenter replication: Tweets are replicated across multiple data centers worldwide
- Low latency: <10ms read latency at the 99th percentile
- High throughput: Millions of tweets per second
Snowflake: Twitter's ID Generation System
Tweet IDs aren't random. Twitter uses an ID generation system called Snowflake:
┌────────────────────────────────────────────────────────────┐
│ 64-bit Tweet ID (Snowflake) │
├────────────────────────────────────────────────────────────┤
│ Bit 63: Sign bit (always 0) │
│ Bit 22-62: Timestamp (41 bits — ms since custom epoch) │
│ Bit 17-21: Datacenter ID (5 bits → 32 datacenters) │
│ Bit 12-16: Worker ID (5 bits → 32 workers per DC) │
│ Bit 0-11: Sequence number (12 bits → 4096 per ms/worker) │
└────────────────────────────────────────────────────────────┘
Why this kind of ID?
- Decentralized: Each worker generates its own IDs, no coordination needed
- Time-ordered: Tweets are naturally sorted chronologically
- Unique: Collisions are impossible (each worker uses a different ID block)
- Compact: Much shorter than UUIDs (64-bit vs. 128-bit)
Trade-offs
✅ Gains:
- Low latency: Timelines load within milliseconds
- Scale: Billions of tweets and users supported
- Cost efficiency: Storage and write load optimized for celebrity users
❌ Costs:
- Architectural complexity: Managing two different strategies is hard
- Inconsistency: Celebrity tweets may appear a few seconds late in followers' timelines
- Threshold management: Determining and dynamically adjusting the "10,000 follower" threshold is difficult
🛠️ Takeaways
Twitter showed us there's no such thing as "one size fits all" — they used a hybrid approach instead of a single strategy. In feed, timeline, and notification systems, the read vs. write trade-off will always come up; analyze which one is more frequent and choose your strategy accordingly. Centralized ID generation (auto-increment) becomes a bottleneck in distributed systems; look into Snowflake, ULID, UUID v7 as alternatives. And for frequently read data like timelines, in-memory caches like Redis are vital.
Next up — Chapter 6: Spotify's Squad Model and how Golden Paths cut service creation from 2 weeks to 5 minutes. 🎵
Top comments (0)