Mohammad Hossein Karami

Posted on Jun 1

The Ghost Post: When Users Can't See Their Own Writes


You submit a post, refresh the page — it's gone. A second later, it magically appears. You file a bug report. The engineer investigates and finds... nothing wrong.

This isn't a bug in the usual sense. It's a missing Read-Your-Writes Consistency guarantee, one of the easiest distributed systems problems to overlook until it shows up in production.

What Is Read-Your-Writes Consistency?

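Martin Kleppmann covers it in Designing Data-Intensive Applications, where it's also called read-after-write consistency: a guarantee that if the user reloads the page, they will always see any updates they submitted themselves, no matter which replica serves the request. It makes no promises about other users' updates, which may still appear later.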

In a Leader/Follower Replication setup with asynchronous replication, writes always go to the Leader, but reads can be served by any Follower, and that Follower might not have caught up yet. The result: a user's own data becomes temporarily invisible to them.

The Problem Visualized

User ──── Write ──▶ Leader
                      │
                      │ (async replication, ~500ms lag)
                      ▼
User ──── Read  ──▶ Follower  ← hasn't synced yet → returns stale data

The data isn't lost. The system is just eventually consistent — and "eventually" is long enough for users to notice.
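To make the failure concrete, here is a toy, in-memory model of asynchronous Leader/Follower replication in C#, the language used later in this post. It's a sketch, not a database: ToyLeader, ToyFollower and Replicate() are hypothetical names, and replication lag is modeled by simply not calling Replicate() until later.

```csharp
using System.Collections.Generic;

// Toy model of asynchronous Leader/Follower replication (hypothetical, in-memory).
// The Leader appends each write to its log; the Follower only applies log entries
// when Replicate() is called, which stands in for replication lag.
public sealed class ToyLeader
{
    public List<(string Key, string Value)> Log { get; } = new();
    public Dictionary<string, string> Data { get; } = new();

    // Applies the write and returns its position in the log (1-based).
    public long Write(string key, string value)
    {
        Data[key] = value;
        Log.Add((key, value));
        return Log.Count;
    }
}

public sealed class ToyFollower
{
    private readonly ToyLeader _leader;
    public Dictionary<string, string> Data { get; } = new();
    public long AppliedPosition { get; private set; }

    public ToyFollower(ToyLeader leader) => _leader = leader;

    // Catch up: apply every log entry this Follower hasn't seen yet.
    public void Replicate()
    {
        while (AppliedPosition < _leader.Log.Count)
        {
            var (key, value) = _leader.Log[(int)AppliedPosition];
            Data[key] = value;
            AppliedPosition++;
        }
    }

    // Reads only what this Follower has applied; null means "not here (yet)".
    public string? Read(string key) => Data.TryGetValue(key, out var value) ? value : null;
}
```

Writing to the Leader and reading straight from the Follower returns null; only after Replicate() does the post show up. That gap is the ghost post in miniature.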

Four Solutions, Ranked by Practicality

1. Always Read from Leader

The simplest solution — and the worst at scale.

Every read hits the Leader, turning it into a bottleneck
Followers sit idle, wasting your replication investment
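A Leader outage or failover takes reads down along with writes

Avoid this unless your system is tiny, or apply it narrowly: route only reads of data the user can modify (for example, their own profile) to the Leader, and serve everything else from Followers.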

2. Time Window Routing

After a write, route that user's reads to the Leader for 60 seconds.

Write → mark user session: "read_from_leader until now() + 60s"
Read  → check session flag → route accordingly
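A minimal C# sketch of the two steps above, assuming the flag lives in some per-session store (an in-memory dictionary here) and taking the clock as a parameter so the window can be tested. TimeWindowRouter, OnWrite and RouteRead are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Time-window routing sketch. The in-memory dictionary stands in for the session
// store; the clock is injected so the window boundary is testable.
public sealed class TimeWindowRouter
{
    private readonly Dictionary<string, DateTimeOffset> _readFromLeaderUntil = new();
    private readonly TimeSpan _window;
    private readonly Func<DateTimeOffset> _now;

    public TimeWindowRouter(TimeSpan window, Func<DateTimeOffset> now)
    {
        _window = window;
        _now = now;
    }

    // Write: mark the session "read_from_leader until now() + window".
    public void OnWrite(string sessionId) => _readFromLeaderUntil[sessionId] = _now() + _window;

    // Read: check the session flag and route accordingly.
    public string RouteRead(string sessionId) =>
        _readFromLeaderUntil.TryGetValue(sessionId, out var until) && _now() < until
            ? "leader"
            : "follower";
}
```

Injecting the clock keeps the 60-second boundary testable without sleeping; note that nothing here looks at actual replication progress, which is exactly the weakness.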

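The weakness is obvious: what if replication lag exceeds 60 seconds? During heavy load or a network hiccup, you're back to stale reads, and now your window gives false confidence. It also only protects the session that wrote: if the flag lives in that session, a read from the user's other device still lands on a lagging Follower.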

3. LSN-Based Routing

The Log Sequence Number (LSN) is a monotonically increasing position in the Leader's write-ahead log, the stream that Followers replay. Instead of guessing with time, you track actual replication progress.

Write → commit on Leader, then read its LSN (e.g., 100423)
        Store: lastWriteLSN = 100423

Read  → Only route to a Replica where currentLSN >= 100423

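This is position-aware, not time-aware, a fundamentally more accurate model. PostgreSQL exposes the positions directly: pg_current_wal_lsn() on the primary and pg_last_wal_replay_lsn() on a standby. MySQL's equivalent is the GTID set (@@GLOBAL.gtid_executed), which is compared by set containment with GTID_SUBSET() rather than with >=, and WAIT_FOR_EXECUTED_GTID_SET() lets a replica block until it has applied a given set.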
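One practical wrinkle with PostgreSQL: LSNs come back as text like 16/B374D848 (upper and lower 32 bits in hex), not as plain integers like 100423. A minimal C# sketch, assuming you fetch the values with SELECT pg_current_wal_lsn() on the Leader and SELECT pg_last_wal_replay_lsn() on a replica, converts them to 64-bit integers that can be stored and compared with >=. PgLsn is a hypothetical helper, not part of any driver.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: converts PostgreSQL pg_lsn text ("high/low", both hex)
// into a single 64-bit position so it can be stored and compared numerically.
public static class PgLsn
{
    public static ulong Parse(string text)
    {
        var parts = text.Split('/');
        if (parts.Length != 2)
            throw new FormatException($"Not a pg_lsn value: '{text}'");

        ulong high = ulong.Parse(parts[0], NumberStyles.HexNumber, CultureInfo.InvariantCulture);
        ulong low = ulong.Parse(parts[1], NumberStyles.HexNumber, CultureInfo.InvariantCulture);
        if (high > uint.MaxValue || low > uint.MaxValue)
            throw new FormatException($"LSN half out of range: '{text}'");

        // Upper half goes into the high 32 bits, lower half into the low 32 bits.
        return (high << 32) | low;
    }

    // True if a replica that has replayed up to replicaLsn can see a write at writeLsn.
    public static bool HasCaughtUp(string replicaLsn, string writeLsn) =>
        Parse(replicaLsn) >= Parse(writeLsn);
}
```

Real LSNs stay far below 2^63, so casting the result to long to fit the Redis code below is safe in practice.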

4. Commit Token (Oracle BDB Pattern) 🎯

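The Leader generates an opaque token after each write. The client holds this token and sends it with every subsequent read. Each Replica checks whether it has applied that transaction before serving the response, waiting up to a timeout if it hasn't. In Oracle Berkeley DB Java Edition's HA API this is Transaction.getCommitToken() on the master, paired with a CommitPointConsistencyPolicy on the replica.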

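This is the most precise option. Because the token is opaque, clients never need to understand the database's position format, so the same scheme can span heterogeneous stores. And like LSN routing, it tracks replication progress rather than elapsed time, so it doesn't depend on clocks or lag estimates.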
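A sketch of the pattern in C#, under two assumptions: the replication position is a single 64-bit number (such as a PostgreSQL LSN), and the token is just that number Base64-encoded. Berkeley DB's real CommitToken has its own format; CommitToken, Encode, Decode and WaitForTokenAsync here are hypothetical. The replica-side check waits briefly and reports failure on timeout so the caller can fall back to the Leader.

```csharp
using System;
using System.Buffers.Binary;
using System.Threading.Tasks;

// Hypothetical commit-token helper: the token is the commit position, 8 bytes
// big-endian, Base64-encoded. Clients store it and send it back without parsing it.
public static class CommitToken
{
    public static string Encode(ulong commitPosition)
    {
        var bytes = new byte[8];
        BinaryPrimitives.WriteUInt64BigEndian(bytes, commitPosition);
        return Convert.ToBase64String(bytes);
    }

    public static ulong Decode(string token) =>
        BinaryPrimitives.ReadUInt64BigEndian(Convert.FromBase64String(token));

    // Replica side: wait until this replica has applied the token's position.
    // Returns false on timeout so the caller can retry the read on the Leader.
    public static async Task<bool> WaitForTokenAsync(
        string token, Func<ulong> appliedPosition, TimeSpan timeout)
    {
        ulong required = Decode(token);
        DateTime deadline = DateTime.UtcNow + timeout;
        while (appliedPosition() < required)
        {
            if (DateTime.UtcNow >= deadline)
                return false;
            await Task.Delay(10); // simple polling; an implementation might wait on a replay notification instead
        }
        return true;
    }
}
```

Returning false instead of throwing keeps the fallback decision with the caller, which mirrors the "no replica ready, hit the Leader" fallback used in the Redis version.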

Production-Grade Implementation: Redis + LSN-Based Routing

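Storing each user's last commit position in Redis is the commit-token idea moved server-side: the application holds the token instead of the client. Combined with LSN-based replica selection, it gives you the best balance of accuracy and scalability, and because the entry is keyed by user rather than by session, it also covers reads from the user's other devices.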

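using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using StackExchange.Redis;

// Hypothetical replica abstraction: GetCurrentLsnAsync() would run something like
// "SELECT pg_last_wal_replay_lsn()" and convert the result to a long.
public interface IReplica
{
    string ConnectionString { get; }
    Task<long> GetCurrentLsnAsync();
}

// C# 12 primary constructor; dependencies are injected by the caller.
public class ReplicaRouter(IDatabase redis, IReadOnlyList<IReplica> replicas, string leaderConnectionString)
{
    // Only moves the stored LSN forward, so a slower concurrent write from the same
    // user can't overwrite a newer position. The TTL is refreshed either way.
    private const string SetIfGreaterScript = @"
        local current = tonumber(redis.call('GET', KEYS[1]) or '0')
        if tonumber(ARGV[1]) > current then
            redis.call('SET', KEYS[1], ARGV[1], 'EX', ARGV[2])
        else
            redis.call('EXPIRE', KEYS[1], ARGV[2])
        end
        return 0";

    // After Write: store the commit position with a 5-minute (300 s) TTL
    public async Task SaveCommitPositionAsync(string userId, long lsn)
    {
        await redis.ScriptEvaluateAsync(SetIfGreaterScript,
            new RedisKey[] { $"write_lsn:{userId}" }, new RedisValue[] { lsn, 300 });
    }

    // Before Read: select only a replica that has caught up
    public async Task<string> SelectReplicaAsync(string userId)
    {
        var lastLsn = await redis.StringGetAsync($"write_lsn:{userId}");

        if (!lastLsn.HasValue) // no recent write (or TTL expired): serve normally
            return replicas[Random.Shared.Next(replicas.Count)].ConnectionString;

        var requiredLsn = (long)lastLsn;

        // One round trip per replica on each post-write read; in practice you'd
        // cache replica positions and refresh them in the background.
        foreach (var replica in replicas)
        {
            var currentLsn = await replica.GetCurrentLsnAsync();
            if (currentLsn >= requiredLsn)
                return replica.ConnectionString;
        }

        // Fallback: no replica has caught up yet, hit the Leader
        return leaderConnectionString;
    }
}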

Why Redis?

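Sub-millisecond lookups on a local network, so the extra round trip adds little to your hot path
TTL auto-cleans stale entries (no manual cleanup)
Horizontally scalable alongside your app, and shared by every app instance, so a user's next request finds their LSN no matter which server handles it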

Flow summary:

Write  →  Leader  →  Get LSN  →  Store in Redis (userId → LSN, TTL 5min)
Read   →  Fetch LSN from Redis  →  Find Replica with currentLSN >= stored LSN  →  Route there
                                                                               ↘ fallback: Leader
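To run the flow without Redis or a database, here is an in-memory C# model of it: a dictionary with expiry times stands in for Redis, and each replica is reduced to a name and the LSN it has replayed. ReadYourWritesRouter and its members are hypothetical. It also keeps the higher LSN when two writes from the same user are recorded out of order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// In-memory model of the flow: userId -> (LSN, expiry) stands in for Redis with a TTL,
// and ReplicaLsn maps each replica name to the LSN it has replayed so far.
public sealed class ReadYourWritesRouter
{
    private readonly Dictionary<string, (long Lsn, DateTimeOffset Expires)> _lastWrite = new();
    private readonly TimeSpan _ttl;
    private readonly Func<DateTimeOffset> _now; // injectable clock

    public Dictionary<string, long> ReplicaLsn { get; } = new();

    public ReadYourWritesRouter(TimeSpan ttl, Func<DateTimeOffset> now)
    {
        _ttl = ttl;
        _now = now;
    }

    // Write -> Leader -> Get LSN -> store (userId -> LSN, TTL); never moves the LSN backwards.
    public void OnWrite(string userId, long commitLsn)
    {
        long lsn = _lastWrite.TryGetValue(userId, out var entry) && entry.Expires > _now()
            ? Math.Max(entry.Lsn, commitLsn)
            : commitLsn;
        _lastWrite[userId] = (lsn, _now() + _ttl);
    }

    // Read -> fetch LSN -> first replica with LSN >= stored LSN, else "leader".
    public string RouteRead(string userId)
    {
        bool recentWrite = _lastWrite.TryGetValue(userId, out var entry) && entry.Expires > _now();
        if (!recentWrite)
            return ReplicaLsn.Count > 0 ? ReplicaLsn.Keys.First() : "leader"; // serve normally

        foreach (var (name, lsn) in ReplicaLsn)
            if (lsn >= entry.Lsn)
                return name;
        return "leader";
    }
}
```

Note the TTL expiry case: once the entry is gone, the user is routed like anyone else. That is also what happens if replication lag ever exceeds the TTL, so the TTL should sit well above your worst observed lag.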

Why This Matters More Than You Think

If you're ignoring this problem, your users are experiencing it and blaming your app. Data that disappears right after a user saves it reads as a failed action: people lose trust, and many simply submit again, which can leave duplicate posts once replication catches up.

The four solutions form a clear hierarchy:

Always read Leader → simple, doesn't scale
Time Window → better, but fragile under lag
LSN-Based Routing → accurate, requires LSN access
Commit Token (here, Redis + LSN) → production-ready, scalable, recommended

For any system handling significant concurrent users, the Redis + LSN approach isn't over-engineering — it's the minimum viable guarantee for a trustworthy user experience.

For a working implementation and deeper architectural context, visit blog.mhkarami97.ir/posts/read_write_consistency
