Read-Modify-Write isolation in NoSQL, part 2: When the invariant spans multiple aggregates.

#architecture #distributedsystems #nosql #ddd

In part 1 we saw the single-document case, where optimistic locking saves you with a simple version field. Now we cross the line that breaks that comfort.

Let me make it concrete. Your product sells seats, and an organization buys a license capped at 100 seats. Those seats are spread across many Teams, and each Team is its own aggregate with its own lifecycle. You can't stuff the list of all teams into one document: it grows unbounded and violates every aggregate-design instinct you have.

So your invariant is a sum across aggregates:

The sum of seats over all Team aggregates must never exceed 100.

To enforce that on every "add seats to a team" operation, the honest move is to read the current state across every Team and sum it — a fan-out that scans N Team documents and gets slower as the org grows. Painful, yes. But surely reading the real, current teams inside a transaction is at least correct? That intuition is the trap.

Now the operation is a textbook Read → Modify → Write, but watch what each step touches. You read and sum the Team documents as your guard, confirm there's room, then write the new seats (say, 8) onto a single Team aggregate: Team Alpha or Team Beta, each its own document. The check reads a set of documents; the mutation lands on one of them. And that's where it gets dangerous.
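To pin down the shape of that operation, here is a minimal sketch in plain Python. A dict stands in for the Team collection, and the names (`add_seats`, `MAX_SEATS`, the 50/40 split between the two teams) are made up for illustration. Run one call after the other and the guard behaves exactly as you'd hope.

```python
MAX_SEATS = 100  # license cap from the example


def add_seats(teams: dict, team_id: str, extra: int) -> bool:
    """Read -> Modify -> Write: the guard reads every team, the write hits one."""
    used = sum(teams.values())              # read scope: all Team documents
    if used + extra > MAX_SEATS:            # guard: is there room under the cap?
        return False
    teams[team_id] = teams[team_id] + extra  # write scope: a single Team document
    return True


# Hypothetical starting state: the teams sum to 90.
teams = {"alpha": 50, "beta": 40}

first = add_seats(teams, "alpha", 8)   # 90 + 8 <= 100, so it is granted
second = add_seats(teams, "beta", 8)   # 98 + 8 > 100, so it is refused

print(first, second, sum(teams.values()))
```

Serially, the second request sees the first one's write and is refused. Everything that follows is about what happens when it doesn't.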

The anomaly: Write Skew

Two requests run concurrently. The teams currently sum to 90.

guard: Σ seats over all Team docs = 90   (max 100)

  Tx A   reads all teams → Σ = 90 → 90+8 ≤ 100 ✅ → writes Team Alpha (+8) → COMMIT
  Tx B   reads all teams → Σ = 90 → 90+8 ≤ 100 ✅ → writes Team Beta  (+8) → COMMIT
         └─ each reads its OWN snapshot — neither sees the other's in-flight
            write. Two DIFFERENT Team docs → no write-write conflict to abort.

  Σ seats across Team aggregates = 106   >   license cap 100
  Nothing collided, nothing was overwritten.
  The invariant dies between the documents. That's Write Skew.
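The trace above can be replayed with a toy model. This is a sketch under loud assumptions: `SnapshotStore` and `Tx` are invented for this post, the 50/40 split is made up, and conflicts are checked at commit time on written keys only. That is a simplification of how a real engine behaves, but it is enough to reproduce the anomaly.

```python
class WriteConflict(Exception):
    """Raised when a document this transaction wrote changed after its snapshot."""


class SnapshotStore:
    """Toy store: one integer seat count per team, one version per document."""

    def __init__(self, docs):
        self.docs = dict(docs)
        self.versions = {key: 0 for key in docs}

    def begin(self):
        return Tx(self)


class Tx:
    def __init__(self, store):
        self.store = store
        self.snapshot = dict(store.docs)    # frozen point-in-time view
        self.seen = dict(store.versions)    # versions as of the snapshot
        self.writes = {}

    def read_all(self):
        return {**self.snapshot, **self.writes}

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Conflict detection looks at WRITTEN keys only, never at what was read.
        for key in self.writes:
            if self.store.versions[key] != self.seen[key]:
                raise WriteConflict(key)
        for key, value in self.writes.items():
            self.store.docs[key] = value
            self.store.versions[key] += 1


def add_seats(tx, team, extra, cap=100):
    teams = tx.read_all()
    if sum(teams.values()) + extra > cap:   # guard over the whole snapshot
        raise ValueError("license cap exceeded")
    tx.write(team, teams[team] + extra)     # mutation on one document


store = SnapshotStore({"alpha": 50, "beta": 40})
tx_a, tx_b = store.begin(), store.begin()   # both snapshots see a sum of 90

add_seats(tx_a, "alpha", 8)   # guard passes against 90
add_seats(tx_b, "beta", 8)    # guard passes against the same 90

tx_a.commit()
tx_b.commit()                 # different documents, so no conflict is raised

total = sum(store.docs.values())
print(total)  # 106
```

Both commits succeed and no exception fires, which is the whole problem.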

Each transaction read a valid state and made a locally correct decision — yet the real total is now 106, and you've oversold a 100-seat license. The subtle part the trace makes visible: the damage isn't an overwrite. It lives between the documents, because both transactions validated against a 90 that didn't yet include the other's write.

This is Write Skew, and it's exactly where naive optimistic locking gives up. You'd put a version on the Team document — but Alpha and Beta were never contended; they're different documents. The contended truth is the seat invariant itself.

💡 Lost update vs. Write Skew — the line this whole series turns on.
Part 1's bug was an overwrite: two writes hit the same document, one erased the other, and the stored value itself came out wrong (10 instead of 11). This is the opposite. Nothing is overwritten — the two writes land on different documents, both commit cleanly, and every document stays internally consistent. The invariant breaks in the gap between them. Same Read → Modify → Write race, a strictly nastier failure: there's no torn document for a code review, a unit test, or the database to catch.

A precise word on isolation levels

Wrap the whole thing in a native MongoDB multi-document transaction (≥ 4.0 on replica sets, 4.2+ on sharded clusters) and you get snapshot isolation: every transaction sees a consistent point-in-time snapshot of the database. That genuinely eliminates the classic ANSI read anomalies: dirty reads, non-repeatable reads, and even phantoms (the snapshot is frozen, so documents born after it are simply invisible). It also eliminates single-document lost updates.

What it does not give you, on its own, is full serializability, so it can't stop Write Skew. Your clean snapshot isn't enough, because the concurrent transaction reads its own snapshot, one that doesn't contain your uncommitted write, and decides based on it. Two consistent snapshots, two valid decisions, one broken invariant.

Here's the mental key most devs skip: the invariant is never encoded as a write conflict. WiredTiger (MongoDB's storage engine) raises a write conflict only when two concurrent transactions modify the same document. Tx A wrote Team Alpha and Tx B wrote Team Beta: different documents, so there's nothing for WiredTiger to abort. The constraint lives in your head and your validation check, not in the data the engine is watching for conflicts. Snapshot isolation protects the bytes; it can't protect a rule it was never told about.
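One way to see the blind spot is to write down each transaction's footprint and ask two different questions about it. The sketch below is purely illustrative: `TxFootprint` and both check functions are hypothetical names, not anything the engine exposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TxFootprint:
    name: str
    reads: frozenset    # documents the guard looked at
    writes: frozenset   # documents the transaction modified


def write_write_conflict(a: TxFootprint, b: TxFootprint) -> bool:
    """The question snapshot isolation asks: did both write the same document?"""
    return bool(a.writes & b.writes)


def read_write_overlap(a: TxFootprint, b: TxFootprint) -> bool:
    """The question write skew needs asked: did one write what the other read?"""
    return bool(a.writes & b.reads) or bool(b.writes & a.reads)


all_teams = frozenset({"team:alpha", "team:beta"})
tx_a = TxFootprint("A", reads=all_teams, writes=frozenset({"team:alpha"}))
tx_b = TxFootprint("B", reads=all_teams, writes=frozenset({"team:beta"}))

ww = write_write_conflict(tx_a, tx_b)   # False: the engine sees nothing to abort
rw = read_write_overlap(tx_a, tx_b)     # True: each invalidated the other's guard
print(ww, rw)
```

The first check is the one the engine runs for you. The second is the one your invariant depends on, and nobody is running it.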

So the fix — whatever you reach for — has to do the one thing the engine never did for you: make the invariant part of what it watches for conflicts. And here's the uncomfortable preview of everything that follows: every way of doing that ends up serializing these writes somehow, and every one sends you a bill.

the rule    Σ(team.seats) ≤ 100              ← the engine never watches this

the move    materialize it into one doc every writer must also touch:
            license = { usedSeats: 90, maxSeats: 100 }
            → the invisible skew becomes a write-write conflict it CAN catch
              (doing it right — and what it costs — is the rest of this series)
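Here is a toy replay of that move, with the same caveats as any sketch: `SnapshotStore`, `Tx`, the document keys, and the retry logic are all invented for illustration, and conflicts are checked at commit time on written keys only. It shows the mechanism, not a production recipe.

```python
import copy


class WriteConflict(Exception):
    """Raised when a written document changed after the snapshot was taken."""


class SnapshotStore:
    def __init__(self, docs):
        self.docs = copy.deepcopy(docs)
        self.versions = {key: 0 for key in docs}


class Tx:
    def __init__(self, store):
        self.store = store
        self.snapshot = copy.deepcopy(store.docs)   # frozen point-in-time view
        self.seen = dict(store.versions)
        self.writes = {}

    def read(self, key):
        return copy.deepcopy(self.writes.get(key, self.snapshot[key]))

    def write(self, key, doc):
        self.writes[key] = doc

    def commit(self):
        for key in self.writes:                     # written keys only
            if self.store.versions[key] != self.seen[key]:
                raise WriteConflict(key)
        for key, doc in self.writes.items():
            self.store.docs[key] = doc
            self.store.versions[key] += 1


def add_seats(tx, team_key, extra):
    license_doc = tx.read("license")
    if license_doc["usedSeats"] + extra > license_doc["maxSeats"]:
        return False                                # guard reads ONE document now
    license_doc["usedSeats"] += extra
    team = tx.read(team_key)
    team["seats"] += extra
    tx.write("license", license_doc)                # every writer touches this doc
    tx.write(team_key, team)
    return True


store = SnapshotStore({
    "license": {"usedSeats": 90, "maxSeats": 100},
    "team:alpha": {"seats": 50},
    "team:beta": {"seats": 40},
})

tx_a, tx_b = Tx(store), Tx(store)   # both snapshots see usedSeats = 90
add_seats(tx_a, "team:alpha", 8)
add_seats(tx_b, "team:beta", 8)
tx_a.commit()

try:
    tx_b.commit()
    b_outcome = "committed"
except WriteConflict:
    retry = Tx(store)               # fresh snapshot: usedSeats is now 98
    if add_seats(retry, "team:beta", 8):
        retry.commit()
        b_outcome = "granted on retry"
    else:
        b_outcome = "rejected on retry"

print(b_outcome, store.docs["license"]["usedSeats"])
```

Tx B now collides on the license document, retries against a fresh snapshot, and is correctly refused. Notice the price already showing up: every seat change in the org funnels through one document.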

The pattern underneath

Strip it back and the lesson is one sentence. Write skew isn't a flaw in transactions, and MongoDB isn't "wrong." Write skew is what happens when an invariant never becomes part of the database's conflict-detection model: a mismatch between the scope you validate (all the Team docs) and the scope the engine watches for conflicts (the one doc you write). It's the same read scope ≠ write scope gap we saw in the guard, stated as a general rule.

Hold that sentence — it's the lens for everything that follows. Every cure from here is just a different answer to one question: how do I drag the invariant into something that serializes these writes? And not one of them is free.

Where this leads

So we reach for the reflex everyone tries first: a distributed lock. Freeze the world around the read and the write with Redis, and on paper you're finally serializable. In practice it's a minefield — latency, deadlocks, and a TTL dilemma with no good answer.

That's where things get ugly. Part 3.