Hugo Vantighem

Read Modify Write Is Where NoSQL Concurrency Bugs Begin.

Part 1 of 3 — the single-document case.

There's a class of bug that every backend engineer ships at least once, usually
without noticing for months. It hides inside the most innocent-looking operation:
read a document, decide something, write it back.

Take a concrete invariant: a team can hold at most 10 seats. To add a seat you
read the team document, count the seats, check count < 10, and write. A textbook
Read → Modify → Write.

Now run it twice at the same instant. Request A reads count = 9, decides "9 < 10,
fine", and writes 10. Request B, a millisecond later, has also read count = 9,
decides "fine", and writes 10. The team document now says 10 seats, but 11 have
been granted. Neither request did anything wrong on its own. One write
silently erased the premise of the other. This is a lost update, and it's the
core anomaly of the single-document case.

T0   A reads count = 9
T1   B reads count = 9
T2   A writes count = 10   ("9 < 10, fine")
T3   B writes count = 10   ("9 < 10, fine")

Reality:        11 seats granted
Database state: 10
Invariant:      violated, silently
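
The timeline above can be replayed by hand. This is a minimal sketch, assuming a plain Python dict as a stand-in for the team document; the names team, granted, read_count and write_count are made up for illustration and no real database is involved.

```python
MAX_SEATS = 10

team = {"_id": "team-1", "count": 9}  # database state (toy stand-in, not MongoDB)
granted = []                          # seats actually handed out ("reality")


def read_count(doc):
    """Read step: return the count this request will base its decision on."""
    return doc["count"]


def write_count(doc, value):
    """Write step: blind overwrite, with no check that the count is unchanged."""
    doc["count"] = value


# Replay the T0..T3 interleaving deterministically.
a_seen = read_count(team)              # T0: A reads count = 9
b_seen = read_count(team)              # T1: B reads count = 9

if a_seen < MAX_SEATS:                 # T2: A decides "9 < 10, fine"
    write_count(team, a_seen + 1)
    granted.append("A")

if b_seen < MAX_SEATS:                 # T3: B decides on its stale read
    write_count(team, b_seen + 1)
    granted.append("B")

seats_in_reality = 9 + len(granted)
print("database state:", team["count"])   # 10
print("seats granted: ", seats_in_reality)  # 11
```

Both requests pass their own check, yet the stored count and the number of seats handed out disagree by one.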

Here's what teams actually reach for, and exactly what each option leaves on the
table.

The fat aggregate (atomic operators)

If you can express the whole mutation as a single atomic operator — $inc,
$push with $slice, or a conditional findAndModify — MongoDB applies it
atomically on the document. There's no read-then-write window, so no lost update.
For invariants that fit a single atomic expression, this is genuinely the right
tool, and you should reach for it first.
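
To make the "no read-then-write window" point concrete, here is a rough sketch in Python. It assumes a toy in-memory store whose internal lock models the database applying one operation atomically on one document; TeamStore and inc_if_below are hypothetical names, not a MongoDB API.

```python
import threading

MAX_SEATS = 10


class TeamStore:
    """Toy in-memory store. The lock models per-document atomicity of one operation."""

    def __init__(self, count):
        self._doc = {"count": count}
        self._lock = threading.Lock()

    def inc_if_below(self, limit):
        """Check and increment as one indivisible step; True if a seat was added."""
        with self._lock:
            if self._doc["count"] < limit:
                self._doc["count"] += 1
                return True
            return False

    def count(self):
        with self._lock:
            return self._doc["count"]


store = TeamStore(count=9)
results = []


def worker():
    # The caller never reads the count first, so it has no stale value to act on.
    results.append(store.inc_if_below(MAX_SEATS))


threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results), store.count())  # [False, True] 10
```

Against a real collection, the same shape would be a single update whose filter carries the condition (count below 10) and whose update document carries the $inc, so the check and the write travel together.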

The catch: not every invariant fits. The moment your check needs branching ("if
the plan is free and count ≥ 5, reject") you're back to reading, deciding in
application code, and writing, and the window reopens. Embedding related data is
a perfectly good modeling choice; the trap is different. It's the temptation to keep
stretching one document's consistency boundary, folding in unrelated rules just
to keep the write atomic, which is exactly how you end up with documents pressing
against MongoDB's 16 MB limit and a saturated network.

Anomaly status: ✅ lost update handled — for the subset of rules expressible as
one atomic op.

The pessimistic lock (Redis)

Grab a distributed lock before the read, release it after the write. It works, but
for a single document it's a sledgehammer. You've added a network round-trip, a
brand-new point of failure (the lock service), and a whole class of distributed
coordination failures (lease expiry, clock drift, fencing, split-brain), all to
guard one document the database could have guarded itself.

Anomaly status: ✅ everything — at the cost of latency and distributed coordination
failures. (Part 3 is dedicated to why that bill is steep.)
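
The shape of the pattern, lock then read then decide then write then unlock, can be sketched as follows. This assumes an in-process threading.Lock as a stand-in for the external lock service, so it shows only the happy path: none of the distributed failures listed above can occur here. The names team_lock and add_seat are hypothetical.

```python
import threading
import time

MAX_SEATS = 10

team = {"count": 9}
team_lock = threading.Lock()  # stand-in for a lock held in an external service


def add_seat():
    """Read, decide, and write while holding the lock; True if a seat was added."""
    with team_lock:                  # acquire before the read
        count = team["count"]        # read
        time.sleep(0.01)             # widen the read-to-write window on purpose
        if count >= MAX_SEATS:       # decide in application code
            return False
        team["count"] = count + 1    # write
        return True                  # lock is released on leaving the block


results = []
threads = [
    threading.Thread(target=lambda: results.append(add_seat())) for _ in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results), team["count"])  # [False, True] 10
```

Even with the window deliberately widened, the second request only starts its read after the first has written, so it sees 10 and is rejected.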

Optimistic locking (a version field)

Carry a version on the document. Read it, run your logic, then write with a
guard: findAndModify({query: {_id, version: v}, update: {$set: {...}, $inc:
{version: 1}}}). If anyone wrote in between, version has moved, your guard
matches nothing, and you retry. This is the clean default for single-document
RMW that doesn't fit an atomic operator: it kills lost update with no external
system.
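
A minimal sketch of the version guard and the retry loop, assuming a toy in-memory store rather than a real driver. VersionedStore, write_if_version and add_seat are hypothetical names; the internal lock only models the database applying one guarded write atomically.

```python
import threading

MAX_SEATS = 10


class VersionedStore:
    """Toy document store with a version field; not a MongoDB client."""

    def __init__(self, count):
        self._doc = {"count": count, "version": 0}
        self._lock = threading.Lock()  # models atomicity of a single write

    def read(self):
        with self._lock:
            return dict(self._doc)  # snapshot copy

    def write_if_version(self, expected_version, new_count):
        """Guarded write: applies only if nobody bumped the version since the read."""
        with self._lock:
            if self._doc["version"] != expected_version:
                return False  # guard matched nothing
            self._doc["count"] = new_count
            self._doc["version"] += 1
            return True


def add_seat(store, max_attempts=5):
    """Retry loop around read, decide, guarded write. Returns (added, attempts)."""
    for attempt in range(1, max_attempts + 1):
        snapshot = store.read()
        if snapshot["count"] >= MAX_SEATS:
            return False, attempt
        if store.write_if_version(snapshot["version"], snapshot["count"] + 1):
            return True, attempt
    raise RuntimeError("gave up after repeated version conflicts")


# Replay the racy interleaving: both requests read version 0.
store = VersionedStore(count=9)
a = store.read()
b = store.read()
a_ok = store.write_if_version(a["version"], a["count"] + 1)  # wins
b_ok = store.write_if_version(b["version"], b["count"] + 1)  # stale version, rejected
b_retry = add_seat(store)  # B re-reads, now sees count = 10, and is refused

print(a_ok, b_ok, b_retry, store.read())
```

The stale write is refused instead of silently landing, and the retry re-runs the business check against fresh data.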

The catch: under contention it's a retry machine. The more concurrent writers, the
more losers re-run their logic, burning CPU and tail latency.

Anomaly status: ✅ lost update — at the cost of app-side retries.

Pray

Bet that two requests never touch the same document in the same millisecond. They
will.

Anomaly status: ❌ lost update, in production, at 3 a.m.

The point

For a single document, you're actually well served: atomic operators or optimistic
locking close the gap cleanly, without external machinery. The single-document
case is the easy one.

The real pain begins the instant your invariant spans two documents — a
workspace budget gating a user debit, for example. There, optimistic locking stops
being sufficient: it still guards each document on its own, but it can no longer
guarantee an invariant that lives between them. And a nastier anomaly walks in —
the database stays perfectly "consistent" while your business invariant quietly
dies.

Welcome to write skew. That's Part 2.
