For two years, our Sponsored Placements service booked limited ad inventory through Redis: a counter in Redis, a Redlock around the decrement, and a TTL key per hold.
It oversold. Not catastrophically — consistently. 40–60 double-booked placements a month, each one a manual refund and an apology email to an advertiser.
The root cause was never one bug. It was the architecture: two sources of truth that could not be made atomic with each other. The count lived in Redis; the ownership lived in SQL. No transaction spans both. The Redlock only ever protected the Redis half.
The one mental shift
SKIP LOCKEDturns a contended table into a concurrent work queue. Instead of every request fighting over one counter, each request grabs different rows and ignores the ones someone else is holding.
FOR UPDATE alone serializes — that's the experience that scares people off SQL locking. FOR UPDATE SKIP LOCKED is the opposite: a transaction that would have blocked instead skips the locked row and takes the next free one.
One row per reservable unit, then:
START TRANSACTION;
SELECT id
FROM inventory_unit
WHERE placement_id = 42
AND (status = 'available'
OR (status = 'held' AND hold_expires_at < NOW(3))) -- self-healing expiry
ORDER BY id
LIMIT 2
FOR UPDATE SKIP LOCKED; -- the whole trick
UPDATE inventory_unit
SET status = 'held', reservation_id = 'uuid', hold_expires_at = NOW(3) + INTERVAL 10 MINUTE
WHERE id IN (1107, 1108);
INSERT INTO reservation (...) VALUES (...);
COMMIT;
Two concurrent requests for the same pool lock different rows. Neither waits. The claim, the hold, and the reservation are one transaction — there is nothing to reconcile because there is nothing else.
The numbers (8 weeks before vs 8 weeks after)
| Metric | Redis + Redlock | MySQL SKIP LOCKED |
|---|---|---|
| Oversells / month | 40–60 | 0 |
| Reservation p95 | 210 ms | 34 ms |
| Reservation p99 | 540 ms | 61 ms |
| Throughput / instance | ~600 RPS | 1,400 RPS |
| Lock-wait timeouts / day | ~900 | <5 |
| Nightly reconciliation | 9–14 min | deleted |
| Redis cluster | 3 nodes | decommissioned |
What made it work (the short version)
-
One row per unit, not a counter. A single counter row +
FOR UPDATEis correct but serial — we measured the cliff at ~600 RPS. -
Self-healing expiry. The claim query also picks up
heldrows pasthold_expires_at, so correctness never depends on a sweeper running on time. (Redis TTL loses holds on failover — async replication.) -
ORDER BY id+ retry on 1213/1205. Deterministic lock order nearly closes the deadlock window; a 3-attempt retry handles the rest. <2 deadlocks/day, all invisible to users. -
READ COMMITTED, notREPEATABLE READ. Gap locks under the MySQL default widened contention — switching cut deadlocks ~70% on its own. - A unique index as the backstop. Even if app logic is wrong, the database refuses to record the same unit sold twice.
When NOT to do this
Row-per-unit explodes for fungible, high-cardinality stock (5M identical SKUs → use a guarded UPDATE ... WHERE available >= qty instead). Flash-sale "1 item, 100k people" still wants a queue in front. And we kept Redis — for caching browse-page counts, where it belongs. Redis for speed, MySQL for truth, never the two confused.
The full write-up has the complete before/after C# handlers, the failover timeline that used to oversell, the index/EXPLAIN work, pool sizing, and a pre-merge checklist:
👉 How We Replaced Redis with MySQL SKIP LOCKED for Inventory Reservation at Scale
Top comments (0)