Fazal Mansuri

Posted on Apr 18

⚠️ Race Conditions in APIs - The Bug You Can’t See

#backend #database #api #security

There's a category of bugs that don't show up in tests. They don't throw errors. They don't crash your server. They just silently corrupt your data at the exact moment two users do the same thing at the same time.

That's a race condition.

And if you've ever seen a user's account balance go wrong, a product's stock count go negative, or a post get double-liked - you've seen one in production.

Let's break down exactly what's happening and how to fix it.

What Is a Race Condition?

A race condition happens when two concurrent operations read shared data, make a decision based on it, and write back — and neither knows the other exists.

The classic example:

Thread A reads stock = 10
Thread B reads stock = 10
Thread A writes stock = 9  (decremented by 1)
Thread B writes stock = 9  (also decremented from 10 — wrong!)

Two orders placed. One decrement applied. You just oversold your inventory.

The reason this is invisible: each individual operation is correct. Thread A did nothing wrong. Thread B did nothing wrong. The bug lives in the gap between the read and the write.

A Real Scenario: The Inventory API

Say you have an endpoint that handles a purchase:

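// ❌ NOT safe under concurrent load
func purchaseItem(ctx context.Context, db *sql.DB, itemID string) error {
    // Step 1: read the current stock.
    var stock int
    err := db.QueryRowContext(ctx, "SELECT stock FROM items WHERE id = $1", itemID).Scan(&stock)
    if err != nil {
        return err
    }

    // Step 2: decide, based on a value that may already be stale.
    if stock <= 0 {
        return errors.New("out of stock")
    }

    // Step 3: write back. Another request may have written between steps 1 and 3.
    _, err = db.ExecContext(ctx, "UPDATE items SET stock = $1 WHERE id = $2", stock-1, itemID)
    return err
}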

This looks fine. In a single-user world, it is fine. But under load (say, a flash sale with 500 concurrent requests), the gap between the SELECT and the UPDATE is wide enough for dozens of requests to read the same stock value and each write back that value minus one.

Result: Lost decrements. Oversold orders. Angry customers.
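Bugs like this rarely show up in a sequential test, so it helps to have a small harness that releases many goroutines at the same instant and counts how many purchases succeeded. The sketch below is an in-memory illustration only: `store`, `unsafePurchase`, `safePurchase`, and `hammer` are hypothetical names, and the mutex-protected `Get`/`Set` methods stand in for individual SQL statements that are each safe on their own.

```go
package main

import (
	"errors"
	"sync"
)

// store is a hypothetical in-memory stand-in for the items table. Each method
// is individually safe, like a single SQL statement, but nothing protects a
// Get followed by a Set.
type store struct {
	mu    sync.Mutex
	stock int
}

func (s *store) Get() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.stock
}

func (s *store) Set(v int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.stock = v
}

// DecrementIfPositive checks and writes under one lock acquisition.
func (s *store) DecrementIfPositive() bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.stock <= 0 {
		return false
	}
	s.stock--
	return true
}

var errOutOfStock = errors.New("out of stock")

// unsafePurchase mirrors the read-then-write endpoint above.
func unsafePurchase(s *store) error {
	stock := s.Get()
	if stock <= 0 {
		return errOutOfStock
	}
	s.Set(stock - 1)
	return nil
}

// safePurchase never separates the check from the write.
func safePurchase(s *store) error {
	if !s.DecrementIfPositive() {
		return errOutOfStock
	}
	return nil
}

// hammer runs purchase from n goroutines released together and returns how
// many calls succeeded.
func hammer(n int, purchase func() error) int {
	var (
		wg        sync.WaitGroup
		mu        sync.Mutex
		succeeded int
	)
	start := make(chan struct{})
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-start // wait until every goroutine is ready
			if purchase() == nil {
				mu.Lock()
				succeeded++
				mu.Unlock()
			}
		}()
	}
	close(start)
	wg.Wait()
	return succeeded
}
```

The invariant to assert is "successful purchases never exceed the initial stock". With `safePurchase` and a stock of 10, `hammer(500, ...)` always reports exactly 10. With `unsafePurchase` the count varies from run to run and can exceed 10, which is the overselling described above.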

Three Ways to Fix It

Fix 1 — Atomic SQL Update (the simplest approach)

The fastest fix is to eliminate the read-then-write pattern entirely and let the database handle the decrement atomically:

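// ✅ Safe: single atomic statement
result, err := db.Exec(`
    UPDATE items
    SET stock = stock - 1
    WHERE id = $1 AND stock > 0
`, itemID)
if err != nil {
    return err // result is nil here, so check before using it
}

rowsAffected, err := result.RowsAffected()
if err != nil {
    return err
}
if rowsAffected == 0 {
    // No row matched: the item does not exist or its stock is already 0.
    return errors.New("out of stock or already taken")
}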

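No separate read. No gap. The check (stock > 0) and the write happen inside one statement, and concurrent updates to the same row wait for each other, so each one sees the result of the one before it. This works well for simple decrements and is the right default when you don't need the old value in application code.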

Fix 2 — Pessimistic Locking (SELECT FOR UPDATE)

When you genuinely need to read the value before deciding what to write — for example, checking business rules before an update — use SELECT FOR UPDATE to hold a row lock:

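// ✅ Safe: row is locked until the transaction commits or rolls back
tx, err := db.BeginTx(ctx, nil)
if err != nil {
    return err
}
defer tx.Rollback() // does nothing useful once Commit has succeeded

var stock int
err = tx.QueryRowContext(ctx, `
    SELECT stock FROM items WHERE id = $1 FOR UPDATE
`, itemID).Scan(&stock)
if err != nil {
    return err
}

// Business rules go here; the row cannot change underneath us.
if stock <= 0 {
    return errors.New("out of stock")
}

_, err = tx.ExecContext(ctx, "UPDATE items SET stock = $1 WHERE id = $2", stock-1, itemID)
if err != nil {
    return err
}
return tx.Commit()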

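FOR UPDATE tells the database: lock this row. Any other transaction that tries to update or delete it, or to read it with FOR UPDATE, will block until this one commits or rolls back. Plain SELECTs are typically not blocked (in PostgreSQL, for example, they keep reading the last committed version). This is safe, but it creates a queue, so it can become a bottleneck under heavy concurrent traffic, and transactions that lock several rows in different orders can deadlock.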

Fix 3 — Optimistic Locking with CAS (Compare-And-Swap)

Optimistic locking says: don't block anyone. Just verify before writing that the world still looks the way you expected when you read it.

You add a version column to the row, then make your update conditional:

// ✅ Safe: version mismatch causes a retry, not corruption
type Item struct {
    ID      string
    Stock   int
    Version int
}

func purchaseWithCAS(ctx context.Context, itemID string) error {
    const maxRetries = 3

    for attempt := 0; attempt < maxRetries; attempt++ {
        var item Item
        err := db.QueryRowContext(ctx, "SELECT id, stock, version FROM items WHERE id = $1", itemID).
            Scan(&item.ID, &item.Stock, &item.Version)
        if err != nil {
            return err
        }

        if item.Stock <= 0 {
            return errors.New("out of stock")
        }

        // The write only applies if nobody has bumped the version since our read.
        result, err := db.ExecContext(ctx, `
            UPDATE items
            SET stock = $1, version = $2
            WHERE id = $3 AND version = $4
        `, item.Stock-1, item.Version+1, item.ID, item.Version)
        if err != nil {
            return err
        }

        rows, err := result.RowsAffected()
        if err != nil {
            return err
        }
        if rows == 1 {
            return nil // success
        }
        // version mismatch: someone else wrote first, retry with a fresh read
    }
    return errors.New("too many concurrent modifications, try again")
}

If two threads both read version=5, only the first one that writes version=6 wins. The second sees rowsAffected=0 and retries with a fresh read. No locks held, no blocking — just retry on conflict.

When to Use Which Fix

| Scenario | Best approach |
| --- | --- |
| Simple increment/decrement | Atomic SQL (`UPDATE ... SET x = x - 1`) |
| Complex business logic before write | Pessimistic locking (`SELECT FOR UPDATE`) |
| High-read, low-conflict workloads | Optimistic locking (CAS + version) |
| Distributed systems / no shared DB | Distributed locks (Redis `SET NX`, etc.) |

The Distributed Case: Redis to the Rescue

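Database locks already work across multiple API servers, as long as they all talk to the same database. But when the critical section isn't covered by a single database transaction (it calls an external service, say, or spans several data stores), or when the database lock is too slow, you need a distributed lock. Redis handles the basic case with one command: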

// ✅ Distributed lock using Redis SET NX (set if not exists)
func acquireLock(ctx context.Context, rdb *redis.Client, key string, ttl time.Duration) (bool, error) {
    // SET lock:<key> 1 NX with a TTL: succeeds for exactly one caller.
    return rdb.SetNX(ctx, "lock:"+key, "1", ttl).Result()
}

func releaseLock(ctx context.Context, rdb *redis.Client, key string) {
    // Caution: this deletes the key unconditionally. If the TTL has expired
    // and another caller now holds the lock, this removes their lock.
    rdb.Del(ctx, "lock:"+key)
}

func purchaseWithRedisLock(ctx context.Context, rdb *redis.Client, db *sql.DB, itemID string) error {
    acquired, err := acquireLock(ctx, rdb, itemID, 5*time.Second)
    if err != nil {
        return err // Redis error: surface it instead of hiding it
    }
    if !acquired {
        return errors.New("could not acquire lock, try again")
    }
    defer releaseLock(ctx, rdb, itemID)

    // Only one server can be here at a time, provided the work finishes within the TTL.
    return purchaseItem(ctx, db, itemID)
}

The SET NX command is atomic in Redis — only one caller gets true. All others get false and back off. The TTL ensures the lock is released even if your server crashes mid-operation.

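⚠️ One important caveat: SET NX alone isn't enough for production. The lock value should be a unique token per holder, and release should delete the key only if it still holds that token; otherwise a slow request whose TTL has expired can delete a lock that now belongs to someone else. If you run multiple independent Redis nodes and need stronger guarantees, look at Redlock, the algorithm described in the Redis documentation.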
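Here is a small in-memory sketch of why the owner check on release matters. It does not talk to Redis: `lockStore` and its methods are hypothetical stand-ins, with `setNX` imitating SET ... NX, `expire` imitating the TTL firing, and `delIfOwner` imitating a compare-and-delete. In real Redis, that compare-and-delete has to run as one atomic step, for example as a Lua script.

```go
package main

import "sync"

// lockStore is a hypothetical in-memory stand-in for Redis, mapping each lock
// key to the token of its current owner.
type lockStore struct {
	mu   sync.Mutex
	keys map[string]string
}

func newLockStore() *lockStore {
	return &lockStore{keys: make(map[string]string)}
}

// setNX stores token under key only if the key is absent, like SET key token NX.
func (s *lockStore) setNX(key, token string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, held := s.keys[key]; held {
		return false
	}
	s.keys[key] = token
	return true
}

// expire simulates the TTL firing while the owner is still working.
func (s *lockStore) expire(key string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.keys, key)
}

// del removes the key unconditionally, like a plain DEL.
func (s *lockStore) del(key string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.keys, key)
}

// delIfOwner removes the key only if it still holds token, and reports
// whether it did.
func (s *lockStore) delIfOwner(key, token string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.keys[key] != token {
		return false
	}
	delete(s.keys, key)
	return true
}
```

Walk through it: request A takes the lock, runs long, and its TTL expires. Request B takes the lock. When A finally finishes, `delIfOwner(key, "A")` returns false and B keeps its lock, whereas `del(key)` would have freed the lock for a third request while B is still inside the critical section.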

Summary: The Mental Model

Race conditions happen in the gap between read and write. The longer that gap, the higher the chance of corruption. Your goal is to shrink or eliminate that gap:

Atomic operations collapse the gap to zero.
Pessimistic locks block others from entering the gap.
Optimistic locks let everyone in but detect and reject stale writes.

There's no universally best answer. Pick the approach that matches your read/write ratio, your tolerance for blocking, and whether your system is distributed.

Key Takeaways

A race condition is not a bug in any single thread — it's a bug in timing between threads.
Any "read, decide, write" pattern is vulnerable unless protected.
The simplest fix is often a single atomic SQL statement — reach for it first.
SELECT FOR UPDATE is your safety net when you need business logic between the read and write.
Optimistic locking with a version column is ideal for high-concurrency, low-conflict scenarios.
When the critical section isn't covered by a single shared database, move to a distributed lock such as Redis SET NX with a TTL, and mind its caveats.

Have you been bitten by a race condition in production? How did you track it down? Drop a comment - these bugs have some of the best war stories.
