DEV Community

Cover image for Why your async user-creation endpoint silently produces duplicates
Alan West
Alan West

Posted on

Why your async user-creation endpoint silently produces duplicates

Six hours. That's how long I burned last Tuesday tracking down a bug that only showed up in production. The tests passed. CI was green. Two reviewers approved the PR. But once real traffic hit, our user-creation endpoint started silently producing duplicate rows.

I've seen this exact pattern enough times to know what's going on. The code looked fine. It read fine. Modern async/await, no callback nesting, type-safe. But whoever wrote it (could've been me on a bad day) hadn't actually internalized what await does. And the moment you skip the fundamentals — what the event loop guarantees, what a transaction actually isolates, what "atomic" means — you build a thing that only fails under load.

Here's the bug, the root cause, and a process for catching this class of issue.

The frustrating problem

Picture an endpoint that creates a user if one doesn't already exist:

async function createUserIfMissing(email) {
  const existing = await db.users.findOne({ email });
  if (existing) return existing;

  // Nothing found — safe to insert, right?
  return db.users.create({ email });
}
Enter fullscreen mode Exit fullscreen mode

It reads correct. Tests pass. Reviewers nod. You ship it.

Then marketing sends a campaign email, two thousand people click within the same second, and your DB has 47 rows with the same email address. The on-call rotation is not happy.

Root cause: TOCTOU and what await doesn't do

This is a textbook Time-Of-Check to Time-Of-Use race condition. Request A awaits findOne, sees nothing, and prepares to insert. Before A reaches create, Request B also awaits findOne, also sees nothing, and also prepares to insert. Both win. Both insert.

The misunderstanding here is subtle. await suspends this coroutine, but the event loop happily schedules other work in the meantime. There is no lock. There is no isolation. The check and the use are two separate database round-trips, and between them, anything can happen.

If you've never sat down with the spec for Promises or read what your database actually guarantees about transaction isolation levels, this bug feels like dark magic. Once you have, it's the first thing you look for.

Step 1: Reproduce it reliably

Production-only bugs are infuriating because you can't iterate. Step one is always: get it to fail on demand.

For race conditions, that usually means firing parallel requests at a local instance:

// reproduce.js — fire N parallel requests, count duplicates
const N = 50;
const email = `race-${Date.now()}@test.com`;

await Promise.all(
  Array.from({ length: N }, () =>
    fetch('http://localhost:3000/users', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ email }),
    })
  )
);

const rows = await db.users.findMany({ email });
console.log(`Created ${rows.length} rows (expected 1)`);
Enter fullscreen mode Exit fullscreen mode

If you can reproduce it locally, you've already done 80% of the work. A test suite that only ever fires one request at a time will never catch this.

Step 2: Fix it at the right layer

There are three legitimate fixes, and choosing the wrong layer is its own bug. In rough order of preference:

Option A: Make the database enforce uniqueness

This is the boring, correct answer for almost every case. Add a unique constraint on email, then handle the conflict in code:

async function createUserIfMissing(email) {
  try {
    return await db.users.create({ email });
  } catch (err) {
    // 23505 is Postgres' unique_violation SQLSTATE
    if (err.code === '23505') {
      return db.users.findOne({ email });
    }
    throw err;
  }
}
Enter fullscreen mode Exit fullscreen mode

The constraint is the only thing that can't be raced — it's enforced inside the database by a single authoritative writer. Application-level checks always have a window.

Option B: Use an upsert

Most databases have a single-statement upsert. Postgres calls it ON CONFLICT. This collapses check-and-insert into one atomic operation:

INSERT INTO users (email)
VALUES ($1)
ON CONFLICT (email) DO NOTHING
RETURNING *;
Enter fullscreen mode Exit fullscreen mode

If the row didn't exist, you get it back. If it did, RETURNING is empty and a follow-up SELECT fetches the existing row. Two statements, but no race between them, because the first one is atomic at the row level.

Option C: Application-level locking

Sometimes the operation isn't a single insert and you genuinely need mutual exclusion across several queries. Then you reach for an advisory lock or a Redis-backed mutex. I use this as a last resort — it adds latency and a new failure mode (lock-holder dies, lock leaks, distributed clocks drift).

// Postgres advisory lock keyed off a hash of the email,
// released automatically when the transaction commits or rolls back
await db.query('SELECT pg_advisory_xact_lock(hashtext($1))', [email]);
// Now check + insert is safe inside this transaction
Enter fullscreen mode Exit fullscreen mode

Prevention: a small checklist that pays off

After enough of these incidents I started running every async PR through a four-question filter:

  • Does this read-then-write rely on the read still being true? If yes, the read needs to be inside a transaction with the right isolation level, or the write needs a constraint to catch the violation.
  • Is there any shared mutable state held across an await? Every await is a yield point. Anything else on the event loop can run between the two halves.
  • What does the database actually guarantee here? Read-committed is the default in Postgres and MySQL, and it does not prevent the race above. You'd need serializable isolation or explicit locking.
  • Can I reproduce this with parallel calls? If you can't reproduce it, you can't claim to have fixed it. Add a parallel-request stress check to your integration suite for any endpoint that mutates shared state.

The reason I keep harping on fundamentals is that none of this is exotic. TOCTOU has been a known bug class since the 1970s. Database isolation levels are documented in the SQL standard and Postgres docs. The event loop is described right in the HTML spec. The information is sitting there waiting to be read.

When you (or a code generator, or a junior dev, or you-at-2am) skip those layers, you don't get faster code. You get code that ships with race conditions baked in, and you pay for it at the worst possible moment — under load, on a Friday night, while the on-call engineer is trying to eat dinner.

Learn the boring parts. They save you the six hours.

Top comments (0)