DEV Community

Sushant Joshi

The Test ID Pattern That Finally Killed Our Flake

Our API test suite went from 4–6% flake to 0.3% the week we changed how we generated entity IDs in test fixtures.

I wish I could tell you we solved our flaky test problem with a fancy testing framework or a massive infrastructure rewrite.

We didn't.

The fix was embarrassingly simple.

We changed how we generated IDs.

That's it.

For months, our CI dashboard looked like this:

Run #1  ✅
Run #2  ❌
Run #3  ✅
Run #4  ❌
Run #5  ✅

Same code.

Same tests.

Different results.

Eventually, we measured the problem.

Our API test suite had a 4–6% flake rate.

That meant roughly one out of every twenty pipeline runs failed for no legitimate reason.

Developers stopped trusting the pipeline.

Failed builds were retried.

Warnings were ignored.

And eventually, real bugs started hiding among false positives.

The culprit?

Randomly generated IDs.


The Flake Source: Random ID Collisions

A typical test looked like this:

const customerId =
  Math.floor(Math.random() * 1000000);

await createCustomer({
  id: customerId
});

Seems harmless.

Until your suite grows.

Eventually:

  • Hundreds of tests run.
  • Multiple pipelines run simultaneously.
  • Parallel workers execute the same code.
  • Test retries create additional requests.

Sooner or later:

Customer 48291 already exists

appears.

A retry passes.

Nobody investigates.

The flake count grows.
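To put a rough number on "sooner or later", here is a minimal sketch of the birthday-problem estimate for the ID space used above. It assumes the IDs are drawn uniformly and that the records they create all live in the same environment at the same time. The counts in the loop are hypothetical, not measurements from our suite.

```javascript
// Birthday-problem estimate: the chance that at least two of `count`
// uniformly random IDs drawn from `space` possible values are equal.
function collisionProbability(count, space) {
  let allDistinct = 1;
  for (let i = 0; i < count; i += 1) {
    allDistinct *= (space - i) / space;
  }
  return 1 - allDistinct;
}

// ID space from the snippet above: Math.floor(Math.random() * 1000000).
const SPACE = 1000000;

// Hypothetical numbers of test records sharing one environment.
for (const count of [100, 1000, 5000]) {
  const pct = (collisionProbability(count, SPACE) * 100).toFixed(1);
  console.log(`${count} IDs: ${pct}% chance of at least one collision`);
}
```

With a million possible values, 1,000 coexisting records already give roughly a 39% chance that two of them share an ID.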


The Problem Wasn't Just Collisions

Random IDs caused other issues too.

Suppose a test failed.

You found this in the database:

Customer: 839174

Questions immediately followed:

  • Which test created it?
  • Which pipeline run?
  • Which branch?
  • Is it safe to delete?

Nobody knew.

The data had no meaning.

It was just another random number.


Randomness Is the Enemy of Debugging

Random values make systems harder to reason about.

If a failure cannot be reproduced easily, it becomes expensive to investigate.

We wanted IDs that were:

  • Predictable
  • Searchable
  • Namespaced
  • Easy to clean up

So we introduced a new pattern.


The New Pattern

Instead of:

84738291

we started generating IDs like this:

test-orders-create-001

Or:

pr-482-users-login-003

Or:

ci-198-payments-refund-001

The format became:

<prefix>-<test-name>-<counter>

Every ID suddenly became self-describing.


Why Deterministic IDs Help So Much

Now when a failure happened:

Customer already exists:
pr-482-users-create-001

we immediately knew:

  • Which PR created it
  • Which test created it
  • Which feature it belonged to

Debugging became dramatically easier.
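Because the ID is structured, the same lookup can be done by a script. The sketch below is one possible parser, not part of our actual tooling. The name `parseTestId` is hypothetical, and it assumes the namespace has the shape `<word>-<number>` (like `pr-482` or `ci-198`) and that the ID ends in a numeric counter. An ID with a bare prefix such as `test-orders-create-001` would not match and would need its own rule.

```javascript
// Splits an ID such as "pr-482-users-create-001" into its parts.
// Assumed shape: <word>-<number>-<test-name>-<digits>
const TEST_ID_PATTERN = /^([a-z]+-\d+)-(.+)-(\d+)$/;

function parseTestId(id) {
  const match = TEST_ID_PATTERN.exec(id);
  if (!match) return null; // not one of our structured IDs
  const [, namespace, testName, counter] = match;
  return { namespace, testName, counter: Number(counter) };
}

console.log(parseTestId('pr-482-users-create-001'));
// { namespace: 'pr-482', testName: 'users-create', counter: 1 }
console.log(parseTestId('839174'));
// null
```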


Example Generator

// Builds a self-describing test ID, e.g. "pr-482-create-user-001".
function buildTestId(namespace, testName, sequence) {
  // Zero-pad the counter so IDs match the 001-style format shown earlier.
  const counter = String(sequence).padStart(3, '0');
  return `${namespace}-${testName}-${counter}`;
}

Usage:

const customerId = buildTestId('pr-482', 'create-user', 1);

Result:

pr-482-create-user-001

Simple.

Readable.

Deterministic.
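Passing the sequence number by hand gets tedious fast. One way to avoid it is a small factory that keeps a counter per test name. This is a sketch under assumptions: `createIdFactory` is a hypothetical helper, and the counter is zero-padded to three digits to match the `001` style used in the examples. The IDs stay deterministic only as long as the tests inside one namespace run in the same order.

```javascript
// Returns a function that hands out sequential IDs per test name
// within one namespace, so callers never pass the counter by hand.
function createIdFactory(namespace) {
  const counters = new Map();
  return function nextId(testName) {
    const sequence = (counters.get(testName) ?? 0) + 1;
    counters.set(testName, sequence);
    const padded = String(sequence).padStart(3, '0');
    return `${namespace}-${testName}-${padded}`;
  };
}

const nextId = createIdFactory('pr-482');
console.log(nextId('create-user')); // pr-482-create-user-001
console.log(nextId('create-user')); // pr-482-create-user-002
console.log(nextId('refund'));      // pr-482-refund-001
```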


The Biggest Improvement: Test Isolation

This change dramatically improved test isolation.

Previously:

Test A
↓
Creates customer 123

Test B
↓
Creates customer 123

Failure.

Now:

pr-482-test-a-001
pr-482-test-b-001

No collision.

No interference.

No flake.


Parallel API Tests Become Much Safer

Modern CI systems run tests in parallel.

Our pipelines used:

  • Multiple workers
  • Multiple containers
  • Multiple retry attempts

Neither random IDs nor bare counters were enough.

We needed namespaces.


Namespaced IDs

Every test run now receives a namespace.

Examples:

pr-482
build-991
worker-3

IDs become:

pr-482-worker-1-orders-001
pr-482-worker-2-orders-001
pr-482-worker-3-orders-001

Each worker owns its own space.

Parallel execution becomes dramatically safer.
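Where the namespace comes from depends on your CI system and test runner. As a rough sketch, it can be derived once from environment variables. The variable names `PR_NUMBER`, `BUILD_NUMBER`, and `WORKER_ID` below are placeholders, not real variables from any particular tool. The function takes an env-like object so it is easy to test; in practice you would pass `process.env`.

```javascript
// Builds a namespace such as "pr-482-worker-2" from an env-like object.
// PR_NUMBER, BUILD_NUMBER and WORKER_ID are hypothetical variable names;
// substitute whatever your CI system and test runner expose.
function buildNamespace(env) {
  let run;
  if (env.PR_NUMBER) run = `pr-${env.PR_NUMBER}`;
  else if (env.BUILD_NUMBER) run = `build-${env.BUILD_NUMBER}`;
  else throw new Error('Set PR_NUMBER or BUILD_NUMBER to namespace test data');
  // Append the worker only when tests actually run in parallel workers.
  return env.WORKER_ID ? `${run}-worker-${env.WORKER_ID}` : run;
}

console.log(buildNamespace({ PR_NUMBER: '482', WORKER_ID: '2' })); // pr-482-worker-2
console.log(buildNamespace({ BUILD_NUMBER: '991' }));              // build-991
```

Failing loudly when no run identifier is set is deliberate here: silently falling back to an empty namespace would bring the collisions back.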


Why Namespaces Matter

Without namespaces:

worker-1 → customer-001
worker-2 → customer-001

Collision.

With namespaces:

worker-1-customer-001
worker-2-customer-001

No collision.

This single change removed a huge percentage of our flaky failures.


The Unexpected Benefit: Easier Debugging

We started seeing logs like:

DELETE customer:
pr-512-login-tests-001

Immediately we knew:

  • PR number
  • Test suite
  • Test case

This sounds small.

It's actually enormous when debugging CI failures.


The Database Cleanup Hook

Deterministic IDs made cleanup incredibly easy.

Previously:

deleteAllTestCustomers();

Terrifying.

What if production-like data exists?

What if another team is using the environment?

Instead we now do:

deleteCustomersByPrefix(
  'pr-482'
);

Or:

deleteOrdersByPrefix(
  'worker-2'
);

Cleanup Became Surgical

Instead of:

Delete everything.

we do:

Delete only records created
by this test run.

This dramatically reduced accidental interference.


Example Cleanup Hook

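afterAll(async () => {
  const namespace = process.env.TEST_NAMESPACE;
  // Refuse to run without a namespace: a missing prefix must never widen the delete.
  if (!namespace) throw new Error('TEST_NAMESPACE is not set');
  await deleteByPrefix(namespace);
});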

Simple.

Fast.

Safe.


This Also Helps With Retries

Suppose a pipeline fails halfway through.

The cleanup never executes.

Previously:

Random test data remains forever.

Now:

DELETE FROM customers
WHERE id LIKE 'pr-482-%'

Done.

Entire test runs can be cleaned up with one query.
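Two details matter when building that query from a namespace. The pattern should include the trailing `-` delimiter, otherwise `pr-482` also matches IDs from `pr-4820`. And `%` and `_` are wildcards in `LIKE`, so they need escaping if a namespace could ever contain them. The sketch below shows one way to do both. The helper name `buildPrefixDelete` is hypothetical, the `?` placeholder style varies by database driver, and the table name is assumed to come from your own code rather than from user input.

```javascript
// Builds a parameterized DELETE for every row whose id starts with
// "<namespace>-". The table name is trusted input here, not user input.
function buildPrefixDelete(table, namespace) {
  // Escape LIKE wildcards (% and _) and the escape character itself.
  const escaped = namespace.replace(/[!%_]/g, (ch) => `!${ch}`);
  return {
    text: `DELETE FROM ${table} WHERE id LIKE ? ESCAPE '!'`,
    values: [`${escaped}-%`],
  };
}

console.log(buildPrefixDelete('customers', 'pr-482'));
// { text: "DELETE FROM customers WHERE id LIKE ? ESCAPE '!'", values: [ 'pr-482-%' ] }
```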


Our Flake Numbers

Before:

4–6% flaky failures

After:

0.3%

The change wasn't perfect.

But it was transformative.


The One Case Where This Pattern Still Loses

Deterministic IDs don't solve every problem.

They still struggle with:

Truly Global Resources

Examples:

  • Unique email addresses
  • Shared queues
  • Third-party systems
  • Rate-limited APIs

Even with namespaced IDs:

pr-482-john@example.com

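might still violate a uniqueness constraint, for example when a retry of the same PR reuses the address before the earlier attempt's data has been cleaned up.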


Our Fallback Strategy

For these scenarios, we combine:

Namespace
+
Timestamp
+
Random Suffix

Example:

pr-482-user-1721728172-x8a3

This preserves:

  • Searchability
  • Cleanup capabilities
  • Collision resistance

while handling globally unique requirements.
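A minimal sketch of such a fallback generator, assuming Node.js. The helper name `buildUniqueId` is hypothetical. The suffix here is four hex characters from `crypto.randomBytes`, so it looks slightly different from the `x8a3` example above. Four characters is a small space, so it only guards against clashes within the same second.

```javascript
const { randomBytes } = require('crypto');

// Builds an ID such as "pr-482-user-1721728172-9f3c":
// namespace + label + Unix timestamp in seconds + 4 random hex characters.
function buildUniqueId(namespace, label, now = Date.now()) {
  const seconds = Math.floor(now / 1000);
  const suffix = randomBytes(2).toString('hex'); // 2 bytes -> 4 hex chars
  return `${namespace}-${label}-${seconds}-${suffix}`;
}

// For a unique email address, put the generated ID in the local part.
const email = `${buildUniqueId('pr-482', 'user')}@example.com`;
console.log(email);
```

The namespace still comes first, so prefix-based cleanup keeps working for these records too.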


The Pattern We Use Today

Our current generator produces IDs in this format:

<namespace>-<test-suite>-<sequence>

Examples:

pr-811-orders-001
pr-811-orders-002
pr-811-payments-001
worker-3-users-004

Every piece of test data becomes:

  • Traceable
  • Isolated
  • Easy to clean up
  • Easy to debug

Why This Matters More Than People Think

Flaky tests aren't just annoying.

They slowly destroy trust.

Eventually developers start saying:

Just rerun the pipeline.

That's dangerous.

Because one day:

A real failure looks exactly like another flaky one.

And nobody notices.


Final Thoughts

I spent a long time assuming flaky tests were caused by:

  • Infrastructure
  • Networks
  • Containers
  • CI providers

Sometimes they are.

But surprisingly often, they're caused by something much simpler:

Bad test data management.

Changing our ID generation strategy didn't eliminate every flaky test.

But it removed an enormous category of failures almost overnight.

And it made our tests easier to debug, easier to clean up, and safer to run in parallel.

If you're trying to improve reliability in your own suites, spend some time looking at your test data strategy.

You might discover that your biggest source of flake isn't your infrastructure.

It's your IDs.

Because sometimes the biggest improvements come from changing something as small as a string format.
