Sushant Joshi

Posted on Jun 29

The Test ID Pattern That Finally Killed Our Flake

#ai #api #automation #testing

Our API test suite went from 4–6% flake to 0.3% the week we changed how we generated entity IDs in test fixtures.

I wish I could tell you we solved our flaky test problem with a fancy testing framework or a massive infrastructure rewrite.

We didn't.

The fix was embarrassingly simple.

We changed how we generated IDs.

That's it.

For months, our CI dashboard looked like this:

Run #1  ✅
Run #2  ❌
Run #3  ✅
Run #4  ❌
Run #5  ✅

Same code.

Same tests.

Different results.

Eventually, we measured the problem.

Our API test suite had a 4–6% flake rate.

That meant roughly one out of every twenty pipeline runs failed for no legitimate reason.

Developers stopped trusting the pipeline.

Failed builds were retried.

Warnings were ignored.

And eventually, real bugs started hiding among false positives.

The culprit?

Randomly generated IDs.

The Flake Source: Random ID Collisions

A typical test looked like this:

const customerId =
  Math.floor(Math.random() * 1000000);

await createCustomer({
  id: customerId
});

Seems harmless.

Until your suite grows.

Eventually:

Hundreds of tests run.
Multiple pipelines run simultaneously.
Parallel workers execute the same code.
Test retries create additional requests.

Sooner or later, this error appears:

Customer 48291 already exists

A retry passes.

Nobody investigates.

The flake count grows.
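How soon is "sooner or later"? Suppose each random ID is an independent, uniform draw from the 1,000,000 values that `Math.floor(Math.random() * 1000000)` can produce. Suppose also that the records share one database long enough to overlap. Under those assumptions, the birthday problem gives the chance that at least two records end up with the same ID. This is a minimal sketch of that calculation:

```javascript
// Probability that at least two of `count` IDs collide when each ID is
// drawn uniformly at random from `space` possible values (birthday problem).
function collisionProbability(count, space) {
  let noCollision = 1;
  for (let i = 0; i < count; i++) {
    // The (i+1)-th ID must avoid the i values already taken.
    noCollision *= (space - i) / space;
  }
  return 1 - noCollision;
}

// Math.floor(Math.random() * 1000000) yields 1,000,000 possible IDs.
for (const count of [100, 1000, 5000]) {
  const p = collisionProbability(count, 1_000_000);
  console.log(`${count} records: ${(p * 100).toFixed(1)}% chance of a collision`);
}
```

With about a thousand test records coexisting, the chance that some pair shares an ID is already roughly 39%, even though any single insert almost always succeeds.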

The Problem Wasn't Just Collisions

Random IDs caused other issues too.

Suppose a test failed.

You'd find this in the database:

Customer: 839174

Questions immediately followed:

Which test created it?
Which pipeline run?
Which branch?
Is it safe to delete?

Nobody knew.

The data had no meaning.

It was just another random number.

Randomness Is the Enemy of Debugging

Random values make systems harder to reason about.

If a failure cannot be reproduced easily, it becomes expensive to investigate.

We wanted IDs that were:

Predictable
Searchable
Namespaced
Easy to clean up

So we introduced a new pattern.

The New Pattern

Instead of:

84738291

we started generating IDs like this:

test-orders-create-001

Or:

pr-482-users-login-003

Or:

ci-198-payments-refund-001

The format became:

<prefix>-<test-name>-<counter>

Every ID suddenly became self-describing.

Why Deterministic IDs Help So Much

Now when a failure happened:

Customer already exists:
pr-482-users-create-001

we immediately knew:

Which PR created it
Which test created it
Which feature it belonged to

Debugging became dramatically easier.

Example Generator

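function buildTestId(
  namespace,
  testName,
  sequence
) {
  // Zero-pad the sequence so IDs read and sort
  // consistently: 1 becomes "001".
  const padded = String(sequence).padStart(3, '0');
  return `${namespace}-${testName}-${padded}`;
}

Usage:

const customerId =
  buildTestId(
    'pr-482',
    'create-user',
    1
  );

Result:

pr-482-create-user-001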

Simple.

Readable.

Deterministic.
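`buildTestId` leaves it to the caller to choose the sequence number by hand. One way to keep IDs deterministic without per-test bookkeeping is a small factory that keeps a counter for each test name. `createTestIdFactory` is a hypothetical helper. The sketch assumes a fresh factory is created for each test run (or worker), so counters restart at 1 and advance in the same order every time:

```javascript
// Hypothetical helper: hands out <namespace>-<testName>-<NNN> IDs,
// keeping a separate counter for each test name.
function createTestIdFactory(namespace) {
  const counters = new Map();

  return function nextTestId(testName) {
    const next = (counters.get(testName) ?? 0) + 1;
    counters.set(testName, next);
    // Zero-pad to three digits to match IDs like "pr-482-test-a-001".
    const sequence = String(next).padStart(3, '0');
    return `${namespace}-${testName}-${sequence}`;
  };
}

const nextTestId = createTestIdFactory('pr-482');
console.log(nextTestId('create-user')); // pr-482-create-user-001
console.log(nextTestId('create-user')); // pr-482-create-user-002
console.log(nextTestId('login'));       // pr-482-login-001
```

The counter depends only on call order. A test that creates its records in the same order therefore gets the same IDs on every run.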

The Biggest Improvement: Test Isolation

This change dramatically improved test isolation.

Previously:

Test A
↓
Creates customer 123

Test B
↓
Creates customer 123

Failure.

Now:

pr-482-test-a-001
pr-482-test-b-001

No collision.

No interference.

No flake.

Parallel API Tests Become Much Safer

Modern CI systems run tests in parallel.

Our pipelines used:

Multiple workers
Multiple containers
Multiple retry attempts

Random IDs weren't enough.

We needed namespaces.

Namespaced IDs

Every test run now receives a namespace.

Examples:

pr-482
build-991
worker-3

IDs become:

pr-482-worker-1-orders-001
pr-482-worker-2-orders-001
pr-482-worker-3-orders-001

Each worker owns its own space.

Parallel execution becomes dramatically safer.
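Where the namespace comes from depends on the CI system. This is a minimal sketch under three assumptions. First, the pipeline exports a PR number or build number as environment variables (`PR_NUMBER` and `BUILD_NUMBER` are hypothetical names, so substitute whatever your CI provides). Second, the tests run under Jest, which sets `JEST_WORKER_ID` to a distinct value in each worker process. Third, local runs fall back to a fixed `local` prefix.

```javascript
// Build a namespace such as "pr-482-worker-3" from environment variables.
// PR_NUMBER and BUILD_NUMBER are hypothetical names set by the pipeline;
// JEST_WORKER_ID is set by Jest in each worker process.
function buildNamespace(env = process.env) {
  let run;
  if (env.PR_NUMBER) {
    run = `pr-${env.PR_NUMBER}`;
  } else if (env.BUILD_NUMBER) {
    run = `build-${env.BUILD_NUMBER}`;
  } else {
    // Assumption: local runs share a fixed "local" prefix.
    run = 'local';
  }
  // Default to worker 1 when not running under Jest.
  const worker = env.JEST_WORKER_ID ?? '1';
  return `${run}-worker-${worker}`;
}

console.log(buildNamespace({ PR_NUMBER: '482', JEST_WORKER_ID: '3' }));   // pr-482-worker-3
console.log(buildNamespace({ BUILD_NUMBER: '991', JEST_WORKER_ID: '1' })); // build-991-worker-1
```

Each worker then uses its own namespace as the prefix for every ID it generates, so two workers running the same test cannot produce the same ID.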

Why Namespaces Matter

Without namespaces:

worker-1 → customer-001
worker-2 → customer-001

Collision.

With namespaces:

worker-1-customer-001
worker-2-customer-001

No collision.

This single change removed a huge percentage of our flaky failures.

The Unexpected Benefit: Easier Debugging

We started seeing logs like:

DELETE customer:
pr-512-login-tests-001

Immediately we knew:

PR number
Test suite
Test case

This sounds small.

It's actually enormous when debugging CI failures.

The Database Cleanup Hook

Deterministic IDs made cleanup incredibly easy.

Previously:

deleteAllTestCustomers();

Terrifying.

What if production-like data exists?

What if another team is using the environment?

Instead, we now do:

deleteCustomersByPrefix(
  'pr-482'
);

Or:

deleteOrdersByPrefix(
  'worker-2'
);

Cleanup Became Surgical

Instead of:

Delete everything.

we do:

Delete only records created by this test run.

This dramatically reduced accidental interference.

Example Cleanup Hook

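afterAll(async () => {
  // Refuse to run without a namespace: an empty
  // prefix could match (and delete) every record.
  const namespace = process.env.TEST_NAMESPACE;
  if (!namespace) {
    throw new Error('TEST_NAMESPACE is not set');
  }
  await deleteByPrefix(namespace);
});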

Simple.

Fast.

Safe.

This Also Helps With Retries

Suppose a pipeline fails halfway through.

The cleanup never executes.

Previously:

Random test data remains forever.

Now:

DELETE FROM customers
WHERE id LIKE 'pr-482-%';

Done.

Entire test runs can be cleaned up with one query.
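A bare prefix match has one pitfall. A pattern like `'pr-482%'` also matches IDs from PR 4821, 4822, and so on, and any `_` or `%` inside a namespace acts as a LIKE wildcard. The helper below (hypothetical name `prefixPattern`) anchors the match on the trailing hyphen and escapes those characters. This is a minimal sketch that assumes the pattern is passed to a parameterized query using backslash as the escape character:

```javascript
// Turn a namespace into a LIKE pattern that matches only IDs inside it.
// Intended for a parameterized query such as:
//   DELETE FROM customers WHERE id LIKE $1 ESCAPE '\'
// (table name and placeholder syntax are illustrative).
function prefixPattern(namespace) {
  if (!namespace) {
    // An empty namespace is almost certainly a configuration mistake.
    throw new Error('namespace must be a non-empty string');
  }
  // Escape the LIKE wildcards (% and _) and the escape character itself.
  const escaped = namespace.replace(/[\\%_]/g, (ch) => `\\${ch}`);
  // The trailing hyphen stops 'pr-482' from matching 'pr-4821-...'.
  return `${escaped}-%`;
}

console.log(prefixPattern('pr-482'));    // pr-482-%
console.log(prefixPattern('build_991')); // build\_991-%
```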

Our Flake Numbers

Before:

4–6% flaky failures

After:

0.3%

The change wasn't perfect.

But it was transformative.

The One Case Where This Pattern Still Loses

Deterministic IDs don't solve every problem.

They still struggle with:

Truly Global Resources

Examples:

Unique email addresses
Shared queues
Third-party systems
Rate-limited APIs

Even with namespaced IDs:

pr-482-john@example.com

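might still violate uniqueness constraints, for example when a rerun of the same PR tries to register the same address with a third-party system that our cleanup can't reach.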

Our Fallback Strategy

For these scenarios, we combine:

Namespace
+
Timestamp
+
Random Suffix

Example:

pr-482-user-1721728172-x8a3

This preserves:

Searchability
Cleanup capabilities
Collision resistance

while handling globally unique requirements.
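Here is a minimal sketch of that fallback. It assumes a Unix timestamp in seconds and a four-character base-36 suffix, as in the example ID above. `buildUniqueTestId` and `buildUniqueEmail` are hypothetical names. `Math.random` is adequate here because the suffix only needs to avoid accidental collisions, not resist guessing.

```javascript
// Namespace + timestamp + random suffix, e.g. "pr-482-user-1721728172-x8a3".
function buildUniqueTestId(namespace, name, nowMs = Date.now()) {
  const seconds = Math.floor(nowMs / 1000);
  // 36 ** 4 possible values, padded so the suffix is always 4 characters.
  const suffix = Math.floor(Math.random() * 36 ** 4)
    .toString(36)
    .padStart(4, '0');
  return `${namespace}-${name}-${seconds}-${suffix}`;
}

// For globally unique fields such as email addresses, the same ID can
// serve as the local part, so the namespace stays searchable.
function buildUniqueEmail(namespace, name, domain = 'example.com') {
  return `${buildUniqueTestId(namespace, name)}@${domain}`;
}

console.log(buildUniqueTestId('pr-482', 'user', 1721728172000));
console.log(buildUniqueEmail('pr-482', 'john'));
```

Because the namespace still leads the string, prefix-based cleanup keeps working on these IDs.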

The Pattern We Use Today

Our current generator looks like this:

<namespace>-<test-suite>-<sequence>

Examples:

pr-811-orders-001
pr-811-orders-002
pr-811-payments-001
worker-3-users-004

Every piece of test data becomes:

Traceable
Isolated
Easy to clean up
Easy to debug

Why This Matters More Than People Think

Flaky tests aren't just annoying.

They slowly destroy trust.

Eventually developers start saying:

Just rerun the pipeline.

That's dangerous.

Because one day:

A real failure looks exactly like another flaky one.

And nobody notices.

Final Thoughts

I spent a long time assuming flaky tests were caused by:

Infrastructure
Networks
Containers
CI providers

Sometimes they are.

But surprisingly often, they're caused by something much simpler:

Bad test data management.

Changing our ID generation strategy didn't eliminate every flaky test.

But it removed an enormous category of failures almost overnight.

And it made our tests easier to debug, easier to clean up, and safer to run in parallel.

If you're trying to improve reliability in your own suites, spend some time looking at your test data strategy.

You might discover that your biggest source of flake isn't your infrastructure.

It's your IDs.

If you're curious about broader testing metrics beyond flaky failures, here's what we measure for API test coverage:

Because sometimes the biggest improvements come from changing something as small as a string format.

DEV Community
