Our API test suite went from a 4–6% flake rate to 0.3% the week we changed how we generated entity IDs in test fixtures.
I wish I could tell you we solved our flaky test problem with a fancy testing framework or a massive infrastructure rewrite.
We didn't.
The fix was embarrassingly simple.
We changed how we generated IDs.
That's it.
For months, our CI dashboard looked like this:
Run #1 ✅
Run #2 ❌
Run #3 ✅
Run #4 ❌
Run #5 ✅
Same code.
Same tests.
Different results.
Eventually, we measured the problem.
Our API test suite had a 4–6% flake rate.
That meant roughly one out of every twenty pipeline runs failed for no legitimate reason.
Developers stopped trusting the pipeline.
Failed builds were retried.
Warnings were ignored.
And eventually, real bugs started hiding among false positives.
The culprit?
Randomly generated IDs.
The Flake Source: Random ID Collisions
A typical test looked like this:
const customerId = Math.floor(Math.random() * 1000000);
await createCustomer({
  id: customerId
});
Seems harmless.
Until your suite grows.
Eventually:
- Hundreds of tests run.
- Multiple pipelines run simultaneously.
- Parallel workers execute the same code.
- Test retries create additional requests.
Sooner or later:
Customer 48291 already exists
appears.
A retry passes.
Nobody investigates.
The flake count grows.
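To put a rough number on "sooner or later", here is a back-of-the-envelope sketch rather than anything we measured. It assumes records persist in a shared environment and that IDs are drawn uniformly from the same 1,000,000-value space as the snippet above. Under those assumptions the odds follow the birthday problem. The function name and sample counts are illustrative.

```javascript
// Probability that at least two of `count` uniformly random IDs
// collide when drawn from `space` possible values (birthday problem).
function collisionProbability(count, space) {
  let noCollision = 1;
  for (let i = 0; i < count; i++) {
    // The (i + 1)-th ID must avoid the i values already taken.
    noCollision *= (space - i) / space;
  }
  return 1 - noCollision;
}

for (const count of [100, 1000, 5000]) {
  const p = collisionProbability(count, 1000000);
  console.log(`${count} live IDs: ${(p * 100).toFixed(1)}% chance of a collision`);
}
```

With 1,000 live records the chance of at least one collision is already about 39%, which is why a suite that looked fine when it was small starts flaking as it grows.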
The Problem Wasn't Just Collisions
Random IDs caused other issues too.
Suppose a test failed.
You found this in the database:
Customer: 839174
Questions immediately followed:
- Which test created it?
- Which pipeline run?
- Which branch?
- Is it safe to delete?
Nobody knew.
The data had no meaning.
It was just another random number.
Randomness Is the Enemy of Debugging
Random values make systems harder to reason about.
If a failure cannot be reproduced easily, it becomes expensive to investigate.
We wanted IDs that were:
- Predictable
- Searchable
- Namespaced
- Easy to clean up
So we introduced a new pattern.
The New Pattern
Instead of:
84738291
we started generating IDs like this:
test-orders-create-001
Or:
pr-482-users-login-003
Or:
ci-198-payments-refund-001
The format became:
<prefix>-<test-name>-<counter>
Every ID suddenly became self-describing.
Why Deterministic IDs Help So Much
Now when a failure happened:
Customer already exists:
pr-482-users-create-001
we immediately knew:
- Which PR created it
- Which test created it
- Which feature it belonged to
Debugging became dramatically easier.
Example Generator
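function buildTestId(namespace, testName, sequence) {
  // Zero-pad the counter so IDs sort correctly and match the 001 format.
  const counter = String(sequence).padStart(3, '0');
  return `${namespace}-${testName}-${counter}`;
}
Usage:
const customerId = buildTestId('pr-482', 'create-user', 1);
Result:
pr-482-create-user-001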
Simple.
Readable.
Deterministic.
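Passing the sequence number by hand gets tedious once a test creates several records. One way to automate it is a small factory that keeps a counter per test name. This is a sketch, and `createIdFactory` and `nextId` are hypothetical names. It zero-pads the counter to three digits to match IDs such as test-orders-create-001.

```javascript
// Returns a function that hands out the next ID for a given test name.
// Counters live in memory, so they restart at 001 on every run, which is
// what keeps the IDs deterministic from one run to the next.
function createIdFactory(namespace) {
  const counters = new Map();
  return function nextId(testName) {
    const next = (counters.get(testName) ?? 0) + 1;
    counters.set(testName, next);
    return `${namespace}-${testName}-${String(next).padStart(3, '0')}`;
  };
}

const nextId = createIdFactory('pr-482');
console.log(nextId('create-user')); // pr-482-create-user-001
console.log(nextId('create-user')); // pr-482-create-user-002
console.log(nextId('refund'));      // pr-482-refund-001
```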
The Biggest Improvement: Test Isolation
This change dramatically improved test isolation.
Previously:
Test A
↓
Creates customer 123
Test B
↓
Creates customer 123
Failure.
Now:
pr-482-test-a-001
pr-482-test-b-001
No collision.
No interference.
No flake.
Parallel API Tests Become Much Safer
Modern CI systems run tests in parallel.
Our pipelines used:
- Multiple workers
- Multiple containers
- Multiple retry attempts
A test name and a counter alone weren't enough, because two workers running the same test would generate the same ID.
We needed namespaces.
Namespaced IDs
Every test run now receives a namespace.
Examples:
pr-482
build-991
worker-3
IDs become:
pr-482-worker-1-orders-001
pr-482-worker-2-orders-001
pr-482-worker-3-orders-001
Each worker owns its own space.
Parallel execution becomes dramatically safer.
Why Namespaces Matter
Without namespaces:
worker-1 → customer-001
worker-2 → customer-001
Collision.
With namespaces:
worker-1-customer-001
worker-2-customer-001
No collision.
This single change removed a huge percentage of our flaky failures.
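How the namespace gets built depends on your CI provider. The sketch below assumes the pipeline exposes the PR number and build number as environment variables. `PR_NUMBER` and `BUILD_ID` are hypothetical names, so substitute whatever your provider sets. `JEST_WORKER_ID` is the variable Jest sets for each parallel worker, and other runners will need their own equivalent.

```javascript
// Builds a run namespace from CI metadata.
// PR_NUMBER and BUILD_ID are hypothetical variable names.
// JEST_WORKER_ID is set by Jest for each parallel worker.
function resolveNamespace(env = process.env) {
  let run = 'local'; // fallback for runs outside CI (an assumption)
  if (env.PR_NUMBER) {
    run = `pr-${env.PR_NUMBER}`;
  } else if (env.BUILD_ID) {
    run = `build-${env.BUILD_ID}`;
  }
  const worker = env.JEST_WORKER_ID ? `-worker-${env.JEST_WORKER_ID}` : '';
  return `${run}${worker}`;
}

console.log(resolveNamespace({ PR_NUMBER: '482', JEST_WORKER_ID: '2' }));
// pr-482-worker-2
console.log(resolveNamespace({ BUILD_ID: '991' }));
// build-991
```

The `local` fallback is only safe if developers don't share a database. Otherwise something like a username would need to go into the namespace as well.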
The Unexpected Benefit: Easier Debugging
We started seeing logs like:
DELETE customer:
pr-512-login-tests-001
Immediately we knew:
- PR number
- Test suite
- Test case
This sounds small.
It's actually enormous when debugging CI failures.
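Because the format is regular, the same information can be pulled out by a script, for example when grepping CI logs. This is a minimal sketch with a hypothetical helper name, and it only handles IDs shaped like pr-<number>-<name>-<three-digit counter>.

```javascript
// Splits an ID such as "pr-512-login-tests-001" into its parts.
// Returns null for anything that does not follow that shape.
function parsePrTestId(id) {
  const match = /^pr-(\d+)-(.+)-(\d{3})$/.exec(id);
  if (!match) return null;
  const [, pr, name, sequence] = match;
  return { pr: Number(pr), name, sequence: Number(sequence) };
}

console.log(parsePrTestId('pr-512-login-tests-001'));
// { pr: 512, name: 'login-tests', sequence: 1 }
console.log(parsePrTestId('839174'));
// null
```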
The Database Cleanup Hook
Deterministic IDs made cleanup incredibly easy.
Previously:
deleteAllTestCustomers();
Terrifying.
What if production-like data exists?
What if another team is using the environment?
Instead we now do:
deleteCustomersByPrefix('pr-482');
Or:
deleteOrdersByPrefix('worker-2');
Cleanup Became Surgical
Instead of:
Delete everything.
we do:
Delete only records created
by this test run.
This dramatically reduced accidental interference.
Example Cleanup Hook
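afterAll(async () => {
  const namespace = process.env.TEST_NAMESPACE;
  // Refuse to run with an empty prefix, which would match every record.
  if (!namespace) throw new Error('TEST_NAMESPACE is not set');
  await deleteByPrefix(namespace);
});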
Simple.
Fast.
Safe.
This Also Helps With Retries
Suppose a pipeline fails halfway through.
The cleanup never executes.
Previously:
Random test data remains forever.
Now:
DELETE FROM customers
WHERE id LIKE 'pr-482-%'
Done.
Entire test runs can be cleaned up with one query.
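Two details are easy to get wrong in a prefix delete. The pattern needs the trailing hyphen, otherwise pr-482 also matches pr-4821. And `_` and `%` are wildcards in LIKE, so a namespace such as worker_3 needs escaping. The sketch below builds a parameterized statement with both handled. The table allowlist and function names are hypothetical, and the `$1` placeholder follows PostgreSQL's convention, so adjust it for your driver.

```javascript
// Hypothetical allowlist: table names cannot be bound as parameters,
// so only known tables are ever interpolated into the statement.
const CLEANABLE_TABLES = new Set(['customers', 'orders']);

// Escape LIKE wildcards using "!" as the escape character.
function escapeLike(value) {
  return value.replace(/[!%_]/g, (ch) => `!${ch}`);
}

function buildPrefixDelete(table, namespace) {
  if (!CLEANABLE_TABLES.has(table)) {
    throw new Error(`Refusing to clean unknown table: ${table}`);
  }
  if (!namespace) {
    throw new Error('Namespace is required');
  }
  return {
    text: `DELETE FROM ${table} WHERE id LIKE $1 ESCAPE '!'`,
    // Trailing hyphen so "pr-482" does not also match "pr-4821-...".
    values: [`${escapeLike(namespace)}-%`],
  };
}

console.log(buildPrefixDelete('customers', 'pr-482'));
// { text: "DELETE FROM customers WHERE id LIKE $1 ESCAPE '!'", values: [ 'pr-482-%' ] }
```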
Our Flake Numbers
Before:
4–6% flaky failures
After:
0.3%
The change wasn't perfect.
But it was transformative.
The One Case Where This Pattern Still Loses
Deterministic IDs don't solve every problem.
They still struggle with:
Truly Global Resources
Examples:
- Unique email addresses
- Shared queues
- Third-party systems
- Rate-limited APIs
Even with namespaced IDs:
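pr-482-john@example.com
might still violate constraints, for example when a retry of the same run recreates the record before cleanup, or when a third-party system keeps the address after our own database has been cleaned.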
Our Fallback Strategy
For these scenarios, we combine a namespace, a timestamp, and a random suffix.
Example:
pr-482-user-1721728172-x8a3
This preserves:
- Searchability
- Cleanup capabilities
- Collision resistance
while handling globally unique requirements.
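A generator for this fallback could look roughly like the sketch below. `buildGlobalId` is a hypothetical name, the timestamp is Unix seconds, and the suffix is four base-36 characters from `Math.random()`. That is acceptable here because the namespace and timestamp already do most of the work.

```javascript
// Namespace + label + Unix timestamp (seconds) + 4-character random suffix.
// `now` is injectable so the timestamp part can be fixed when needed.
function buildGlobalId(namespace, label, now = Date.now()) {
  const timestamp = Math.floor(now / 1000);
  // padEnd covers the rare case where the random string is shorter than 4.
  const suffix = Math.random().toString(36).slice(2, 6).padEnd(4, '0');
  return `${namespace}-${label}-${timestamp}-${suffix}`;
}

const userId = buildGlobalId('pr-482', 'user');
console.log(userId);                  // e.g. pr-482-user-<seconds>-<4 chars>
console.log(`${userId}@example.com`); // a unique email address for this record
```

These IDs are no longer reproducible from run to run, so we only use them where a global uniqueness constraint forces it. The namespace prefix is still there, so prefix-based cleanup keeps working.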
The Pattern We Use Today
Our current generator looks like this:
<namespace>-<test-suite>-<sequence>
Examples:
pr-811-orders-001
pr-811-orders-002
pr-811-payments-001
worker-3-users-004
Every piece of test data becomes:
- Traceable
- Isolated
- Easy to clean up
- Easy to debug
Why This Matters More Than People Think
Flaky tests aren't just annoying.
They slowly destroy trust.
Eventually developers start saying:
Just rerun the pipeline.
That's dangerous.
Because one day:
A real failure looks exactly like another flaky one.
And nobody notices.
Final Thoughts
I spent a long time assuming flaky tests were caused by:
- Infrastructure
- Networks
- Containers
- CI providers
Sometimes they are.
But surprisingly often, they're caused by something much simpler:
Bad test data management.
Changing our ID generation strategy didn't eliminate every flaky test.
But it removed an enormous category of failures almost overnight.
And it made our tests easier to debug, easier to clean up, and safer to run in parallel.
If you're trying to improve reliability in your own suites, spend some time looking at your test data strategy.
You might discover that your biggest source of flake isn't your infrastructure.
It's your IDs.
Because sometimes the biggest improvements come from changing something as small as a string format.