I've seen this in three different companies. The CI pipeline runs 40 times a day. Sometimes it's green. Sometimes it's red for no obvious reason. Re-run it — green. Same commit. Different result.
This is flaky test syndrome, and it almost always has the same root cause: your tests depend on infrastructure that isn't deterministic.
## The usual culprits
- Tests hitting a live staging API that has rate limits
- Tests relying on test data that a previous run modified and didn't clean up
- Auth tokens that expire during a long test run
- External services that are occasionally just slow or unavailable
Every one of these introduces non-determinism. Your test suite is now a probabilistic system, not a deterministic one. With a 10% flake rate, roughly 10% of your engineers' CI time goes to wasted re-runs of code that was never broken.
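The waste estimate follows from treating each CI run as an independent coin flip. A quick sketch of the arithmetic, assuming a constant per-run flake probability and re-runs until green:

```python
# Expected CI runs per green build when each run flakes independently
# with probability p (geometric distribution over runs until success).
def expected_runs(p: float) -> float:
    return 1.0 / (1.0 - p)

def wasted_fraction(p: float) -> float:
    # Fraction of total CI time spent on runs that failed only due to
    # flakiness. Algebraically this simplifies to p itself.
    return 1.0 - 1.0 / expected_runs(p)

print(expected_runs(0.10))    # ~1.11 runs per green build
print(wasted_fraction(0.10))  # ~0.10 -> roughly 10% of CI time wasted
```

So the "10% flake rate wastes 10% of CI time" claim holds exactly under this simple model, and gets worse if failed runs also block engineers while they wait.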
## The fix: mock your external HTTP dependencies
For any external HTTP service your code calls, replace it with a mock in the test environment. Not a hardcoded stub in your test code — a real HTTP server that returns spec-accurate responses.
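To make the "real HTTP server, not a hardcoded stub" distinction concrete, here is a minimal sketch using only the Python standard library. The `/charge/123` endpoint and its JSON shape are hypothetical stand-ins for whatever the real payment API returns:

```python
# Minimal sketch: a process-local mock HTTP server that tests talk to
# over real HTTP, instead of patching functions in test code.
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class MockPaymentAPI(BaseHTTPRequestHandler):
    def do_GET(self):
        # Hypothetical endpoint; in practice the routes and payloads
        # would mirror the real service's OpenAPI spec.
        if self.path == "/charge/123":
            body = json.dumps({"id": "123", "status": "succeeded"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):  # keep test output quiet
        pass

server = HTTPServer(("127.0.0.1", 0), MockPaymentAPI)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# The code under test only needs a base URL, so swapping the real
# service for this mock is a configuration change, not a code change.
base_url = f"http://127.0.0.1:{server.server_port}"
with urllib.request.urlopen(f"{base_url}/charge/123") as resp:
    data = json.loads(resp.read())
print(data["status"])  # succeeded

server.shutdown()
```

Because the request goes through a real socket and a real HTTP parser, this exercises your client code (URL building, headers, JSON decoding) in a way an in-process stub never does.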
If the external service has an OpenAPI spec (most major APIs do), you can have a mock running in under 5 minutes using moqapi.dev. Import the spec, get a hosted mock URL, override the service URL in your CI environment variables.
```yaml
# GitHub Actions
env:
  PAYMENT_API_URL: ${{ secrets.MOCK_PAYMENT_API_URL }}
  CRM_API_URL: ${{ secrets.MOCK_CRM_API_URL }}
```
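The override only works if the application reads the base URL from the environment rather than hardcoding it. A sketch of the consuming side, where the production default URL is hypothetical:

```python
# Sketch: application code resolves the service base URL from the
# environment, so CI can point it at a mock with zero code changes.
import os

# Falls back to a hypothetical production URL when the variable is unset.
PAYMENT_API_URL = os.environ.get("PAYMENT_API_URL", "https://payments.example.com")

def charge_url(charge_id: str) -> str:
    # Build a request URL relative to whichever base is configured.
    return f"{PAYMENT_API_URL}/charges/{charge_id}"

print(charge_url("ch_42"))
```

Locally and in production the default applies; in CI, the `PAYMENT_API_URL` variable from the workflow above silently redirects every request to the mock.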
The mocks never rate-limit you. They're always available. They return exactly what you configure. Your tests become deterministic.
## The database piece
For your own database, wrap each integration test in a transaction that rolls back after the test. This keeps test data isolated without requiring database resets between runs. Every major ORM supports this pattern.
## What success looks like
A pipeline that fails for one reason: your code has a bug. Not infrastructure flakiness. Not expired tokens. Not rate limits. Just your code.
That's what deterministic tests feel like. Once you've had them, going back is painful.