You know the ones:
✅ They pass.
❌ Then they fail.
✅ Then magically pass again.
All without a code change.
🤯 Why are they so dangerous?
- They break trust in your test suite
- Waste hours of debugging
- Block CI/CD pipelines
- Lead to “Ignore this test, it always fails 🙄”
📌 Real-world causes I’ve seen:
- Timing issues: wait_for_element() didn’t wait long enough in slow environments (see the sketch after this list)
- Data dependency: a test passes locally but fails in CI because shared data was already modified
- 3rd-party services: the API mock wasn’t stable, or the external service was rate-limiting
- Environment drift: works on Chrome vX, fails on vX+1
- Bad locators: the selector matches several web elements instead of a unique one (also in the sketch below)
- Parallel runs: fails only when tests run in parallel
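
To make the timing and locator causes concrete, here’s a minimal sketch of the flaky pattern, assuming a Selenium-style UI test (the URL and the "card" class are placeholders, not a real app):

```python
import time

from selenium import webdriver
from selenium.webdriver.common.by import By

def test_dashboard_loads_flaky():
    driver = webdriver.Chrome()
    try:
        driver.get("https://example.com/dashboard")  # placeholder URL
        time.sleep(2)  # hard sleep: enough locally, too short on a slow CI agent
        # non-unique locator: find_element returns the FIRST ".card" match,
        # which can be a different element depending on render order
        widget = driver.find_element(By.CLASS_NAME, "card")
        assert widget.is_displayed()
    finally:
        driver.quit()
```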
How to fight flakiness:
🔹 Use explicit waits, not hard sleeps (sketch below)
🔹 Reset or isolate test data (fixture sketch below)
🔹 Add retries ONLY where absolutely needed (@flaky, rerunfailures), and be very careful with them (example below)
🔹 Log and analyze failure patterns over time
🔹 Isolate flaky tests and flag them for review
🔹 Use mocks to simulate unstable services (mock sketch below)
🔹 Run tests in consistent, clean environments (Docker, CI agents)
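
For the first tip, a minimal sketch of an explicit wait using Selenium’s WebDriverWait; the wait_for_element name mirrors the helper mentioned earlier, and the locator and timeout are illustrative:

```python
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

def wait_for_element(driver, locator, timeout=10):
    # polls until the element is visible or the timeout expires,
    # instead of sleeping a fixed (and often wrong) amount of time
    return WebDriverWait(driver, timeout).until(
        EC.visibility_of_element_located(locator)
    )

# usage:
# widget = wait_for_element(driver, (By.CSS_SELECTOR, "#dashboard .card"))
```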
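
For test-data isolation, one common pytest pattern is a fixture that hands each test its own fresh record; here a toy in-memory dict stands in for your real data layer:

```python
import uuid

import pytest

_DB = {}  # toy in-memory "backend", a stand-in for your real store

def create_user(email):
    _DB[email] = {"email": email}
    return email

def delete_user(email):
    _DB.pop(email, None)

@pytest.fixture
def fresh_user():
    # a unique record per test, so parallel or reordered runs never collide
    email = create_user(f"test-{uuid.uuid4()}@example.com")
    yield email
    delete_user(email)  # teardown runs whether the test passed or failed

def test_profile_update(fresh_user):
    _DB[fresh_user]["name"] = "Ada"
    assert _DB[fresh_user]["name"] == "Ada"
```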
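
And if you do add retries, scope them to a single known-flaky test rather than the whole suite. A sketch using the pytest-rerunfailures plugin named above (pip install pytest-rerunfailures; call_unstable_service is a hypothetical stand-in):

```python
import pytest

def call_unstable_service():
    return "ok"  # hypothetical stand-in for the real unstable dependency

# retry this ONE test at most twice, with a 1-second pause between reruns
@pytest.mark.flaky(reruns=2, reruns_delay=1)
def test_third_party_status():
    assert call_unstable_service() == "ok"
```

Treat the marker as a flag for review, not a fix: the rerun report is also a source of the fail-pattern data mentioned above.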
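
For the mocking tip, a sketch with the standard library’s unittest.mock, assuming the code under test hits the unstable service via requests (the URL is a placeholder):

```python
from unittest.mock import MagicMock, patch

import requests

def get_status():
    # code under test: depends on an unstable external service
    return requests.get("https://api.example.com/status").json()["state"]

def test_get_status_with_stable_mock():
    fake_response = MagicMock()
    fake_response.json.return_value = {"state": "ok"}
    with patch("requests.get", return_value=fake_response):
        # deterministic: no rate limits, timeouts, or network at all
        assert get_status() == "ok"
```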
💬 A flaky test is worse than no test at all, because it lies to you.