
Kevi


Stop Guessing Why Your Flaky Tests Fail or Pass

Flaky tests don’t fail when you expect them to.
They fail when you have the least time.

One moment everything is green, the next your CI pipeline is red — and then, magically, it passes on rerun.

❌ ❌ ✅ → Passed

So… what just happened?

  • Was it a network issue?
  • Timing? State leakage?
  • The classic DOM detached?
  • Maybe the fixture didn't return a value?
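State leakage, in particular, is easy to reproduce. Here's a minimal sketch (hypothetical code, not from any real suite) of a module-level cache that makes a test's outcome depend on what ran before it:

```python
# Hypothetical state leakage: a module-level cache shared between tests.
_cache = {}

def get_user(user_id, fetch):
    """Return a cached user, fetching on first access."""
    if user_id not in _cache:
        _cache[user_id] = fetch(user_id)
    return _cache[user_id]

# In isolation the second call would return "bob"; but because an earlier
# test already cached user 1, it silently returns the stale "alice".
first = get_user(1, lambda _: "alice")   # some earlier test fills the cache
second = get_user(1, lambda _: "bob")    # this "test" now sees leaked state
print(first, second)  # alice alice
```

Run the second call alone and it passes; run it after the first and it fails. That order-dependence is exactly what makes these bugs look random.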

The Problem: We Only See the Final Outcome

Most test reports show only the final result. Or you install a pile of plugins that scrape all the XML files, show you multiple tests with the same title, and leave you clicking through each one to guess which attempt ran first.

If a test fails twice and passes on the third attempt, all you see is:

TestCheckoutFlow → Rerun
TestCheckoutFlow → Rerun
TestCheckoutFlow → Rerun
TestCheckoutFlow → PASSED

That “pass” hides everything that matters:

  • Why did it fail initially?
  • What changed between attempts?
  • Is this a real bug or just instability?

👉 You’re left guessing, or spending far too much time scrolling to figure out which attempt ran first.

And guessing doesn’t scale, especially in CI. Neither does writing a log scraper on top of it.


What Flaky Tests Actually Do to Your System

Flaky tests aren’t just annoying. They slowly break your engineering system:

  • You start ignoring failures (“just rerun it”) because nobody has time to dig through the logs
  • Confidence in CI drops
  • Real bugs (uncaught ones) get buried under noise
  • Debugging becomes reactive instead of intentional

Over time, your test suite stops being a safety net and becomes background noise.


The Missing Piece: Attempt-Level Visibility

The root issue is simple:

We don’t see what happened in each retry.

When you use tools like pytest-rerunfailures, retries happen silently. The final result is shown — the journey is lost.

But that journey is where the bug lives.
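To make the problem concrete, here's a minimal retry loop (a simplified sketch, not pytest-rerunfailures’ actual implementation) showing how only the final outcome survives:

```python
# Minimal sketch of what silent retries do: only the final outcome survives.
def run_with_retries(test_fn, retries=3):
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            test_fn(attempt)
            return "PASSED"          # earlier failures are simply discarded
        except AssertionError as exc:
            last_error = exc         # the "journey" is overwritten each time
    return f"FAILED: {last_error}"

# A test that fails twice, then passes (e.g. a slow backend catching up)
def eventually_passes(attempt):
    assert attempt >= 3, f"backend not ready on attempt {attempt}"

print(run_with_retries(eventually_passes))  # prints PASSED
```

Two failures happened, and the report has no memory of either of them.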

I had a test that consistently failed on reruns. Since I had no control over the data at that point, the reruns simply showed that multiple records had been created and the test failed. But there wouldn’t have been multiple records on the first attempt — so why did it fail in the first place?


A Better Way: Capture Every Attempt

To solve this, we shipped an update to pytest-html-plus that records every attempt of a test, not just the final one.

Instead of:

TestLogin → Passed

You see:

TestLogin  → Passed
 ├── Attempt 1 → Failed (TimeoutError)
 ├── Attempt 2 → Failed (Element not found)
 └── Attempt 3 → Passed

Now you can answer:

  • Is it a timing issue?
  • Is the UI not ready?
  • Is an API inconsistent?

👉 No more guessing. Just evidence.
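The idea behind attempt-level capture can be sketched in a few lines (an illustration of the concept, not pytest-html-plus’s internals):

```python
def run_recording_attempts(test_fn, retries=3):
    """Run test_fn up to `retries` times, keeping a record of every attempt."""
    attempts = []
    for attempt in range(1, retries + 1):
        try:
            test_fn(attempt)
            attempts.append((attempt, "PASSED", None))
            break
        except Exception as exc:
            attempts.append((attempt, "FAILED", type(exc).__name__))
    return attempts

def login_test(attempt):
    # Simulated flaky behaviour: two different failures, then success.
    if attempt == 1:
        raise TimeoutError("page load timed out")
    if attempt == 2:
        raise LookupError("element not found")

for attempt, outcome, error in run_recording_attempts(login_test):
    print(f"Attempt {attempt} → {outcome}" + (f" ({error})" if error else ""))
```

The final verdict is still “passed”, but now the two different failure modes along the way are preserved instead of thrown away.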


Real Example: A “Passing” Test That Was Actually Broken

I had a test that looked harmless:

❌ ❌ ✅ → Passed

Normally, I would move on.

But with attempt-level logs, the story changed:

  • Attempt 1 → API response delayed, but a partial record was created
  • Attempt 2 → UI rendered partially and never found the full record

This wasn’t a passing test.

This was a race condition waiting to hit production.
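The pattern is easy to simulate (hypothetical code, names invented for illustration): each failed attempt leaves a partial record behind, so the retry that finally “passes” runs against polluted state:

```python
# Hypothetical backend: a slow write raises, but still persists a partial row.
class FakeBackend:
    def __init__(self):
        self.records = []

    def create_order(self, attempt):
        # The write lands even when the call times out.
        self.records.append({"attempt": attempt, "complete": attempt >= 3})
        if not self.records[-1]["complete"]:
            raise TimeoutError("API response delayed; partial record written")

backend = FakeBackend()
for attempt in range(1, 4):
    try:
        backend.create_order(attempt)
        break  # attempt 3 "passes"
    except TimeoutError:
        pass   # retry, as a rerun plugin would

# The suite reports green, but two stale partial records are left behind.
print(len(backend.records), "records,",
      sum(r["complete"] for r in backend.records), "complete")
```

The green checkmark is real; the data corruption behind it is also real.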


But Won’t This Make Reports Noisy?

Yes — if done poorly.

Logging every attempt can easily clutter reports, especially in large test suites.

So we designed it to stay clean by default:

  • Attempts are collapsible
  • Logs are grouped per attempt
  • You expand only what you need

👉 You get detail without sacrificing readability.


Plug & Play with Pytest

Getting started is simple:

pip install pytest-html-plus
pytest --html=report.html

Working example

You get:

  • HTML reports with attempt breakdown
  • JSON output for automation
  • Flaky test detection
  • Clear visibility into retries

No major setup. No workflow changes. No third-party plugins needed.
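The JSON output is handy for automation. As an illustration (the field names below are hypothetical, not the plugin’s actual schema — check its docs for the real one), a CI step could flag any test that only passed after retries:

```python
import json

# Hypothetical report shape, for illustration only.
report = json.loads("""
[
  {"name": "TestLogin",  "outcome": "passed", "attempts": 3},
  {"name": "TestSearch", "outcome": "passed", "attempts": 1}
]
""")

# Anything that needed more than one attempt is flaky, even if it passed.
flaky = [t["name"] for t in report if t["attempts"] > 1]
print(flaky)  # ['TestLogin']
```

A check like this can fail the build, or just open a ticket, whenever “green” was only green on the second try.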


When This Helps the Most

This is especially useful if:

  • You use retries in CI (pytest-rerunfailures)
  • Tests pass locally but fail in CI
  • You suspect timing, async, or state issues
  • You’ve ever said “it passed on rerun, so it’s fine”

The Mindset Shift

Stop asking:

“Did the test pass?”

Start asking:

“How did the test pass?”

Because that’s where the real bugs hide.


Final Thought

Flaky tests don’t just waste time —
they slowly erode trust in your system.

And once trust is gone,
your tests stop protecting you.

If you can see every attempt,
you can fix the root cause — not just silence the symptom.


If you want to try it out:

👉 GitHub: https://github.com/reporterplus/pytest-html-plus
👉 PyPI: https://pypi.org/project/pytest-html-plus/

Would love to hear how you’re dealing with flaky tests in your setup.
