
Kevi


Stop Guessing Why Your Flaky Tests Fail or Pass

Flaky tests don’t fail when you expect them to.
They fail when you have the least time.

One moment everything is green, the next your CI pipeline is red — and then, magically, it passes on rerun.

❌ ❌ ✅ → Passed

So… what just happened?

  • Was it a network issue?
  • Timing? State leakage?
  • The classic DOM detached?
  • Maybe the fixture didn't return a value?
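State leakage, in particular, is easy to reproduce. Here's a minimal sketch (hypothetical code, not from any real suite) of a module-level cache that makes a test's outcome depend on what ran before it:

```python
# Hypothetical state leakage: a module-level cache shared between tests.
_cache = {}

def get_user(user_id, fetch):
    """Return a cached user, fetching on first access."""
    if user_id not in _cache:
        _cache[user_id] = fetch(user_id)
    return _cache[user_id]

# In isolation the second call would return "bob"; but because an earlier
# test already cached user 1, it silently returns the stale "alice".
first = get_user(1, lambda _: "alice")   # some earlier test fills the cache
second = get_user(1, lambda _: "bob")    # this "test" now sees leaked state
print(first, second)  # alice alice
```

Run the second call alone and it passes; run it after the first and it fails. That order-dependence is exactly what makes these bugs look random.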

The Problem: We Only See the Final Outcome

Most test reports show only the final result. Or you install a pile of plugins that scrape all the XML files, show you multiple tests with the same title, and leave you clicking through each one to guess which attempt ran first.

If a test fails twice and passes on the third attempt, all you see is:

TestCheckoutFlow → Rerun
TestCheckoutFlow → Rerun
TestCheckoutFlow → Rerun
TestCheckoutFlow → PASSED

That “pass” hides everything that matters:

  • Why did it fail initially?
  • What changed between attempts?
  • Is this a real bug or just instability?

👉 You’re left guessing, or spending far too much time scrolling to figure out which attempt ran first.

And guessing doesn’t scale, especially in CI. Neither does writing a log scraper on top of it.


What Flaky Tests Actually Do to Your System

Flaky tests aren’t just annoying. They slowly break your engineering system:

  • You start ignoring failures (“just rerun it”) because nobody has time to dig through the logs
  • Confidence in CI drops
  • Real bugs (uncaught ones) get buried under noise
  • Debugging becomes reactive instead of intentional

Over time, your test suite stops being a safety net and becomes background noise.


The Missing Piece: Attempt-Level Visibility

The root issue is simple:

We don’t see what happened in each retry.

When you use tools like pytest-rerunfailures, retries happen silently. The final result is shown — the journey is lost.

But that journey is where the bug lives.
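To make the problem concrete, here's a minimal retry loop (a simplified sketch, not pytest-rerunfailures’ actual implementation) showing how only the final outcome survives:

```python
# Minimal sketch of what silent retries do: only the final outcome survives.
def run_with_retries(test_fn, retries=3):
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            test_fn(attempt)
            return "PASSED"          # earlier failures are simply discarded
        except AssertionError as exc:
            last_error = exc         # the "journey" is overwritten each time
    return f"FAILED: {last_error}"

# A test that fails twice, then passes (e.g. a slow backend catching up)
def eventually_passes(attempt):
    assert attempt >= 3, f"backend not ready on attempt {attempt}"

print(run_with_retries(eventually_passes))  # prints PASSED
```

Two failures happened, and the report has no memory of either of them.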

I had a test that consistently failed on reruns. Since I had no control over the data at that point, the reruns simply showed that multiple records had been created and the test failed. But there wouldn’t have been multiple records on the first attempt — so why did it fail in the first place?


A Better Way: Capture Every Attempt

To solve this, we shipped an update to pytest-html-plus that records every attempt of a test, not just the final one.

Instead of:

TestLogin → Passed

You see:

TestLogin  → Passed
 ├── Attempt 1 → Failed (TimeoutError)
 ├── Attempt 2 → Failed (Element not found)
 └── Attempt 3 → Passed

Now you can answer:

  • Is it a timing issue?
  • Is the UI not ready?
  • Is an API inconsistent?

👉 No more guessing. Just evidence.
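The idea behind attempt-level capture can be sketched in a few lines (an illustration of the concept, not pytest-html-plus’s internals):

```python
def run_recording_attempts(test_fn, retries=3):
    """Run test_fn up to `retries` times, keeping a record of every attempt."""
    attempts = []
    for attempt in range(1, retries + 1):
        try:
            test_fn(attempt)
            attempts.append((attempt, "PASSED", None))
            break
        except Exception as exc:
            attempts.append((attempt, "FAILED", type(exc).__name__))
    return attempts

def login_test(attempt):
    # Simulated flaky behaviour: two different failures, then success.
    if attempt == 1:
        raise TimeoutError("page load timed out")
    if attempt == 2:
        raise LookupError("element not found")

for attempt, outcome, error in run_recording_attempts(login_test):
    print(f"Attempt {attempt} → {outcome}" + (f" ({error})" if error else ""))
```

The final verdict is still “passed”, but now the two different failure modes along the way are preserved instead of thrown away.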


Real Example: A “Passing” Test That Was Actually Broken

I had a test that looked harmless:

❌ ❌ ✅ → Passed

Normally, I would move on.

But with attempt-level logs, the story changed:

  • Attempt 1 → API response delayed, but a partial record was created
  • Attempt 2 → UI rendered partially and never found the full record

This wasn’t a passing test.

This was a race condition waiting to hit production.
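The pattern is easy to simulate (hypothetical code, names invented for illustration): each failed attempt leaves a partial record behind, so the retry that finally “passes” runs against polluted state:

```python
# Hypothetical backend: a slow write raises, but still persists a partial row.
class FakeBackend:
    def __init__(self):
        self.records = []

    def create_order(self, attempt):
        # The write lands even when the call times out.
        self.records.append({"attempt": attempt, "complete": attempt >= 3})
        if not self.records[-1]["complete"]:
            raise TimeoutError("API response delayed; partial record written")

backend = FakeBackend()
for attempt in range(1, 4):
    try:
        backend.create_order(attempt)
        break  # attempt 3 "passes"
    except TimeoutError:
        pass   # retry, as a rerun plugin would

# The suite reports green, but two stale partial records are left behind.
print(len(backend.records), "records,",
      sum(r["complete"] for r in backend.records), "complete")
```

The green checkmark is real; the data corruption behind it is also real.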


But Won’t This Make Reports Noisy?

Yes — if done poorly.

Logging every attempt can easily clutter reports, especially in large test suites.

So we designed it to stay clean by default:

  • Attempts are collapsible
  • Logs are grouped per attempt
  • You expand only what you need

👉 You get detail without sacrificing readability.


Plug & Play with Pytest

Getting started is simple:

pip install pytest-html-plus
pytest --html=report.html

Working example

You get:

  • HTML reports with attempt breakdown
  • JSON output for automation
  • Flaky test detection
  • Clear visibility into retries

No major setup. No workflow changes. No third-party plugins needed.
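The JSON output is handy for automation. As an illustration (the field names below are hypothetical, not the plugin’s actual schema — check its docs for the real one), a CI step could flag any test that only passed after retries:

```python
import json

# Hypothetical report shape, for illustration only.
report = json.loads("""
[
  {"name": "TestLogin",  "outcome": "passed", "attempts": 3},
  {"name": "TestSearch", "outcome": "passed", "attempts": 1}
]
""")

# Anything that needed more than one attempt is flaky, even if it passed.
flaky = [t["name"] for t in report if t["attempts"] > 1]
print(flaky)  # ['TestLogin']
```

A check like this can fail the build, or just open a ticket, whenever “green” was only green on the second try.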


When This Helps the Most

This is especially useful if:

  • You use retries in CI (pytest-rerunfailures)
  • Tests pass locally but fail in CI
  • You suspect timing, async, or state issues
  • You’ve ever said “it passed on rerun, so it’s fine”

The Mindset Shift

Stop asking:

“Did the test pass?”

Start asking:

“How did the test pass?”

Because that’s where the real bugs hide.


Final Thought

Flaky tests don’t just waste time —
they slowly erode trust in your system.

And once trust is gone,
your tests stop protecting you.

If you can see every attempt,
you can fix the root cause — not just silence the symptom.


If you want to try it out:

👉 GitHub: https://github.com/reporterplus/pytest-html-plus
👉 PyPI: https://pypi.org/project/pytest-html-plus/

Would love to hear how you’re dealing with flaky tests in your setup.
