Picture this. A test fails in CI. It's been flaky all week — fails on push, passes when you rerun. So you add --reruns 2 to the pytest command. Now the suite passes. Green build. Ship it.
A week later, the same test fails in production in a way that only happens under load. You go back to look at the build that passed, and the report says... "passed." One line. No context. No hint that the test ever failed before, let alone what it failed with.
This is what pytest looks like to most of us: a final verdict. It's not wrong, exactly — the test did pass, eventually. But "eventually" is hiding the interesting information. Why did it fail the first two times? What error? Was it a race condition? A flaky fixture? A genuine bug that only manifests one in three runs?
pytest doesn't tell you, and by default pytest-rerunfailures doesn't preserve that context in a form you can easily inspect. Add -n auto via xdist and it gets worse — your reports become a collage of retry artifacts spread across worker JSONs, and figuring out which attempt ran first on which worker is its own forensic exercise.
When a test goes fail → fail → pass, I want to see all three attempts. I want to see each error message. I want to see the order. I want to be able to go, "oh, the first two failures were ConnectionError but the third was clean — that's a network flake, I'll mark it as such" — instead of assuming a pass is a pass is a pass.
So I wrote pytest-html-plus.
What it does (just this, for now)
pytest-html-plus hooks into pytest and pytest-rerunfailures and preserves the full retry history in its HTML report. When a test has multiple attempts, you see all of them — passed, failed, errored — with their individual logs, errors, and tracebacks, in the order they ran.
Crucially, it also merges xdist worker JSONs back into a single cohesive story, so even if a test ran across two workers, you see the full chronological attempt history in one place.
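Conceptually, the merge step works like this (a sketch, not the plugin's actual code; the record fields `test`, `start`, and `outcome` are assumptions for illustration): collect each worker's attempt records per test, then sort each test's attempts by start time so the history reads in the order it actually ran.

```python
# Conceptual sketch only — NOT pytest-html-plus internals. Record shape
# (dicts with 'test', 'start', 'outcome') is invented for illustration.
from collections import defaultdict

def merge_attempts(worker_records):
    """Merge per-worker attempt records into one chronological history per test."""
    history = defaultdict(list)
    for rec in worker_records:
        history[rec["test"]].append(rec)
    for attempts in history.values():
        # Sort by start time so attempt order survives the split across workers.
        attempts.sort(key=lambda r: r["start"])
    return dict(history)
```

The key design point is that ordering comes from timestamps, not from which worker's JSON happened to be read first.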
Everything else the plugin does — combined XML export, automatic screenshots on failure, markers, email — is secondary. The retry visibility is the reason I built it.
Install and see it
The setup is three commands:
```shell
pip install pytest-html-plus pytest-rerunfailures
pytest --reruns 2
open report_output/report.html
```
That's it. No config file. No conftest changes. No hooks to register. It works with or without xdist. If a test goes through retries, you'll see the full story in the HTML report automatically.
If you want to try it without touching your own suite, there's a live demo report here. The flaky tests on that page show exactly the retry history I'm describing.
Why this matters more than it sounds like
"Knowing which attempt failed" sounds like a nice-to-have until you're actually triaging flakes in a 2,000-test suite. Three concrete things it unlocks:
Distinguishing flaky from genuinely broken.
If a test goes fail → pass, that's a flake. If it goes fail → fail → pass, there's probably a real bug that just doesn't reproduce deterministically. The attempt count alone is a diagnostic signal, and you lose it in a standard report.
And when a test goes fail → fail → fail, the first failure is often the most diagnostic one. Later attempts can fail for downstream reasons (polluted state left behind by the first failure, for example), so the symptom you see on attempt three may have nothing to do with the original bug. A classic tell is a test that quietly depends on state from another test that ran before it; you only see it clearly in the first attempt.
Finding correlated flakes. When three tests all fail their first attempt with ConnectionError but pass on retry, you don't have three flaky tests — you have one network issue. The cross-test retry log makes that pattern visible.
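The pattern-spotting itself is simple once per-attempt data exists. Here's a hypothetical sketch of the idea (the `(test_name, attempt, error)` record shape is invented; the plugin's actual data format may differ): group first-attempt failures by error class and flag any class shared by multiple tests.

```python
# Hypothetical sketch: group first-attempt failures by error class to surface
# correlated flakes. The record shape is assumed, not the plugin's format.
from collections import defaultdict

def correlated_flakes(records):
    """records: iterable of (test_name, attempt_number, error_class_or_None)."""
    by_error = defaultdict(list)
    for name, attempt, error in records:
        if attempt == 1 and error is not None:
            by_error[error].append(name)
    # Several tests failing attempt 1 with the same error suggests one shared cause.
    return {err: tests for err, tests in by_error.items() if len(tests) > 1}
```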
Honest CI reports. There's a real difference between "this build passed on the first try" and "this build passed after eight retries across twelve tests." Both show passed in a default pytest run; a reviewer should not treat the two the same.
What I didn't build (on purpose)
If you're looking for a full test-management platform with dashboards, trends, or historical tracking across runs — this isn't that. pytest-html-plus is a single-run reporter. It writes one self-contained HTML file per run. That's by design: it's portable, it works without a backend, you can archive it as a CI artifact, and you can email it (by enabling the --email flag).
If you want Allure, use Allure — it's a different product solving a different problem. If you want a server with a database tracking flakes across months, that's also a different product. This plugin is for the specific moment when you're triaging a single build and want to see what actually happened.
Try it, tell me it's broken
The project is on GitHub at reporterplus/pytest-html-plus. I maintain it alone. If you install it and something doesn't work on your suite, please open an issue — the most valuable feedback is from people whose pytest setup is weirder than mine. Exotic fixtures, unusual xdist configurations, custom reruns logic — those are the corners where bugs hide, and I can't find them alone.
And if the retry history helps you catch one bug you would have otherwise shipped, that's enough. That's what I wrote it for.