The 30-Minute Ritual I Got Sick Of
Every morning, same routine: check CI, see red builds, open traces, realize the session expired again.
Not a single feature broke. The tests just lost authentication somewhere between step 3 and step 4. My Playwright automation was working perfectly, except for the fragile sessions ruining everything.
I'd spend 30 minutes digging through traces trying to figure out when the user got logged out. Was it the token? The cookie? Parallel workers stepping on each other? Nobody knew.
The Problem Nobody Warns You About
When I started to learn Playwright, everyone praised storageState for session management. "Save your auth once, reuse it everywhere!" they said.
What they didn't mention: sessions are fragile.
Tokens expire mid-test. Parallel workers invalidate shared state. Permissions change between runs. And when it happens, Playwright doesn't say "hey, you got logged out."
It says: `TimeoutError: Element not found`
The real failure—authentication—is completely hidden behind a generic locator error.
The Discovery That Changed Everything
I found a brilliant post on the TestLeaf blog about using AI to detect and heal session failures automatically. The concept seemed almost too good to be true: tests that diagnose why they failed, not just what failed.
During my online Playwright course, we covered handling authentication. But nobody taught me how to make tests smart enough to detect when authentication breaks and fix it automatically.
How Self-Healing Actually Works
Instead of just reporting "test failed," I built a system that:
- Captures Failure Context. When a test fails, grab the high-signal metadata:

```typescript
interface FailureContext {
  finalUrl: string;
  pageTitle: string;
  lastNetworkErrors: { url: string; status: number }[];
  consoleErrors: string[];
}
```

This small JSON payload is enough for AI to make a diagnosis without sending entire traces (expensive and slow). A sketch of how to capture this context follows the pseudo-code below.
- AI Classifies the Failure. Instead of human guesswork, AI categorizes failures:
  - Auth/Session Issue (redirects to login, 401/403 responses)
  - Locator Issue (selector changed, strict mode violation)
  - Application Bug (500 errors, broken logic)
Example AI response:
"Test failed at Step 4. User was redirected from /orders to /login after a 401 response. Classification: Session Expired."
- Intelligent Self-Healing. If AI confidently classifies it as a session issue (confidence > 90%), the framework:
  - Regenerates auth.json (a setup sketch follows the pseudo-code below)
  - Retries the test with fresh authentication
  - Logs the healing event for review
Pseudo-code:
```typescript
if (testResult.status === 'failed') {
  const analysis = await aiAgent.analyze(failureContext);

  if (analysis.category === 'AUTH_ISSUE' && analysis.confidence > 0.9) {
    console.log('🤖 Session expired. Regenerating auth...');
    await globalSetup(); // Fresh login, rewrites auth.json
    console.log('🔄 Retrying with new session...');
    // Retry the test with the regenerated storage state
  }
}
```
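To make step 1 concrete, here is a minimal sketch of collecting that FailureContext with Playwright's page event listeners. The trackFailureContext helper, the './types' module path, and the five-entry error buffer are my own assumptions, not part of any official API:

```typescript
import type { Page } from '@playwright/test';
import type { FailureContext } from './types'; // hypothetical module holding the interface from step 1

// Hypothetical helper: attach listeners when the page is created,
// then call the returned function to snapshot context on failure.
export function trackFailureContext(page: Page) {
  const lastNetworkErrors: { url: string; status: number }[] = [];
  const consoleErrors: string[] = [];

  page.on('response', (response) => {
    if (response.status() >= 400) {
      lastNetworkErrors.push({ url: response.url(), status: response.status() });
      if (lastNetworkErrors.length > 5) lastNetworkErrors.shift(); // keep the payload small
    }
  });

  page.on('console', (msg) => {
    if (msg.type() === 'error') consoleErrors.push(msg.text());
  });

  return async (): Promise<FailureContext> => ({
    finalUrl: page.url(),
    pageTitle: await page.title(),
    lastNetworkErrors,
    consoleErrors,
  });
}
```

In a custom fixture, call trackFailureContext(page) before the test body runs, then take the snapshot in an afterEach hook whenever testInfo.status comes back as 'failed'.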
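The globalSetup() in the healing branch can be ordinary Playwright storage-state generation: log in once, save the session to auth.json. A minimal sketch, where the URL, selectors, and environment variable names are placeholders for your own login flow:

```typescript
// global-setup.ts: referenced from playwright.config via the globalSetup option
import { chromium, type FullConfig } from '@playwright/test';

export default async function globalSetup(config: FullConfig) {
  const browser = await chromium.launch();
  const page = await browser.newPage();

  // Placeholder URL, selectors, and env vars: adapt to your app's login flow.
  await page.goto(`${process.env.BASE_URL}/login`);
  await page.fill('#username', process.env.TEST_USER!);
  await page.fill('#password', process.env.TEST_PASS!);
  await page.click('button[type="submit"]');
  await page.waitForURL('**/dashboard');

  // Persist cookies and localStorage so tests can reuse the session.
  await page.context().storageState({ path: 'auth.json' });
  await browser.close();
}
```

Tests pick the file up through `use: { storageState: 'auth.json' }` in playwright.config.ts, so a regenerated file is all the retry needs.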
The Results Were Shocking
After implementing self-healing:

- Session failures dropped 85%. Tests that broke from expired tokens now auto-recover.
- Debugging time cut in half. AI tells me why tests failed, not just where.
- CI reliability improved dramatically. Red builds now mean actual issues, not session noise.
- No more 30-minute trace investigations. AI does the diagnosis instantly.
What I Learned
- Don't send full traces to AI. They're massive, expensive, and slow. Extract only the signals that matter (final URL, 401/403 responses, console errors).
- Confidence thresholds matter. I only auto-heal when AI is >90% confident. Lower confidence gets flagged for human review.
- Security is non-negotiable. Never send raw tokens or credentials to external LLMs. Sanitize everything (see the sketch after this list).
- You can start simple. Basic rules catch most cases (URL ends with /login, got 401 response). AI handles the fuzzy edge cases.
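On that security point, a minimal redaction pass might look like the following, reusing the FailureContext interface from step 1. The regex patterns are illustrative assumptions; extend them for whatever token formats your app actually uses:

```typescript
// Illustrative sketch: scrub obvious secrets from a FailureContext
// (the interface from step 1) before it leaves the machine.
function sanitize(context: FailureContext): FailureContext {
  const redact = (s: string) =>
    s
      .replace(/Bearer\s+[\w.-]+/gi, 'Bearer [REDACTED]')        // Authorization headers
      .replace(/eyJ[\w-]+\.[\w-]+\.[\w-]+/g, '[REDACTED_JWT]')   // JWT-shaped strings
      .replace(/(token|session|api[_-]?key)=[^&\s]+/gi, '$1=[REDACTED]'); // query params

  return {
    finalUrl: redact(context.finalUrl),
    pageTitle: context.pageTitle,
    lastNetworkErrors: context.lastNetworkErrors.map((e) => ({
      status: e.status,
      url: redact(e.url),
    })),
    consoleErrors: context.consoleErrors.map(redact),
  };
}
```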
My Current Approach
I use a tiered strategy:
- Tier 1: Simple rules (if final URL = /login → session failure); see the sketch after this list
- Tier 2: AI classification (for ambiguous cases)
- Tier 3: Auto-healing (only when confidence > 90%)
- Always: Logging (every healing event gets recorded and reviewed)
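Here is a minimal sketch of what Tier 1 can look like in code. The Classification shape is my own assumption about the contract shared with the AI tier:

```typescript
// Tier 1: free, deterministic checks. Only fall through to the AI tier on UNKNOWN.
type Classification =
  | { category: 'AUTH_ISSUE'; confidence: number; reason: string }
  | { category: 'UNKNOWN'; confidence: 0 };

function classifyByRules(ctx: FailureContext): Classification {
  if (new URL(ctx.finalUrl).pathname.endsWith('/login')) {
    return { category: 'AUTH_ISSUE', confidence: 1, reason: 'Test ended on the login page' };
  }
  if (ctx.lastNetworkErrors.some((e) => e.status === 401 || e.status === 403)) {
    return { category: 'AUTH_ISSUE', confidence: 0.95, reason: 'Saw 401/403 responses' };
  }
  return { category: 'UNKNOWN', confidence: 0 }; // Tier 2: hand off to the AI classifier
}
```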
The Honest Truth
When I started learning Playwright, I thought test failures were just part of the job. "Flaky tests happen," everyone said.
But most "flaky" tests aren't random—they're session issues masquerading as locator problems. And now that I've built self-healing around this insight, my test suite feels intelligent instead of brittle.
The framework doesn't just run tests—it understands when they fail and knows how to fix common issues automatically.
That's the difference between maintenance hell and actually trusting your automation.
Reference: This post was inspired by TestLeaf's guide on self-healing Playwright tests.
Have you dealt with session failures in Playwright? What's been your approach? Share in the comments! 👇