I Fixed 110 Failing E2E Tests in 2 Hours Without Writing a Single Line of Test Code

nklars0 — Sun, 22 Feb 2026 21:09:18 +0000

110 failing Playwright tests. Login flows, multi-step form wizards, search filters, file uploads, complex user workflows. Some failures were missing UI steps. Some were dirty state from previous runs. Some were stale selectors. I fixed all of them in 2 hours. I didn't write a single line of test code.

I built a plugin that does it: https://github.com/kaizen-yutani/playwright-autopilot

How the debugging workflow actually works

When you run a test through the plugin, a lightweight capture hook is injected into Playwright's worker process. It monkey-patches BrowserContext._initialize to add an instrumentation listener, so it needs no modifications to Playwright's source code and works with any existing installation.
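To make the monkey-patching idea concrete, here is a minimal sketch of the general technique. It is not the plugin's code and does not touch Playwright: FakeContext, its synchronous _initialize, and installCaptureHook are hypothetical stand-ins, used only to show how wrapping a prototype method can attach a listener without editing the original class.

```typescript
type Listener = (event: string) => void;

// Hypothetical stand-in for a class we do not own. NOT Playwright's BrowserContext.
class FakeContext {
  listeners: Listener[] = [];

  _initialize(): void {
    this.emit("initialized");
  }

  emit(event: string): void {
    for (const listener of this.listeners) listener(event);
  }
}

// Wrap the original method on the prototype so every instance created
// afterwards gets a recording listener attached before the original runs.
function installCaptureHook(recorded: string[]): () => void {
  const original = FakeContext.prototype._initialize;

  FakeContext.prototype._initialize = function (this: FakeContext): void {
    this.listeners.push((event) => recorded.push(event));
    original.call(this);
  };

  // Return an uninstall function that restores the original method.
  return () => {
    FakeContext.prototype._initialize = original;
  };
}
```

The class source is never edited. The hook only swaps a method at runtime and can put it back, which is why this style of instrumentation can sit on top of an existing installation.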

From that point, every browser action is recorded:

DOM snapshots — full ARIA tree of the page captured before and after each click, fill, select, and navigation. When a test fails, you see exactly what the page looked like at the moment of failure, and what it looked like one step before.
Network requests — URL, method, status code, timing, request body, response body. Filter by status (400+ to find failed API calls), by URL pattern, or by method.
Console output — errors, warnings, and logs tied to the specific action that produced them. Not a wall of text — scoped to the step that matters.
Screenshots — captured at the point of failure.
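One way to picture what gets recorded per action, and how the network filters might work, is the sketch below. The field names, the NetworkFilter shape, and filterNetwork are assumptions made for illustration. The plugin's real schema may look quite different.

```typescript
// Hypothetical record shapes. Field names are illustrative only.
interface NetworkRecord {
  url: string;
  method: string;
  status: number;
  durationMs: number;
}

interface ActionRecord {
  index: number;
  action: string;          // e.g. "click", "fill", "navigate"
  ariaBefore: string;      // ARIA tree captured before the action
  ariaAfter: string;       // ARIA tree captured after the action
  network: NetworkRecord[];
  consoleMessages: string[];
}

interface NetworkFilter {
  minStatus?: number;      // e.g. 400 to find failed API calls
  urlContains?: string;
  method?: string;
}

// Collect the requests from all actions that satisfy every filter that was set.
function filterNetwork(actions: ActionRecord[], filter: NetworkFilter): NetworkRecord[] {
  const matches: NetworkRecord[] = [];
  for (const action of actions) {
    for (const req of action.network) {
      if (filter.minStatus !== undefined && req.status < filter.minStatus) continue;
      if (filter.urlContains !== undefined && !req.url.includes(filter.urlContains)) continue;
      if (filter.method !== undefined && req.method.toUpperCase() !== filter.method.toUpperCase()) continue;
      matches.push(req);
    }
  }
  return matches;
}
```

Calling filterNetwork(actions, { minStatus: 400 }) would return only the failed requests, which is usually the first thing worth checking when a step breaks.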

The AI doesn't dump all of this into context at once. The plugin is built on MCP (Model Context Protocol), so the agent pulls data on demand: the action timeline first, then the specific failing step, its DOM snapshot, the network response, and the console output. 32 tools, each returning just what's needed. Token-efficient by design.
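The "timeline first, details later" pattern can be sketched roughly as follows. The function names (getTimeline, firstFailedIndex, getStepDetail) and the StepRecord shape are hypothetical and are not the plugin's actual MCP tools. The point is only that the cheap overview omits the heavy fields.

```typescript
// Hypothetical step record. The snapshot is the expensive part.
interface StepRecord {
  index: number;
  action: string;
  failed: boolean;
  ariaSnapshot: string;
  consoleErrors: string[];
}

// Cheap overview: one short line per step, no snapshots or logs.
function getTimeline(steps: StepRecord[]): string[] {
  return steps.map((s) => `${s.index}: ${s.action}${s.failed ? " [FAILED]" : ""}`);
}

// Locate the step worth drilling into.
function firstFailedIndex(steps: StepRecord[]): number | undefined {
  const failed = steps.find((s) => s.failed);
  return failed ? failed.index : undefined;
}

// Full detail for a single step, requested only when it is needed.
function getStepDetail(steps: StepRecord[], index: number): StepRecord | undefined {
  return steps.find((s) => s.index === index);
}
```

An agent working this way reads a few short lines for the whole run and pays for a full snapshot only on the one step that failed.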

It thinks in user flows, not selectors

Before touching code, the agent maps the intended user journey: "a user logs in, fills out a multi-step form, uploads a file, submits." It walks through the steps a real user would perform and compares that against what the test actually did.

When a step is missing — a dropdown never selected, a required field never filled, a radio button never clicked — it finds the existing page object method in your codebase and adds the call. No new abstractions. Minimal diff.
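Comparing the intended journey against what the test actually did can be thought of as an ordered diff. The sketch below is a simplification under stated assumptions: steps are plain strings, and findMissingSteps is a hypothetical helper, not something taken from the plugin.

```typescript
// Return the intended steps that never appear, in order, in the actual run.
// The cursor keeps matching in order, so a step performed too early does not
// count as satisfying a later point in the journey.
function findMissingSteps(intended: string[], actual: string[]): string[] {
  const missing: string[] = [];
  let cursor = 0;
  for (const step of intended) {
    const foundAt = actual.indexOf(step, cursor);
    if (foundAt === -1) {
      missing.push(step);
    } else {
      cursor = foundAt + 1;
    }
  }
  return missing;
}
```

With an intended journey of login, select country, fill shipping, submit and an actual run of login, fill shipping, submit, the only missing step is "select country". That one step is what needs an existing page object call added.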

It follows your architecture

Page Object Model, business/service layer, whatever pattern your team uses — it reads your codebase and works within it. Uses getByRole(), getByTestId(), web-first assertions. No page.evaluate() hacks, no waitForTimeout, no try/catch around Playwright actions.

If the application itself is broken — 500s regardless of input, unhandled exceptions in app code — it tells you that instead of working around it.

It learns and remembers

After a test passes, the plugin automatically saves the verified user flow — the exact sequence of interactions that make up the happy path. Next time that test breaks, the agent already knows the intended journey and jumps straight to identifying what changed.

Run e2e_build_flows once across your suite and it captures every test's journey. The agent gets faster over time.
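A remembered flow could be as simple as a map from test title to the last verified sequence of steps. The sketch below assumes exactly that. FlowStore, recordRun, and lookup are hypothetical names, and how the plugin actually persists flows is not described here.

```typescript
// Hypothetical shape of a saved happy path.
interface VerifiedFlow {
  testTitle: string;
  steps: string[];
}

class FlowStore {
  private flows = new Map<string, VerifiedFlow>();

  // Only passing runs update the stored flow, so a broken run can never
  // overwrite the last known-good journey.
  recordRun(testTitle: string, steps: string[], passed: boolean): void {
    if (!passed) return;
    this.flows.set(testTitle, { testTitle, steps: [...steps] });
  }

  // Look up the last verified journey for a test, if one exists.
  lookup(testTitle: string): VerifiedFlow | undefined {
    return this.flows.get(testTitle);
  }
}
```

When a test breaks later, the stored steps give the intended journey straight away, so only the difference from the failing run has to be worked out.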

A real example

A checkout test was failing with "locator resolved to hidden element." The usual debugging path: open trace viewer, find the step, read the DOM, realize a country dropdown was never selected so the shipping section never rendered. 20 minutes if you're fast.

The plugin found the same root cause in one run. It pulled the DOM snapshot at the failing step, saw the unselected dropdown with its options sitting right there in the ARIA tree, searched the page objects for selectCountry(), found it, added the call in the service layer, re-ran the test. Passed. One fix, 12 seconds of AI thinking.
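The shape of that fix might look like the sketch below. Everything in it is illustrative: CheckoutPage, completeCheckout, and the field labels are hypothetical, and PageLike and LocatorLike are minimal stand-ins for the few Playwright calls used (getByRole, selectOption, fill, click) so the example runs without a browser. In a real suite those types would be Page and Locator from @playwright/test.

```typescript
// Minimal stand-ins mirroring a small subset of Playwright's Page/Locator API.
interface LocatorLike {
  selectOption(value: string): Promise<unknown>;
  fill(value: string): Promise<void>;
  click(): Promise<void>;
}

interface PageLike {
  getByRole(role: string, options?: { name?: string }): LocatorLike;
}

// Hypothetical page object. Method and label names are assumptions.
class CheckoutPage {
  private readonly page: PageLike;

  constructor(page: PageLike) {
    this.page = page;
  }

  async selectCountry(country: string): Promise<void> {
    await this.page.getByRole("combobox", { name: "Country" }).selectOption(country);
  }

  async fillPostalCode(code: string): Promise<void> {
    await this.page.getByRole("textbox", { name: "Postal code" }).fill(code);
  }

  async submit(): Promise<void> {
    await this.page.getByRole("button", { name: "Place order" }).click();
  }
}

// Hypothetical service-layer flow. The fix is the first line: without the
// country selected, the shipping section never renders in this scenario.
async function completeCheckout(
  checkout: CheckoutPage,
  country: string,
  postalCode: string,
): Promise<void> {
  await checkout.selectCountry(country); // previously missing call
  await checkout.fillPostalCode(postalCode);
  await checkout.submit();
}
```

The diff stays small because the page object method already existed. The only change is one added call in the layer that composes the journey.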

Get started

Add the marketplace

/plugin marketplace add kaizen-yutani/playwright-autopilot

Install the plugin

/plugin install kaizen-yutani/playwright-autopilot

Then prompt: Fix all failing e2e tests

https://github.com/kaizen-yutani/playwright-autopilot — star it, try it on your flakiest test, tell me what breaks.

DEV Community: nklars0
