I've been skeptical of AI testing tools for a while, but not for the reasons most people are.
My problem isn't that AI can't drive a browser. It clearly can. My problem is what happens after.
Every tool I tried made the same implicit trade: the AI stays in the loop at runtime. Your "test" is really a prompt that gets re-evaluated every time CI runs. The model drifts, the response changes slightly, your test starts flaking — and you have no idea why because there's no diff to look at. You just have vibes and a red build.
I kept thinking: I don't want AI to run my tests. I want AI to write them.
The thing Playwright Codegen almost got right
Playwright's built-in codegen is underrated. You record your actions, it spits out a .spec.ts file, and that file is yours forever. No model, no API key, no ongoing cost. Just code.
The problem is it records mechanically. It doesn't think. It'll happily record click('[data-testid="btn-3"]') instead of getByRole('button', { name: 'Sign in' }). It can't handle conditional flows. It doesn't know what you're actually trying to verify.
What I wanted was: Playwright Codegen, but the thing doing the recording understands what it's looking at.
Connecting to the browser I'm already using
The first real decision was how to connect to the browser.
Every AI browser tool I'd seen launches its own headless Chromium. Clean, isolated, reproducible. Also completely useless for how I actually develop.
When I'm debugging a login flow at 11pm, I'm already logged in. I have three tabs open. I have the React DevTools panel where I need it. I have the page in exactly the state that's causing the problem. Launching a fresh browser throws all of that away.
So I went the other direction: connect to the Chrome that's already running, over CDP. Call Playwright's chromium.connectOverCDP, find the tab whose URL matches the dev server, and work from there.
This turned out to matter more than I expected. The agent has access to real auth state. It can operate the page in the actual condition you'd be testing. It's not pretending to be a user — it's operating as one.
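The tab-matching step can be sketched roughly as follows. This is a minimal illustration, not Hover's actual code: findDevServerTab and PageLike are hypothetical names, and the port numbers in the usage comment are assumptions.

```typescript
// Minimal structural type so the helper accepts Playwright's Page objects
// without importing Playwright here. (Hypothetical name.)
interface PageLike {
  url(): string
}

/** Return the first tab whose URL shares an origin with the dev server. */
function findDevServerTab<T extends PageLike>(
  pages: T[],
  devServerUrl: string,
): T | undefined {
  const devOrigin = new URL(devServerUrl).origin
  return pages.find((page) => {
    try {
      return new URL(page.url()).origin === devOrigin
    } catch {
      return false // skip tabs whose URL cannot be parsed
    }
  })
}

// Assumed wiring with Playwright (ports are examples only):
//   const browser = await chromium.connectOverCDP('http://localhost:9222')
//   const pages = browser.contexts().flatMap((ctx) => ctx.pages())
//   const tab = findDevServerTab(pages, 'http://localhost:5173')
```

Matching on origin rather than the full URL means the tab is still found after you navigate to a different route on the dev server.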
The sandbox problem
Giving an AI agent access to your browser is a little terrifying if you think about it too hard.
I spent more time on the sandbox than on anything else. The agent only gets one tool: the Playwright MCP server. Bash, Write, Read, WebFetch — all explicitly denied. The only write path is the spec file that gets emitted when you click Save. There's also a hard --max-budget-usd ceiling per session.
The mental model I settled on: the agent is a very capable intern who's allowed to operate the browser and nothing else. They can't touch the filesystem. They can't make network calls outside the browser. They definitely can't run shell commands.
A typical 5-step session comes out to around $0.10.
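As a rough sketch, the deny-by-default policy described above might look like this. The SandboxPolicy shape, the mcp__playwright__ tool prefix, and the budget value are all assumptions for illustration, not Hover's real configuration.

```typescript
// Hypothetical policy shape, for illustration only.
interface SandboxPolicy {
  allowedPrefixes: string[] // tool-name prefixes the agent may call
  deniedTools: string[] // tools that are always refused
  maxBudgetUsd: number // hard spending ceiling for one session
}

const policy: SandboxPolicy = {
  allowedPrefixes: ['mcp__playwright__'], // assumed naming for Playwright MCP tools
  deniedTools: ['Bash', 'Write', 'Read', 'WebFetch'],
  maxBudgetUsd: 1.0, // assumed value
}

/** Deny wins over allow; anything not explicitly allowed is refused. */
function isToolAllowed(tool: string, p: SandboxPolicy): boolean {
  if (p.deniedTools.includes(tool)) return false
  return p.allowedPrefixes.some((prefix) => tool.startsWith(prefix))
}

/** True while the session is still under its spending ceiling. */
function withinBudget(spentUsd: number, p: SandboxPolicy): boolean {
  return spentUsd < p.maxBudgetUsd
}
```

Checking the deny list first means a tool stays blocked even if a later change accidentally widens the allow list.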
The thing I didn't plan for
I didn't intend for "no new API key" to be a feature. It just fell out of the architecture.
Hover doesn't bundle any AI runtime. It scans PATH for whatever coding-agent CLI you already have — claude, codex, cursor-agent. If you have Claude Code installed, that's your agent. Nothing to configure.
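A minimal sketch of that PATH scan, under stated assumptions: findAgentCli is a hypothetical name, the preference order of the three CLIs is a guess, and the file-existence check is injected so the function stays pure. Windows executable extensions are ignored for brevity.

```typescript
// CLI names from the post; the preference order is an assumption.
const AGENT_CLIS = ['claude', 'codex', 'cursor-agent']

/**
 * Walk the PATH directories and return the first agent CLI found.
 * `exists` is injected; in Node you could pass fs.existsSync, and
 * path.delimiter for `separator`.
 */
function findAgentCli(
  pathEnv: string,
  exists: (file: string) => boolean,
  separator = ':',
): string | undefined {
  const dirs = pathEnv.split(separator).filter(Boolean)
  for (const name of AGENT_CLIS) {
    for (const dir of dirs) {
      const candidate = `${dir}/${name}`
      if (exists(candidate)) return candidate
    }
  }
  return undefined // no agent CLI installed
}
```

Iterating over CLI names in the outer loop makes the preference order explicit: an earlier name wins even if a later one appears sooner in PATH.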
The first time someone tried it and said "wait, I don't have to set anything up?" I realized this was more important than I'd thought. Most developers I know have at least one agent CLI already installed. The friction of yet another API key, yet another account, is small but real — and apparently enough to make people bounce.
What actually comes out the other side
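Add the plugin to vite.config.ts (one import, one entry in plugins):
import { defineConfig } from 'vite'
import react from '@vitejs/plugin-react' // assuming the standard React plugin
import { hover } from '@hyperyond/vite-plugin'

export default defineConfig({
  // hover() adds the floating widget to the dev page
  plugins: [react(), hover()],
})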
Start your dev server. A floating widget appears in your dev page. Type what you want to verify:
"Log in as the test user, add two items to the cart, and check out"
The agent drives your actual Chrome. When it looks right, click Save. You get this:
import { test, expect } from '@playwright/test'

/**
 * Steps:
 * 1. Navigate to the login page
 * 2. Fill in username and password
 * 3. Click Sign in
 * 4. Add two items to cart
 * 5. Complete checkout
 *
 * Expected: Order confirmation page shows "Thank you"
 */
test('checkout flow', async ({ page }) => {
  await page.goto('http://localhost:5173/')
  await page.getByLabel('Username').fill('testuser')
  await page.getByLabel('Password').fill('password123')
  await page.getByRole('button', { name: 'Sign in' }).click()
  // ...
  await expect(page.getByText('Thank you')).toBeVisible()
})
No imports from Hover. No runtime dependency on anything I built. Run it with npx playwright test on a machine that's never heard of Hover.
That's the whole point. The agent authors the test. Then it's done. The artifact is yours.
Where it gets interesting: a 50-field form
The demo that convinced me this approach actually works is a realistic brokerage account registration form — ~50 fields, conditional reveals, multi-select chips, file upload, compliance acknowledgements.
The agent handled most of it unprompted. It was stopped by three required radio groups it had left unanswered, which the form's own validator caught. Hover surfaces a card explaining why it paused; the human sets those three radios and re-runs.
I think that's actually the right split: agent handles what it can see and reason about, human handles the judgment calls.
One session, three outputs
When a session goes well, the Save dropdown gives you three options:
📜 Save as spec → __vibe_tests__/login-flow.spec.ts
Standard Playwright. Runs in CI. No agent required.
💾 Save as Skill → .claude/skills/login-flow/SKILL.md
Next time you say "execute login-flow" in the widget, it replays the same steps. Useful for skipping repetitive setup while building a different part of the app.
📋 Save as Jira case → __vibe_tests__/login-flow.case.csv
Xray-compatible CSV. Drag it into Jira, it becomes a real Manual Test issue — assignable, linkable to a story. No copy-pasting steps from a code editor into a ticket ever again.
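To make the CSV output concrete, here is a minimal sketch of turning recorded steps into a test-case CSV. The column names and the one-row-per-step layout are assumptions loosely modeled on typical test-case importers, not the exact format Hover emits; CaseStep, csvField, and toCaseCsv are hypothetical names.

```typescript
// One recorded step of a manual test case. (Hypothetical shape.)
interface CaseStep {
  action: string
  expected?: string
}

/** Quote a field per RFC 4180: wrap when needed, double any embedded quotes. */
function csvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value
}

/** One row per step; rows sharing a TCID belong to the same test case. */
function toCaseCsv(summary: string, steps: CaseStep[]): string {
  const header = ['TCID', 'Summary', 'Action', 'Expected Result'] // assumed columns
  const rows = steps.map((step, i) => [
    '1',
    i === 0 ? summary : '', // summary only on the first row
    step.action,
    step.expected ?? '',
  ])
  return [header, ...rows].map((row) => row.map(csvField).join(',')).join('\r\n')
}
```

Quoting matters here because recorded steps often contain commas and quoted button labels, which would otherwise split a step across columns on import.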
What I'd do differently
The CDP setup is friction I haven't solved. You have to launch Chrome manually with remote debugging enabled (the --remote-debugging-port flag) before starting. It's one command, but it's one command people forget.
Same-origin navigation occasionally kills the widget mid-session. It auto-resumes on reload, but it's still annoying. This is the thing I most want to fix next.
The project is called Hover. Apache-2.0, works with any Vite app.
If you've run into the same frustration with AI-in-the-loop testing, I'd be curious whether this framing resonates — or whether I'm solving the wrong problem entirely.