Arnaldo De Lisio

I Stopped Using Playwright. Here's What Replaced It.

I stopped writing Playwright tests for integration flows. Not because they stopped working — they still work fine. But once I tried testing with Claude subagents and agent-browser, going back felt like writing jQuery after learning React.

Here's what changed.

What Playwright gets wrong

Playwright was designed for single-user, deterministic UI flows. You write selectors, set up auth fixtures, mock state, and run scripts that click through a fixed sequence. For a simple login-and-checkout flow, it's fine.

But most real apps have multiple roles interacting with shared state. A customer submits a request. An operator reviews it and assigns a specialist. The specialist does work. The customer pays. The operator ships. Each step depends on the previous one, and each actor is a different authenticated user.

In Playwright, this means:

  • Multiple auth state files (one per role)
  • Fixtures that seed the database before each test
  • Selectors that break every time the UI changes
  • Hundreds of lines of boilerplate before you've tested a single real interaction

You end up maintaining a parallel codebase just to describe what users already do naturally.

What agent-browser does instead

agent-browser is a CLI that lets AI agents control a browser via the accessibility tree. Instead of writing `page.locator('[data-testid="submit"]').click()`, you describe what you want in plain language and the agent figures out how to do it.

No selectors. No brittle CSS paths. If the button exists and has a label, the agent finds it. If the UI changes, the test doesn't break — the agent adapts.

For role isolation, you use Chrome profile directories. One directory per role, logged in once:

```shell
mkdir -p ~/.config/google-chrome/app-customer
mkdir -p ~/.config/google-chrome/app-operator
mkdir -p ~/.config/google-chrome/app-specialist
```

Then sign in once per role with a headed session:

```shell
npx agent-browser \
  --profile ~/.config/google-chrome/app-operator \
  --headed \
  open https://yourapp.com/sign-in
```

The session persists, and every subsequent headless run with the same `--profile` picks it up automatically.

For magic links, use yopmail for test accounts: disposable inboxes, no registration, and magic links work out of the box. If you hit email rate limits, skip email entirely: generate the magic link URL directly via the Supabase admin API and navigate to it, with no email sent:

```shell
curl -X POST https://<project>.supabase.co/auth/v1/admin/generate_link \
  -H "Authorization: Bearer $SERVICE_ROLE_KEY" \
  -H "Content-Type: application/json" \
  -d '{"type":"magiclink","email":"test@example.com"}' \
  | python3 -c "import json,sys; print(json.load(sys.stdin)['action_link'])"
```
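The one-liner's output can be fed straight back into the browser session. A minimal sketch of the extraction step, using a canned sample payload (the `action_link` field name comes from the call above; the sample URL is made up):

```shell
# Canned sample of the admin API's JSON response (shape assumed from the call above)
SAMPLE='{"action_link":"https://example.com/auth/v1/verify?token=abc&type=magiclink"}'

# Same extraction as the one-liner, reading from a variable instead of curl
LINK=$(echo "$SAMPLE" | python3 -c "import json,sys; print(json.load(sys.stdin)['action_link'])")
echo "$LINK"
```

Navigating the role's profile straight to `$LINK` closes the loop, e.g. `npx agent-browser --profile ~/.config/google-chrome/app-customer open "$LINK"`.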

The orchestrator pattern

For multi-role golden path testing, one Claude session acts as an orchestrator. It spawns one subagent at a time, each operating as a specific role. State flows forward through a shared JSON file.

```
Orchestrator (main Claude session)
  ├── spawn Agent(customer)    → submits request  → writes request_id
  ├── spawn Agent(operator)    → assigns handler  → writes handler_id
  ├── spawn Agent(specialist)  → does work
  ├── spawn Agent(operator)    → reviews + approves
  ├── spawn Agent(customer)    → pays or confirms
  └── spawn Agent(customer)    → leaves feedback
```
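Each spawn boils down to rendering a role-specific prompt with the current state injected. A sketch, assuming the shared state lives in `state.json`; the prompt wording, file names, and `req-123` value are all illustrative:

```shell
# Illustrative: seed the shared state file (same shape as the state-file example)
cat > state.json <<'EOF'
{"run_id":"run-2026-04-24","current_request_id":"req-123","confirmation_token":null,"steps_completed":[]}
EOF

# Render a self-contained prompt for the next subagent, with state injected
REQUEST_ID=$(python3 -c "import json; print(json.load(open('state.json'))['current_request_id'])")
cat > prompt-operator.md <<EOF
You are the OPERATOR. Use the Chrome profile ~/.config/google-chrome/app-operator.
Open request $REQUEST_ID, assign a specialist, and report back the handler_id
shown in the URL after assignment.
EOF
```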

The state file:

```json
{
  "run_id": "run-2026-04-24",
  "current_request_id": null,
  "confirmation_token": null,
  "steps_completed": []
}
```

Each subagent gets a self-contained prompt with the current state injected. It reports back any new values — IDs, tokens visible in URLs — and the orchestrator writes them before spawning the next agent.
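Writing values back can be as simple as a small JSON update between spawns. A sketch using python3 (as in the magic-link one-liner); the field names follow the state file above, and the `req-123` value is made up:

```shell
# Seed the state file (same shape as the example above)
cat > state.json <<'EOF'
{"run_id":"run-2026-04-24","current_request_id":null,"confirmation_token":null,"steps_completed":[]}
EOF

# Orchestrator records what the customer subagent reported before the next spawn
python3 - <<'EOF'
import json
state = json.load(open("state.json"))
state["current_request_id"] = "req-123"            # ID reported back by the subagent
state["steps_completed"].append("customer:submit") # mark the step done
json.dump(state, open("state.json", "w"), indent=2)
EOF
```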

Results are appended to a log file. The policy is log-and-continue: failures are recorded and the run keeps going instead of stopping:

```
## [operator] Assign Specialist — PASS
## [specialist] Submit Work — PASS
## [operator] Approve Work — FAIL: approve button not found
## [customer] Confirm Receipt — PASS
```
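The log-and-continue policy itself is a few lines of shell. A sketch (the helper name and log path are illustrative, and `true`/`false` stand in for real agent steps):

```shell
LOG=golden-path-results.md

# Append PASS/FAIL for each step; never abort the overall run
run_step() {
  local role="$1" name="$2"; shift 2
  if "$@"; then
    echo "## [$role] $name — PASS" >> "$LOG"
  else
    echo "## [$role] $name — FAIL: exit status $?" >> "$LOG"
  fi
}

run_step operator "Assign Specialist" true   # stand-in for a passing agent step
run_step operator "Approve Work"      false  # simulated failure
cat "$LOG"
```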

The cost argument is dead

The main counterargument to AI-based testing has always been cost: Claude API calls aren't free, while Playwright is.

But if you're running Claude Code on a subscription, that argument disappears. Subagents run against your subscription, not per-token billing. A full 10-step golden path run costs nothing extra.

The only remaining case for Playwright is raw speed — milliseconds per test vs minutes for an agent run. That matters if you need tests on every commit in a tight CI loop. For pre-deploy checks, QA runs, or anything not in a sub-second CI pipeline, there's no practical reason to choose Playwright.

What this means in practice

I haven't written a Playwright test in months. The agent tests cover more ground, break less often, and took a fraction of the time to set up. The only thing I gave up is being able to run them on every commit — which, for integration tests covering a 6-role flow, was never realistic anyway.

If you're still writing Playwright tests for multi-role integration flows, try this setup once. You probably won't go back.
