The problem
Every deploy meant the same manual test steps: log in, open the form,
fill in the fields, check the result. Over and over.
I wanted to skip the Playwright/Selenium boilerplate and just
paste my existing test cases as plain text.
What I built
qpilot is an AI agent that reads your manual test case and
executes it step by step in a real Chrome browser.
You write this:
- Go to https://myapp.com/login
- Enter email and password
- Click Login
- Verify dashboard is visible
The agent opens Chrome, clicks, fills forms, and reports
pass/fail/warn per step with evidence from the page.
If it hits an OTP prompt or a captcha, it pauses and asks you for the value.
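To make the input and output concrete, here is a minimal sketch of how a pasted test case could be split into steps and how a per-step result could be represented. This is not taken from the qpilot source: `parseSteps`, `StepResult`, and `formatResult` are hypothetical names, and the bullet-stripping rule is an assumption about the input format.

```typescript
// Hypothetical types and helpers for illustration; not the qpilot API.
type Status = "pass" | "fail" | "warn";

interface StepResult {
  index: number;    // 1-based position of the step in the test case
  step: string;     // the plain-text instruction as written by the user
  status: Status;   // outcome reported for this step
  evidence: string; // what was observed on the page
}

// Split a pasted test case into individual steps.
// Assumes one step per line, optionally prefixed with "-" or "*".
function parseSteps(testCase: string): string[] {
  return testCase
    .split("\n")
    .map((line) => line.trim().replace(/^[-*]\s+/, ""))
    .filter((line) => line.length > 0);
}

// Render one result as a single report line.
function formatResult(result: StepResult): string {
  const tag = result.status.toUpperCase();
  return `[${tag}] ${result.index}. ${result.step} (${result.evidence})`;
}
```

Keeping each step as plain text means the test case stays readable by non-developers; only the result record is structured.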
How it works
- Playwright controls the real Chrome browser
- Each step: snapshot → action → snapshot → report
- Claude Haiku reads the snapshot (ARIA tree) and decides what to click
- Element refs (e.g. e12) are used for precise targeting
- Context window is managed to avoid hitting token limits
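The per-step loop above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual qpilot implementation: the `Browser` and `Model` interfaces, `runStep`, and `trimHistory` are hypothetical names, and the browser and model are injected so the loop can run against fakes instead of real Playwright and Anthropic API calls. The post does not say how the context window is managed; dropping the oldest entries against a character budget is just one possible approach.

```typescript
// Hypothetical sketch; none of these names come from the qpilot source.
type Status = "pass" | "fail" | "warn";

interface Action {
  kind: "click" | "fill" | "verify";
  ref?: string;   // element ref taken from the snapshot, e.g. "e12"
  value?: string; // text to type when kind is "fill"
}

interface StepResult {
  step: string;
  status: Status;
  evidence: string;
}

// Stand-in for the Playwright-driven browser.
interface Browser {
  snapshot(): Promise<string>;            // ARIA-tree text of the current page
  perform(action: Action): Promise<void>; // execute the chosen action
}

// Stand-in for the model that reads snapshots.
interface Model {
  decide(step: string, snapshot: string): Promise<Action>;
  judge(step: string, before: string, after: string): Promise<StepResult>;
}

// One step: snapshot -> action -> snapshot -> report.
async function runStep(step: string, browser: Browser, model: Model): Promise<StepResult> {
  const before = await browser.snapshot();
  const action = await model.decide(step, before);
  try {
    await browser.perform(action);
  } catch (err) {
    // The action could not be carried out, e.g. the element was not found.
    return { step, status: "fail", evidence: String(err) };
  }
  const after = await browser.snapshot();
  return model.judge(step, before, after);
}

// Assumed strategy: keep only the newest history entries that fit a budget.
function trimHistory(history: string[], maxChars: number): string[] {
  const kept: string[] = [];
  let used = 0;
  for (const entry of [...history].reverse()) {
    used += entry.length;
    if (used > maxChars) break;
    kept.unshift(entry); // preserve chronological order
  }
  return kept;
}
```

Targeting by ref rather than by visible text means the action applies to exactly the element the model picked from the snapshot, even when several elements share a label.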
Try it
npx qpilot
No code. No config. No Selenium.
Stack
TypeScript, Playwright, Claude Haiku via Anthropic API.
Open source: qpilot
Curious what you think — especially about edge cases
you'd want it to handle.
Top comments (3)
Same insight we had with deep-test — manual test steps as plain text should just work without framework boilerplate. Curious: when the agent hits a step it can't resolve (unseen UI element, captcha, dynamic state), does it fail hard or is there a fallback to human-in-the-loop?
Good question — both.
If an element isn't found or the state is wrong, the agent marks the step fail and moves on (or stops immediately if it's a critical step like login).
If it needs something it can't know — OTP, SMS code, captcha — the run pauses and a dialog appears in the UI. You type the value, the agent continues from the same point.
Human-in-the-loop is built in, but only triggered when actually needed.