What if I told you I tested 5 real production web apps — an AI platform, an e-commerce store, a scheduling tool, an API builder, and a carbon footprint analyzer — with 30 regression tests and 80+ assertions...
And I didn't write a single CSS selector?
No document.querySelector. No .btn-primary. No [data-testid="submit"]. Nothing.
Just plain English.
The Tool: Passmark
Passmark is an open-source Playwright library by Bug0. You describe tests in natural language. AI (Gemini + Claude) executes them.
Here's a real test from my suite:
```javascript
test("Add product to cart with variant selection", async ({ page }) => {
  await runSteps({
    page,
    userFlow: "Add a product with specific variants to cart",
    steps: [
      { description: "Navigate to https://demo.vercel.store" },
      { description: "Click on the first product" },
      { description: "Select color", data: { value: "White" } },
      { description: "Select size", data: { value: "M" } },
      { description: "Click 'Add to Cart'",
        waitUntil: "Cart shows 1 item" },
    ],
    assertions: [
      { assertion: "Cart contains the product just added" },
      { assertion: "Cart shows correct variant selected" },
      { assertion: "A total price is displayed" },
    ],
    test, expect,
  });
});
```
No selectors. No brittle locators. Just intent.
When the UI changes, the test self-heals. When assertions run, Claude and Gemini both verify independently — if they disagree, a third model arbitrates.
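Passmark doesn't publish these internals, but the agree-or-arbitrate flow described above can be sketched as a small function. Everything here is my reconstruction: the `Verdict` and `Judge` types and the function name are assumptions, not Passmark's actual API.

```typescript
// Hypothetical sketch of dual-model assertion checking with a tie-breaker.
// None of these names come from Passmark itself.
type Verdict = "pass" | "fail";
type Judge = (assertion: string) => Verdict;

function verifyWithConsensus(
  assertion: string,
  claude: Judge,
  gemini: Judge,
  arbiter: Judge, // third model, consulted only on disagreement
): Verdict {
  const a = claude(assertion);
  const b = gemini(assertion);
  if (a === b) return a;     // both models agree: accept the shared verdict
  return arbiter(assertion); // split decision: the arbiter breaks the tie
}
```

The point of the pattern: a single model's false positive only ships if the second model makes the same mistake, and a genuine disagreement escalates instead of silently passing.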
The 5 Apps I Tested
I chose 5 completely different apps to stress-test Passmark across domains:
1. HOCKS AI (My AI Platform)
URL: hocks-ai.web.app | 9 tests
| Test | What it verifies |
|---|---|
| Landing page loads | Branding, CTA buttons, modern design |
| Sign-up form fields | Email, password, submit button present |
| Empty form validation | Error messages on empty submit |
| Invalid email rejection | Proper validation for bad emails |
| Google OAuth UI | Sign in with Google button exists |
| Wrong credentials | Graceful error, stays on login page |
| Chat interface | Message input, responsive layout |
| Navigation | Section switching works, no 404s |
2. Vercel Commerce (E-Commerce)
URL: demo.vercel.store | 7 tests
The hardest tests to write traditionally — cart state, variant selection, search. With Passmark:
```javascript
test("Cart persists after adding multiple items", async ({ page }) => {
  await runUserFlow({
    page,
    userFlow: "Add multiple products to cart",
    steps: `
      Navigate to the store.
      Click on the first product.
      Select any color and size. Add to cart.
      Go back to homepage.
      Click a different product.
      Select variants. Add to cart.
      Open the cart.
      Verify 2 items in cart.
    `,
    effort: "high",
  });

  await assert({
    page,
    assertion: "Cart contains at least 2 items with a total price",
    expect,
  });
});
```
The effort: "high" flag uses Gemini Pro for complex multi-step flows. Game changer.
3. Cal.com (Scheduling)
URL: cal.com | 4 tests
Tested date selection, timezone handling, booking form validation, and time slot display. The AI navigated Cal.com's dynamic calendar UI flawlessly.
4. Hoppscotch (API Platform)
URL: hoppscotch.io | 5 tests
```javascript
test("Send GET request and view response", async ({ page }) => {
  await runSteps({
    page,
    userFlow: "Send a GET request to a public API",
    steps: [
      { description: "Navigate to https://hoppscotch.io" },
      { description: "Clear URL field and type the endpoint",
        data: { value: "https://jsonplaceholder.typicode.com/posts/1" } },
      { description: "Click Send",
        waitUntil: "Response appears" },
    ],
    assertions: [
      { assertion: "JSON response body is displayed" },
      { assertion: "Response contains userId, id, title fields" },
      { assertion: "Status shows 200" },
    ],
    test, expect,
  });
});
```
Tested method switching (GET → POST), header management, response viewing, and collections sidebar.
5. EcoSense AI (Green Tech)
URL: ecosense-ai.pages.dev | 5 tests
Multi-step quiz flow, step validation, accessibility checks, and mobile responsive testing at custom viewports.
5 Things That Blew My Mind
1. It works across completely different UIs
Firebase app, Next.js app, Vue app — same natural language patterns worked everywhere. "Click the Sign Up button" just works regardless of the framework.
2. Multi-model assertions catch real bugs
Claude passed an assertion. Gemini failed it. Gemini was right — the product variant wasn't actually visible. Single-model testing would have shipped a false positive.
3. Smart waits > sleep timers
```javascript
// OLD WAY (pray-driven development)
await page.waitForTimeout(3000);

// PASSMARK WAY
waitUntil: "Cart shows 1 item"
```
Zero flaky tests from timing issues. The AI waits until it confirms the condition with exponential backoff.
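Under the hood, a smart wait is just a condition poll with growing delays between attempts. Here's a minimal sketch of the idea, which is my own reconstruction rather than Passmark source; in the real library the "check" is the AI confirming the natural-language condition against the page, not a local callback, and the option names are invented for illustration.

```typescript
// Sketch of condition polling with exponential backoff (hypothetical,
// not Passmark's implementation). check() stands in for "is the
// condition true yet?".
async function waitUntil(
  check: () => Promise<boolean>,
  { initialMs = 250, maxMs = 4000, timeoutMs = 30000 } = {},
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  let delay = initialMs;
  while (Date.now() < deadline) {
    if (await check()) return;                    // condition confirmed
    await new Promise((r) => setTimeout(r, delay));
    delay = Math.min(delay * 2, maxMs);           // back off: 250, 500, 1000...
  }
  throw new Error("waitUntil: condition not met before timeout");
}
```

Compared to a fixed sleep, this returns the moment the condition holds and only fails when the condition genuinely never appears, which is why timing flakiness disappears.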
4. effort: "high" handles conditional logic
```
If there's a step about transportation, select any option.
Continue through all available quiz steps.
```
Standard mode (Gemini Flash) handles linear flows. High effort (Gemini Pro) handles branching and conditional logic.
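The routing itself is easy to picture: the effort flag decides which model does the reasoning. A hypothetical sketch of that dispatch; the exact model identifiers Passmark sends through OpenRouter are my guesses, not documented values.

```typescript
// Hypothetical effort-to-model router; the model slugs are assumptions.
type Effort = "standard" | "high";

function pickModel(effort: Effort): string {
  // Flash for cheap linear flows; Pro when steps branch on page state.
  return effort === "high"
    ? "google/gemini-2.5-pro"
    : "google/gemini-2.5-flash";
}
```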
5. The entire suite has ZERO maintenance burden
UI redesign? Passmark adapts. New button text? Passmark reads the page. Class name changed? Passmark doesn't care.
Full Test Stats
| Metric | Count |
|---|---|
| Apps tested | 5 |
| Test files | 7 |
| Total tests | 30 |
| Assertions | 80+ |
| CSS selectors used | 0 |
| XPath queries | 0 |
| data-testid attributes | 0 |
| waitForTimeout calls | 0 |
Try It
```bash
git clone https://github.com/x-tahosin/breaking-apps-passmark.git
cd breaking-apps-passmark
npm install && npx playwright install chromium
echo "OPENROUTER_API_KEY=your-key" > .env
npm test
```
GitHub: x-tahosin/breaking-apps-passmark
Passmark: bug0inc/passmark
The future of E2E testing isn't writing better selectors — it's not writing selectors at all.
What's the most flaky test you've ever had to fix? I'd love to hear your horror stories in the comments.