S M Tahosin

Posted on • Originally published at tahosin.hashnode.dev

I Wrote 30 E2E Tests for 5 Production Apps Using Only English — No Selectors, No XPath, No Flaky Tests

What if I told you I tested 5 real production web apps — an AI platform, an e-commerce store, a scheduling tool, an API builder, and a carbon footprint analyzer — with 30 regression tests and 80+ assertions...

And I didn't write a single CSS selector?

No document.querySelector. No .btn-primary. No [data-testid="submit"]. Nothing.

Just plain English.


The Tool: Passmark

Passmark is an open-source Playwright library by Bug0. You describe tests in natural language. AI (Gemini + Claude) executes them.

Here's a real test from my suite:

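// Imports assume Passmark is installed under the package name "passmark".
import { test, expect } from "@playwright/test";
import { runSteps } from "passmark";

test("Add product to cart with variant selection", async ({ page }) => {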
  await runSteps({
    page,
    userFlow: "Add a product with specific variants to cart",
    steps: [
      { description: "Navigate to https://demo.vercel.store" },
      { description: "Click on the first product" },
      { description: "Select color", data: { value: "White" } },
      { description: "Select size", data: { value: "M" } },
      { description: "Click 'Add to Cart'",
        waitUntil: "Cart shows 1 item" },
    ],
    assertions: [
      { assertion: "Cart contains the product just added" },
      { assertion: "Cart shows correct variant selected" },
      { assertion: "A total price is displayed" },
    ],
    test, expect,
  });
});

No selectors. No brittle locators. Just intent.

When the UI changes, the test self-heals. When assertions run, Claude and Gemini both verify independently — if they disagree, a third model arbitrates.
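The "two models verify, a third arbitrates" idea can be sketched in a few lines. This is only an illustration of the pattern, not Passmark's internals: `Verdict`, `Verifier`, `verifyWithConsensus`, and the stub verifiers are all hypothetical names, and the stubs stand in for real model calls.

```typescript
// Hypothetical sketch of two-model verification with a tie-breaking arbiter.
// None of these names come from Passmark; they only illustrate the pattern.

type Verdict = { passed: boolean; reason: string };
type Verifier = (assertion: string) => Promise<Verdict>;

async function verifyWithConsensus(
  assertion: string,
  primary: Verifier,
  secondary: Verifier,
  arbiter: Verifier,
): Promise<Verdict & { arbitrated: boolean }> {
  // Run both verifiers concurrently so neither sees the other's answer.
  const [a, b] = await Promise.all([primary(assertion), secondary(assertion)]);

  // Agreement: accept the shared verdict without consulting the arbiter.
  if (a.passed === b.passed) {
    return { passed: a.passed, reason: `${a.reason}; ${b.reason}`, arbitrated: false };
  }

  // Disagreement: the third verifier's verdict decides the outcome.
  const c = await arbiter(assertion);
  return { ...c, arbitrated: true };
}

// Stub verifiers standing in for real model calls (assumptions for the demo).
const alwaysPass: Verifier = async () => ({ passed: true, reason: "looks present" });
const alwaysFail: Verifier = async () => ({ passed: false, reason: "variant not visible" });
```

With these stubs, `verifyWithConsensus("Cart shows correct variant selected", alwaysPass, alwaysFail, alwaysFail)` resolves to a failed, arbitrated verdict, while two agreeing verifiers never trigger the third call.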


The 5 Apps I Tested

I chose 5 completely different apps to stress-test Passmark across domains:

1. HOCKS AI (My AI Platform)

URL: hocks-ai.web.app | 9 tests

Test | What it verifies
Landing page loads | Branding, CTA buttons, modern design
Sign-up form fields | Email, password, submit button present
Empty form validation | Error messages on empty submit
Invalid email rejection | Proper validation for bad emails
Google OAuth UI | Sign in with Google button exists
Wrong credentials | Graceful error, stays on login page
Chat interface | Message input, responsive layout
Navigation | Section switching works, no 404s

2. Vercel Commerce (E-Commerce)

URL: demo.vercel.store | 7 tests

Cart state, variant selection, and search are traditionally the hardest flows to test. With Passmark:

test("Cart persists after adding multiple items", async ({ page }) => {
  await runUserFlow({
    page,
    userFlow: "Add multiple products to cart",
    steps: `
      Navigate to the store.
      Click on the first product.
      Select any color and size. Add to cart.
      Go back to homepage.
      Click a different product.
      Select variants. Add to cart.
      Open the cart.
      Verify 2 items in cart.
    `,
    effort: "high",
  });

  await assert({
    page,
    assertion: "Cart contains at least 2 items with a total price",
    expect,
  });
});

The effort: "high" flag uses Gemini Pro for complex multi-step flows. Game changer.

3. Cal.com (Scheduling)

URL: cal.com | 4 tests

Tested date selection, timezone handling, booking form validation, and time slot display. The AI navigated Cal.com's dynamic calendar UI flawlessly.

4. Hoppscotch (API Platform)

URL: hoppscotch.io | 5 tests

test("Send GET request and view response", async ({ page }) => {
  await runSteps({
    page,
    userFlow: "Send a GET request to a public API",
    steps: [
      { description: "Navigate to https://hoppscotch.io" },
      { description: "Clear URL field and type the endpoint",
        data: { value: "https://jsonplaceholder.typicode.com/posts/1" } },
      { description: "Click Send",
        waitUntil: "Response appears" },
    ],
    assertions: [
      { assertion: "JSON response body is displayed" },
      { assertion: "Response contains userId, id, title fields" },
      { assertion: "Status shows 200" },
    ],
    test, expect,
  });
});

Tested method switching (GET → POST), header management, response viewing, and collections sidebar.

5. EcoSense AI (Green Tech)

URL: ecosense-ai.pages.dev | 5 tests

Multi-step quiz flow, step validation, accessibility checks, and mobile responsive testing at custom viewports.


5 Things That Blew My Mind

1. It works across completely different UIs

Firebase app, Next.js app, Vue app — same natural language patterns worked everywhere. "Click the Sign Up button" just works regardless of the framework.

2. Multi-model assertions catch real bugs

Claude passed an assertion. Gemini failed it. Gemini was right — the product variant wasn't actually visible. Single-model testing would have shipped a false positive.

3. Smart waits > sleep timers

// OLD WAY (pray-driven development)
await page.waitForTimeout(3000);

// PASSMARK WAY
waitUntil: "Cart shows 1 item"

Zero flaky tests from timing issues. Instead of sleeping for a fixed time, the AI re-checks the page with exponential backoff until it confirms the condition.
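For readers who have not seen the pattern, here is a minimal sketch of a condition-based wait with exponential backoff. It is not Passmark's implementation: `waitForCondition`, `WaitOptions`, and the default delays are hypothetical, and in a real test the `check` callback would be whatever inspects the page.

```typescript
// Hypothetical sketch of a condition-based wait with exponential backoff.
// `waitForCondition` and its options are illustrative names, not Passmark's API.

type WaitOptions = {
  initialDelayMs?: number; // pause after the first failed check
  maxDelayMs?: number;     // cap on any single pause
  timeoutMs?: number;      // overall budget before giving up
};

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function waitForCondition(
  check: () => Promise<boolean>,
  { initialDelayMs = 100, maxDelayMs = 2000, timeoutMs = 10000 }: WaitOptions = {},
): Promise<number> {
  const deadline = Date.now() + timeoutMs;
  let delay = initialDelayMs;
  let attempts = 0;

  while (true) {
    attempts += 1;
    // Return the number of checks it took once the condition holds.
    if (await check()) return attempts;

    // Give up if the next pause would overrun the overall budget.
    if (Date.now() + delay > deadline) {
      throw new Error(`Condition not met within ${timeoutMs} ms after ${attempts} checks`);
    }

    await sleep(delay);
    delay = Math.min(delay * 2, maxDelayMs); // double the pause, up to the cap
  }
}
```

Unlike a fixed `waitForTimeout(3000)`, this returns as soon as the condition holds and fails loudly when it never does, which is the behavior a `waitUntil` description relies on.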

4. effort: "high" handles conditional logic

If there's a step about transportation, select any option.
Continue through all available quiz steps.

Standard mode (Gemini Flash) handles linear flows. High effort (Gemini Pro) handles branching and conditional logic.

5. The entire suite has ZERO maintenance burden

UI redesign? Passmark adapts. New button text? Passmark reads the page. Class name changed? Passmark doesn't care.


Full Test Stats

Metric | Count
Apps tested | 5
Test files | 7
Total tests | 30
Assertions | 80+
CSS selectors used | 0
XPath queries | 0
data-testid attributes | 0
waitForTimeout calls | 0

Try It

git clone https://github.com/x-tahosin/breaking-apps-passmark.git
cd breaking-apps-passmark
npm install && npx playwright install chromium

echo "OPENROUTER_API_KEY=your-key" > .env
npm test

GitHub: x-tahosin/breaking-apps-passmark
Passmark: bug0inc/passmark


The future of E2E testing isn't writing better selectors — it's not writing selectors at all.

What's the most flaky test you've ever had to fix? I'd love to hear your horror stories in the comments.
