What if I told you I tested 5 real production web apps — an AI platform, an e-commerce store, a scheduling tool, an API builder, and a carbon footprint analyzer — with 30 regression tests and 80+ assertions...
And I didn't write a single CSS selector?
No document.querySelector. No .btn-primary. No [data-testid="submit"]. Nothing.
Just plain English.
The Tool: Passmark
Passmark is an open-source Playwright library by Bug0. You describe tests in natural language. AI (Gemini + Claude) executes them.
Here's a real test from my suite:
```javascript
test("Add product to cart with variant selection", async ({ page }) => {
  await runSteps({
    page,
    userFlow: "Add a product with specific variants to cart",
    steps: [
      { description: "Navigate to https://demo.vercel.store" },
      { description: "Click on the first product" },
      { description: "Select color", data: { value: "White" } },
      { description: "Select size", data: { value: "M" } },
      { description: "Click 'Add to Cart'",
        waitUntil: "Cart shows 1 item" },
    ],
    assertions: [
      { assertion: "Cart contains the product just added" },
      { assertion: "Cart shows correct variant selected" },
      { assertion: "A total price is displayed" },
    ],
    test, expect,
  });
});
```
No selectors. No brittle locators. Just intent.
When the UI changes, the test self-heals. When assertions run, Claude and Gemini both verify independently — if they disagree, a third model arbitrates.
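Passmark doesn't publish these internals, but the agree-or-arbitrate flow described above can be sketched as a small function. Everything here is my reconstruction: the `Verdict` and `Judge` types and the function name are assumptions, not Passmark's actual API.

```typescript
// Hypothetical sketch of dual-model assertion checking with a tie-breaker.
// None of these names come from Passmark itself.
type Verdict = "pass" | "fail";
type Judge = (assertion: string) => Verdict;

function verifyWithConsensus(
  assertion: string,
  claude: Judge,
  gemini: Judge,
  arbiter: Judge, // third model, consulted only on disagreement
): Verdict {
  const a = claude(assertion);
  const b = gemini(assertion);
  if (a === b) return a;     // both models agree: accept the shared verdict
  return arbiter(assertion); // split decision: the arbiter breaks the tie
}
```

The point of the pattern: a single model's false positive only ships if the second model makes the same mistake, and a genuine disagreement escalates instead of silently passing.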
The 5 Apps I Tested
I chose 5 completely different apps to stress-test Passmark across domains:
1. HOCKS AI (My AI Platform)
URL: hocks-ai.web.app | 9 tests
| Test | What it verifies |
|---|---|
| Landing page loads | Branding, CTA buttons, modern design |
| Sign-up form fields | Email, password, submit button present |
| Empty form validation | Error messages on empty submit |
| Invalid email rejection | Proper validation for bad emails |
| Google OAuth UI | Sign in with Google button exists |
| Wrong credentials | Graceful error, stays on login page |
| Chat interface | Message input, responsive layout |
| Navigation | Section switching works, no 404s |
2. Vercel Commerce (E-Commerce)
URL: demo.vercel.store | 7 tests
The hardest tests to write traditionally — cart state, variant selection, search. With Passmark:
```javascript
test("Cart persists after adding multiple items", async ({ page }) => {
  await runUserFlow({
    page,
    userFlow: "Add multiple products to cart",
    steps: `
      Navigate to the store.
      Click on the first product.
      Select any color and size. Add to cart.
      Go back to homepage.
      Click a different product.
      Select variants. Add to cart.
      Open the cart.
      Verify 2 items in cart.
    `,
    effort: "high",
  });

  await assert({
    page,
    assertion: "Cart contains at least 2 items with a total price",
    expect,
  });
});
```
The effort: "high" flag uses Gemini Pro for complex multi-step flows. Game changer.
3. Cal.com (Scheduling)
URL: cal.com | 4 tests
Tested date selection, timezone handling, booking form validation, and time slot display. The AI navigated Cal.com's dynamic calendar UI flawlessly.
4. Hoppscotch (API Platform)
URL: hoppscotch.io | 5 tests
```javascript
test("Send GET request and view response", async ({ page }) => {
  await runSteps({
    page,
    userFlow: "Send a GET request to a public API",
    steps: [
      { description: "Navigate to https://hoppscotch.io" },
      { description: "Clear URL field and type the endpoint",
        data: { value: "https://jsonplaceholder.typicode.com/posts/1" } },
      { description: "Click Send",
        waitUntil: "Response appears" },
    ],
    assertions: [
      { assertion: "JSON response body is displayed" },
      { assertion: "Response contains userId, id, title fields" },
      { assertion: "Status shows 200" },
    ],
    test, expect,
  });
});
```
Tested method switching (GET → POST), header management, response viewing, and collections sidebar.
5. EcoSense AI (Green Tech)
URL: ecosense-ai.pages.dev | 5 tests
Multi-step quiz flow, step validation, accessibility checks, and mobile responsive testing at custom viewports.
5 Things That Blew My Mind
1. It works across completely different UIs
Firebase app, Next.js app, Vue app — same natural language patterns worked everywhere. "Click the Sign Up button" just works regardless of the framework.
2. Multi-model assertions catch real bugs
Claude passed an assertion. Gemini failed it. Gemini was right — the product variant wasn't actually visible. Single-model testing would have shipped a false positive.
3. Smart waits > sleep timers
```javascript
// OLD WAY (pray-driven development)
await page.waitForTimeout(3000);

// PASSMARK WAY
waitUntil: "Cart shows 1 item"
```
Zero flaky tests from timing issues. The AI waits until it confirms the condition with exponential backoff.
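Under the hood, a smart wait is just a condition poll with growing delays between attempts. Here's a minimal sketch of the idea, which is my own reconstruction rather than Passmark source; in the real library the "check" is the AI confirming the natural-language condition against the page, not a local callback, and the option names are invented for illustration.

```typescript
// Sketch of condition polling with exponential backoff (hypothetical,
// not Passmark's implementation). check() stands in for "is the
// condition true yet?".
async function waitUntil(
  check: () => Promise<boolean>,
  { initialMs = 250, maxMs = 4000, timeoutMs = 30000 } = {},
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  let delay = initialMs;
  while (Date.now() < deadline) {
    if (await check()) return;                    // condition confirmed
    await new Promise((r) => setTimeout(r, delay));
    delay = Math.min(delay * 2, maxMs);           // back off: 250, 500, 1000...
  }
  throw new Error("waitUntil: condition not met before timeout");
}
```

Compared to a fixed sleep, this returns the moment the condition holds and only fails when the condition genuinely never appears, which is why timing flakiness disappears.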
4. effort: "high" handles conditional logic
```
If there's a step about transportation, select any option.
Continue through all available quiz steps.
```
Standard mode (Gemini Flash) handles linear flows. High effort (Gemini Pro) handles branching and conditional logic.
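The routing itself is easy to picture: the effort flag decides which model does the reasoning. A hypothetical sketch of that dispatch; the exact model identifiers Passmark sends through OpenRouter are my guesses, not documented values.

```typescript
// Hypothetical effort-to-model router; the model slugs are assumptions.
type Effort = "standard" | "high";

function pickModel(effort: Effort): string {
  // Flash for cheap linear flows; Pro when steps branch on page state.
  return effort === "high"
    ? "google/gemini-2.5-pro"
    : "google/gemini-2.5-flash";
}
```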
5. The entire suite has ZERO maintenance burden
UI redesign? Passmark adapts. New button text? Passmark reads the page. Class name changed? Passmark doesn't care.
Full Test Stats
| Metric | Count |
|---|---|
| Apps tested | 5 |
| Test files | 7 |
| Total tests | 30 |
| Assertions | 80+ |
| CSS selectors used | 0 |
| XPath queries | 0 |
| data-testid attributes | 0 |
| waitForTimeout calls | 0 |
Try It
```bash
git clone https://github.com/x-tahosin/breaking-apps-passmark.git
cd breaking-apps-passmark
npm install && npx playwright install chromium
echo "OPENROUTER_API_KEY=your-key" > .env
npm test
```
GitHub: x-tahosin/breaking-apps-passmark
Passmark: bug0inc/passmark
The future of E2E testing isn't writing better selectors — it's not writing selectors at all.
What's the most flaky test you've ever had to fix? I'd love to hear your horror stories in the comments.