A QA operating system for AI testing
This post is for anyone shipping Playwright or Pact tests and watching AI output rot in review. I built this to reduce review bandwidth and stop AI from guessing; it is a QA operating system built from real test architecture, not prompt luck.
Quick links:
- @seontechnologies/playwright-utils: Playwright DX and reusable utilities
- @seontechnologies/pactjs-utils: Contract testing DX and Pact.js utilities
- BMAD Method: agentic development framework
- TEA overview: BMAD Test Architect workflows
TL;DR: TEA (Test Architect) encodes test strategy and release gates, Playwright-Utils improves Playwright DX, Pact.js-Utils improves contract testing DX, Playwright MCPs give AI live UI/API verification, and the Pact MCP gives AI live broker access for contract design, review, and can-i-deploy safety checks. Together, they turn AI testing from promptware into repeatable engineering by standardizing patterns and verifying against the real system.
If your AI tests look productive but rot in review, you do not have a test suite, you have a slop factory.
By slop I mean redundant coverage, wrong assertions, nondeterministic flows, and unreviewable diffs.
I stopped trusting AI test generation the day it could not explain my own quality standards, so I embedded them.
Fixing DX first: Playwright-Utils
The first thing I addressed was Playwright DX. Playwright-Utils bridges the gap between Cypress-like ergonomics and Playwright's power with a functional core and fixture shell design. Many capabilities work as a pure function or as a fixture, for UI as well as API/backend flows. It standardizes the primitives teams drift on: auth, API, retries, logging, files, etc.
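To make that concrete, here is a minimal sketch of the functional-core, fixture-shell pattern itself. The helper names and shapes below are illustrative, not the published playwright-utils API; they only show how one capability can be consumed both as a pure function and as a fixture.

```typescript
// Sketch of the functional-core / fixture-shell pattern (illustrative names,
// not the published @seontechnologies/playwright-utils API).
import { test as base, expect, type APIRequestContext } from '@playwright/test';

// Functional core: a plain async helper, reusable in UI and API/backend suites alike.
export async function fetchJson<T>(request: APIRequestContext, url: string): Promise<T> {
  const response = await request.get(url);
  expect(response.ok()).toBeTruthy();
  return (await response.json()) as T;
}

// Fixture shell: the same capability exposed through test.extend, so specs
// consume it without wiring up the request context themselves.
export const test = base.extend<{ getJson: <T>(url: string) => Promise<T> }>({
  getJson: async ({ request }, use) => {
    await use(<T>(url: string) => fetchJson<T>(request, url));
  },
});

test('profile endpoint returns an id', async ({ getJson }) => {
  const profile = await getJson<{ id: string }>('/api/users/me');
  expect(profile.id).toBeTruthy();
});
```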
Fixing contract DX: Pact.js-Utils
The same drift problem exists in contract testing. Raw @pact-foundation/pact is powerful and deliberately low-level. So every team reinvents the same boilerplate: manual JsonMap casting for provider states, repeated inline builder lambdas for request/response bodies, 30+ line VerifierOptions assemblies, and auth injection middleware that silently double-prefixes Bearer Bearer ... tokens. Then the AI arrives at that mess and generates slop into it.
Pact.js-Utils does for contract testing what Playwright-Utils does for Playwright. It locks down the primitives teams drift on:
- createProviderState - one-call tuple builder for .given(), no more manual JsonMap casting
- setJsonContent / setJsonBody - reusable callbacks for PactV4 request/response builders, no more repeated inline lambdas
- buildVerifierOptions - single function assembles complete VerifierOptions, env-aware, with local/remote verifier flow auto-detection (BDCT is handled separately via provider-contract publishing workflows)
- createRequestFilter - pluggable token generator that prevents double-Bearer bugs by contract
- zodToPactMatchers - derives Pact V3 matchers directly from a Zod schema, so the consumer type definition and the contract stay in sync without hand-written matcher helpers
The result is the same as with Playwright-Utils: AI does not have to reinvent the wheel or guess the project's conventions. TEA loads these patterns as knowledge fragments, so contract test generation follows them out of the box.
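For a sense of how that looks in a consumer test, here is a hedged sketch. The pact-js V4 builder chain is the real library API; the pactjs-utils signatures are assumptions inferred from the descriptions above, not copied from the package docs.

```typescript
// Hedged sketch of a consumer test using the helpers above.
// Assumes jest/vitest globals (it, expect) and Node 18+ fetch.
import { PactV4 } from '@pact-foundation/pact';
import { createProviderState, setJsonBody } from '@seontechnologies/pactjs-utils';

const pact = new PactV4({ consumer: 'web-app', provider: 'user-service' });

it('fetches a user that exists', () =>
  pact
    .addInteraction()
    // assumed: createProviderState returns the [name, params] tuple .given() expects
    .given(...createProviderState({ name: 'a user exists', params: { id: '42' } }))
    .uponReceiving('a request for user 42')
    .withRequest('GET', '/users/42')
    // assumed: setJsonBody wraps the inline response-builder lambda the raw API requires
    .willRespondWith(200, setJsonBody({ id: '42', name: 'Ada' }))
    .executeTest(async (mockServer) => {
      const response = await fetch(`${mockServer.url}/users/42`);
      expect(response.status).toBe(200);
    }));
```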
The brain: TEA (Test Architect agent from BMAD)
BMAD TEA (Test Architect) is a quality operating model as an agent: risk-based test design, fixture architecture, CI/CD gates, and release readiness packaged as workflows. I'm a certified Siemens test architect. TEA is how I made my quality standards executable - workflows instead of tribal memory.
TEA spans nine workflows: eight delivery workflows (*framework, *ci, *test-design, *atdd, *automate, *test-review, *trace, *nfr-assess) plus Teach Me Testing. Contract testing is a first-class concern: TEA knows when to recommend Pact over integration tests, how to structure consumer and provider pipelines, and when can-i-deploy is the right gate.
The hands and eyes: Playwright & SmartBear MCPs
Playwright MCPs turned diagnosis, execution, and fixing into a live loop: the agent can run the flow, confirm the DOM against the accessibility tree, and use Playwright's Trace Viewer/UI mode as the verification canvas. API-spec validation comes from api-request's schema layer, not the MCP alone; the MCP provides the execution surface, while schema enforcement is a separate concern. For backend behavior, API flows are validated via api-request plus recurse, with an optional UI mode. For contract testing, the Pact MCP connects to PactFlow to fetch existing provider states, generate test scaffolds, review contracts against best practices, and run can-i-deploy checks, all without leaving the agent session. The actual provider verification still runs through npm scripts and CI pipelines; the MCP is a design and review tool, not a test runner. Later, Playwright shipped its agent prompt packs, and I folded that guidance into TEA as an optional accelerator.
How it works in practice
- TEA produces the test strategy and gate criteria for the feature.
- TEA generates tests using Playwright-Utils patterns (auth, retries, API, logging, etc.) and Pact.js-Utils patterns (provider states, verifier options, request filters) for service contract coverage.
- MCPs verify UI flows / API behavior against the live system; Pact MCP validates contract interactions against PactFlow.
- Failures feed back into TEA to repair, using the same standards.
- Output: PR-ready tests plus traceability and gate artifacts (risk table, requirements-to-tests traceability matrix, release gate checklist).
Illustrative example, before: 20 AI-generated tests -> 12 flaky, 5 redundant, 3 with wrong assertions.
After, with this stack: risk plan + gates -> P0/P1 scenarios only -> selectors and API calls verified -> behavior validated -> fewer tests, higher signal.
The same ratio holds for contract tests. Before: AI generates interactions from consumer-side assumptions, wrong field names, wrong status codes, wrong matchers. After: TEA runs provider scrutiny first, Pact.js-Utils locks the pattern, Pact MCP validates against the live broker.
Why this stack matters
Playwright is intentionally unopinionated, which means no two projects look the same and drift happens fast, even inside the same org. It optimizes for stability and speed over standardized DX, so every team reinvents the same primitives: auth, API calls, retries, and logging. Then the AI arrives at a mess, and a mess produces slop.
The same is true of @pact-foundation/pact. It is powerful and deliberately low-level. Without a utility layer, every team reinvents provider state builders, verifier configuration, and auth injection. AI lands in that drift and generates inconsistent contracts that fail at provider verification rather than at generation time.
Meanwhile, prompt-driven testing paths like Cypress's cy.prompt lean into nondeterminism: the agent rewrites selectors on every run, which is the exact opposite of the determinism testing exists to protect. (The generate-and-export mode is less risky because tests are committed and stable; the concern is the live-rewrite loop.) The fix is not a better prompt; it is a better system: TEA to encode the standards, Playwright-Utils to enforce E2E and API patterns, Pact.js-Utils to enforce contract patterns, and MCPs to verify flows against the live system.
What you get out of the box
Playwright-Utils provides nine core utilities. Six are backend or frontend agnostic: api-request (with schema validation), recurse, log, file-utils, burn-in, and auth-session. Three are UI-focused: network-error-monitor, network-recorder, and intercept-network-call. A webhook testing module is also available for async/event-driven flows.
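As a hedged sketch of how the backend-agnostic utilities compose, here is api-request with schema validation plus recurse for eventually-consistent behavior. The option shapes and return values are assumptions based on the utility names above, not the documented API, so check the package docs before lifting it.

```typescript
// Hedged sketch: api-request with schema validation plus recurse for async behavior.
// The option shapes below are assumptions, not the documented playwright-utils API.
import { z } from 'zod';
import { test, expect } from '@playwright/test';
import { apiRequest, recurse } from '@seontechnologies/playwright-utils'; // assumed exports

const OrderSchema = z.object({ id: z.string(), status: z.string() });

test('order eventually settles', async ({ request }) => {
  // assumed shape: pass the Playwright request context plus a Zod schema for the body
  const { body: order } = await apiRequest({
    request,
    method: 'POST',
    url: '/api/orders',
    schema: OrderSchema,
  });

  // assumed shape: re-run the query until the predicate passes or the timeout hits
  const settled = await recurse(
    () => apiRequest({ request, method: 'GET', url: `/api/orders/${order.id}`, schema: OrderSchema }),
    (res) => res.body.status === 'SETTLED',
    { timeout: 30_000 }, // assumed option name
  );

  expect(settled.body.status).toBe('SETTLED');
});
```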
Pact.js-Utils provides eleven utilities across four categories. Consumer helpers: createProviderState, toJsonMap, setJsonContent, setJsonBody. Provider verifier: buildVerifierOptions, buildMessageVerifierOptions, handlePactBrokerUrlAndSelectors, getProviderVersionTags. Request filter: createRequestFilter, noOpRequestFilter. Schema bridge: zodToPactMatchers.
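And a hedged sketch of the provider-verification side: Verifier is the real @pact-foundation/pact API, while the buildVerifierOptions and createRequestFilter signatures shown here are assumptions inferred from this post.

```typescript
// Hedged sketch of provider verification. Assumes jest/vitest globals.
import { Verifier } from '@pact-foundation/pact';
import { buildVerifierOptions, createRequestFilter } from '@seontechnologies/pactjs-utils';

describe('Pact provider verification', () => {
  it('verifies consumer contracts', async () => {
    // assumed: takes a token generator and returns middleware that injects
    // exactly one "Bearer <token>" header (no double-prefixing)
    const requestFilter = createRequestFilter(() => process.env.PROVIDER_TOKEN ?? 'local-token');

    // assumed: assembles VerifierOptions from the environment (broker URL, credentials,
    // local vs. remote flow) so the test file stays a few lines long
    const options = buildVerifierOptions({
      provider: 'user-service',
      providerBaseUrl: 'http://localhost:3001',
      requestFilter,
    });

    await new Verifier(options).verifyProvider();
  });
});
```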
TEA uses both libraries' patterns by default when you enable the integrations in the BMAD installer, so the AI does not have to relearn the same documentation every time. That is context engineering in practice.
Call to action
This stack is already open source and being used in production at SEON Technologies. If you want to level up your craft, use it. If you want to push it further, contribute. If you want to build with people who care about quality at this level, reach out.
I am also using it in my open-source side project couture-cast, where I dogfood BMAD, experiment, and showcase everything.