DEV Community

Murat K Ozcan


The testing meta most teams have not caught up to yet

A QA operating system for AI testing

This post is for anyone shipping Playwright tests and watching AI output rot in review. I built this to cut review load and stop AI from guessing. It is a QA operating system built from real test architecture, not prompt luck.


TL;DR: TEA (Test Architect) encodes test strategy and release gates, Playwright-Utils fixes Playwright DX, and Playwright MCPs give AI live verification. Together, they turn AI testing from promptware into repeatable engineering by standardizing patterns and verifying against the real system.

If your AI tests look productive but rot in review, you do not have a test suite, you have a slop factory.
By slop I mean redundant coverage, wrong assertions, nondeterministic flows, and unreviewable diffs.

I stopped trusting AI test generation the day it could not explain my own quality standards, so I embedded them.

Fixing DX first: Playwright-Utils

The first thing I addressed was Playwright DX. Playwright-Utils bridges the gap between Cypress-like ergonomics and Playwright's power with a functional core and fixture shell design. Many capabilities work as a pure function or as a fixture, for UI as well as API/backend flows. It standardizes the primitives teams drift on: auth, API, retries, logging, files, etc.
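
To make "functional core, fixture shell" concrete, here is a minimal sketch of the pattern in plain TypeScript. Every name in it (`sendRequest`, `createRequestFixture`, `HttpClient`, the example URL) is hypothetical and chosen for illustration; it is not the Playwright-Utils API, and the client is synchronous only to keep the sketch short.

```typescript
// Hypothetical names throughout; this is NOT the Playwright-Utils API.

// A stand-in for a real HTTP layer (assumption: synchronous for brevity).
type HttpClient = (
  method: string,
  url: string,
  headers: Record<string, string>,
) => { status: number };

interface RequestParams {
  client: HttpClient;
  baseUrl: string;
  token: string;
  method: string;
  path: string;
}

// Functional core: a plain function with every dependency passed in explicitly.
function sendRequest({ client, baseUrl, token, method, path }: RequestParams): { status: number } {
  return client(method, `${baseUrl}${path}`, { Authorization: `Bearer ${token}` });
}

// Fixture shell: binds the shared dependencies once, so a test supplies only what varies.
interface TestContext {
  client: HttpClient;
  baseUrl: string;
  token: string;
}

function createRequestFixture(ctx: TestContext) {
  return (method: string, path: string) => sendRequest({ ...ctx, method, path });
}

// Usage: a fake client records each call instead of touching the network.
const calls: string[] = [];
const fakeClient: HttpClient = (method, url, headers) => {
  calls.push(`${method} ${url} ${headers.Authorization}`);
  return { status: 200 };
};

const context: TestContext = { client: fakeClient, baseUrl: "https://api.example.test", token: "t1" };

// Same capability, two entry points: direct function call or pre-bound fixture.
const direct = sendRequest({ ...context, method: "GET", path: "/users" });
const request = createRequestFixture(context);
const viaFixture = request("GET", "/users");
```

Both entry points produce the same call, which is the point of the split: the core stays easy to unit test and reuse in scripts, while the shell removes setup noise from test bodies.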

The brain: TEA (Test Architect agent from BMAD)

BMAD TEA (Test Architect) is a quality operating model as an agent: risk-based test design, fixture architecture, CI/CD gates, and release readiness packaged as workflows. I'm a certified Siemens test architect. TEA is how I made my quality standards executable - workflows instead of tribal memory.

TEA spans eight workflows: *framework, *ci, *test-design, *atdd (acceptance test-driven development), *automate, *test-review, *trace, and *nfr-assess - everything from strategy to gates.

The hands and eyes: Playwright MCPs

Playwright MCPs arrived and turned diagnosis, execution, and fixing into a live loop. The agent can run the flow, check the DOM against the accessibility tree and the network responses against the API spec, and use Playwright's Trace Viewer or UI mode as the verification canvas. For backend behavior, API flows are validated via api-request plus recurse, with an optional UI mode. Later, Playwright shipped its agent prompt packs, and I folded that guidance into TEA as an optional accelerator.
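
The "request, then recurse" idea for backend flows boils down to polling a command until a predicate holds or a deadline passes. The sketch below shows that technique in isolation. `pollUntil`, `PollOptions`, and the simulated `getOrderStatus` are hypothetical names for illustration, not the signatures of the api-request or recurse utilities.

```typescript
// Hypothetical helper; NOT the Playwright-Utils recurse signature.
interface PollOptions {
  timeoutMs: number;
  intervalMs: number;
}

// Re-runs `command` until `predicate` accepts its result, or fails once the
// next wait would exceed the timeout.
async function pollUntil<T>(
  command: () => Promise<T>,
  predicate: (value: T) => boolean,
  { timeoutMs, intervalMs }: PollOptions,
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const value = await command();
    if (predicate(value)) return value;
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Condition not met within ${timeoutMs} ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Simulated backend (assumption): the order becomes "shipped" on the third poll.
let polls = 0;
const getOrderStatus = async (): Promise<string> => {
  polls += 1;
  return polls >= 3 ? "shipped" : "pending";
};

// Usage: wait for eventual consistency instead of sleeping a fixed amount.
const shipped = pollUntil(getOrderStatus, (status) => status === "shipped", {
  timeoutMs: 1000,
  intervalMs: 10,
});
```

Polling against an explicit predicate keeps the wait deterministic: the test either observes the expected state or fails with a clear timeout, rather than depending on a hard-coded sleep.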

How it works in practice

  1. TEA produces the test strategy and gate criteria for the feature.
  2. TEA generates tests using Playwright-Utils patterns (auth, retries, API, logging, etc.).
  3. MCPs verify UI flows / API behavior against the live system.
  4. Failures feed back into TEA to repair, using the same standards.
  5. Output: PR-ready tests plus traceability and gate artifacts (risk table, requirements-to-tests traceability matrix, release gate checklist).

Before (example): 20 AI tests -> 12 flaky, 5 redundant, 3 with wrong assertions
After: risk plan + gates -> P0/P1 only -> selectors/API calls verified -> behavior validated -> fewer tests, higher signal
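
One way to picture the risk table, traceability matrix, and release gate working together is the sketch below. The 1-3 probability and impact scales, the score thresholds, and all names (`priorityOf`, `buildTraceability`, `evaluateGate`) are assumptions made for illustration; they are not taken from TEA's actual workflows.

```typescript
// Hypothetical model; scales and thresholds are assumptions, not TEA's rules.
type Priority = "P0" | "P1" | "P2" | "P3";

interface Risk {
  requirement: string;
  probability: 1 | 2 | 3;
  impact: 1 | 2 | 3;
}

interface TestResult {
  name: string;
  covers: string[]; // requirement ids this test traces to
  passed: boolean;
}

// Risk score = probability x impact, bucketed into priorities (assumed cut-offs).
function priorityOf(risk: Risk): Priority {
  const score = risk.probability * risk.impact;
  if (score >= 6) return "P0";
  if (score >= 4) return "P1";
  if (score >= 2) return "P2";
  return "P3";
}

// Traceability matrix: requirement -> names of PASSING tests that cover it.
function buildTraceability(risks: Risk[], results: TestResult[]): Record<string, string[]> {
  const matrix: Record<string, string[]> = {};
  for (const risk of risks) {
    matrix[risk.requirement] = results
      .filter((r) => r.passed && r.covers.some((c) => c === risk.requirement))
      .map((r) => r.name);
  }
  return matrix;
}

// Release gate: every P0/P1 requirement needs at least one passing test.
function evaluateGate(risks: Risk[], results: TestResult[]): { pass: boolean; uncovered: string[] } {
  const matrix = buildTraceability(risks, results);
  const uncovered = risks
    .filter((risk) => priorityOf(risk) === "P0" || priorityOf(risk) === "P1")
    .filter((risk) => matrix[risk.requirement].length === 0)
    .map((risk) => risk.requirement);
  return { pass: uncovered.length === 0, uncovered };
}

// Usage with made-up data.
const risks: Risk[] = [
  { requirement: "login", probability: 3, impact: 3 },
  { requirement: "checkout-total", probability: 2, impact: 2 },
  { requirement: "avatar-upload", probability: 1, impact: 2 },
];
const results: TestResult[] = [
  { name: "login succeeds", covers: ["login"], passed: true },
  { name: "total includes tax", covers: ["checkout-total"], passed: false },
];

const gateResult = evaluateGate(risks, results);
```

With this data the gate fails on `checkout-total` only: its sole test did not pass, while `avatar-upload` has no tests at all but sits below the gated priorities. That is the "fewer tests, higher signal" trade in miniature.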

Why this stack matters

Playwright is intentionally unopinionated, which means no two projects look the same and drift happens fast, even inside the same org. It optimizes for stability and speed over standardized DX, so every team reinvents the same primitives: auth, API calls, retries, and logging. Then the AI arrives at a mess, and a mess produces slop.

Meanwhile, prompt-driven testing paths like cy.prompt lean into nondeterminism, which is the exact opposite of what testing exists to protect. The fix is not a better prompt; it is a better system: TEA to encode the standards, Playwright-Utils to enforce the patterns, and MCPs to verify UI flows against the live system, with api-request plus recurse covering backend behavior.

What you get out of the box

Playwright-Utils provides nine utilities. Six are backend or frontend agnostic: api-request (with schema validation), recurse, log, file-utils, burn-in, and auth-session. Three are UI-focused: network-error-monitor, network-recorder, and intercept-network-call.
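
As a rough illustration of what a network error monitor does conceptually, the sketch below fails a run when any observed response has an HTTP error status, with an allow-list for known noise. The types and function names (`ObservedResponse`, `findNetworkErrors`, `assertNoNetworkErrors`) are hypothetical and do not reflect the network-error-monitor utility's real API or configuration.

```typescript
// Hypothetical sketch; NOT the network-error-monitor API.
interface ObservedResponse {
  method: string;
  url: string;
  status: number;
}

// Returns responses with a 4xx/5xx status whose URL matches no ignore pattern.
function findNetworkErrors(responses: ObservedResponse[], ignore: RegExp[] = []): ObservedResponse[] {
  return responses.filter(
    (r) => r.status >= 400 && !ignore.some((pattern) => pattern.test(r.url)),
  );
}

// Throws with a readable summary if any unexpected HTTP error was observed.
function assertNoNetworkErrors(responses: ObservedResponse[], ignore: RegExp[] = []): void {
  const errors = findNetworkErrors(responses, ignore);
  if (errors.length > 0) {
    const summary = errors.map((e) => `${e.status} ${e.method} ${e.url}`).join("\n");
    throw new Error(`Unexpected HTTP errors during test:\n${summary}`);
  }
}

// Usage with made-up traffic: the analytics 404 is ignored, the 500 is not.
const observed: ObservedResponse[] = [
  { method: "GET", url: "https://app.example.test/api/users", status: 200 },
  { method: "GET", url: "https://app.example.test/analytics/beacon", status: 404 },
  { method: "POST", url: "https://app.example.test/api/orders", status: 500 },
];

const unexpected = findNetworkErrors(observed, [/\/analytics\//]);
```

In a real suite the response list would be collected from the browser during the test; the value of the pattern is that a passing UI assertion can no longer hide a failing backend call.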

TEA uses those patterns by default when you enable the integration in the BMAD installer, so the AI does not have to relearn the same documentation every time. That is context engineering in practice.

Call to action

This stack is already open source and used in production at SEON Technologies. If you want to level up your craft, use it. If you want to push it further, contribute. If you want to build with people who care about quality at this level, reach out.
