Shiplight

Posted on • Originally published at shiplight.ai

Agent-First Testing: Build Quality Into Every AI Coding Session

Agent-first testing embeds automated verification directly into the AI coding agent's workflow — not added afterward. The agent writes code, opens a real browser, verifies the change works, and saves the verification as a test. All in one loop, without leaving the development session.

This is a direct response to a structural problem in agent-first development: AI coding agents ship code faster than traditional QA cycles can absorb it. When an agent can implement a feature in minutes, a testing workflow that requires hours of separate work can no longer keep pace.


Why Traditional QA Breaks in Agent-First Teams

Traditional QA assumes a handoff. A developer finishes a feature, opens a PR, a reviewer checks the diff, QA runs tests. The gap between "code written" and "code verified" is measured in hours or days.

AI coding agents collapse the "code written" side to minutes. The handoff gap doesn't shrink — it becomes the dominant bottleneck. As agents write more code, human QA becomes the constraint on shipping velocity.

The human QA bottleneck in agent-first teams manifests in three ways:

  1. Volume mismatch — agents generate 10–20x more code changes per day than traditional developers. Manual review can't keep pace.
  2. Context loss — QA engineers reviewing agent-generated code don't have the session context the agent had. They miss the intent behind the change.
  3. Verification gap — agents typically don't run the application after making changes. The code looks correct but hasn't been verified in a real browser.

Agent-first testing closes all three gaps by making the agent itself responsible for verification.

What Agent-First Testing Looks Like in Practice

In an agent-first testing workflow, the coding agent completes a full verification loop:

  1. Implement the change — write code as normal
  2. Launch a browser — navigate to the running application
  3. Verify the UI — click through the affected flow
  4. Assert outcomes — use VERIFY statements to confirm expected state
  5. Save as a test — persist as a YAML file in the repo
  6. Run in CI — every future PR triggers the same verification automatically
A test saved from this loop looks like:

```yaml
goal: Verify new checkout discount field after agent implementation
base_url: http://localhost:3000
statements:
  - navigate: /cart
  - intent: Add item to cart
    action: click
  - navigate: /checkout
  - intent: Enter discount code
    action: fill
    value: "SAVE20"
  - intent: Apply discount
    action: click
  - VERIFY: Order total shows 20% discount applied
  - VERIFY: Order confirmation page displays with order number
```

How MCP Enables Agent-First Testing

MCP (Model Context Protocol) is the technical foundation. With the Shiplight Plugin installed, an agent in Claude Code, Cursor, or Codex can:

  • Open a real browser and navigate to the running application
  • Interact with the UI — click, fill, submit, navigate
  • Run VERIFY assertions — AI-powered checks that confirm expected page state
  • Generate a test file — save the session as a .test.yaml in the repo
  • Run the test suite — execute existing tests against the current state
```shell
# Install in Claude Code
claude mcp add shiplight -- npx -y @shiplightai/mcp@latest
```

To install in Cursor, add the server to `.cursor/mcp.json`:

```json
{
  "mcpServers": {
    "shiplight": {
      "command": "npx",
      "args": ["-y", "@shiplightai/mcp@latest"]
    }
  }
}
```

The Agent-First Testing Stack

Layer 1: In-session verification (MCP)

The agent verifies changes in a real browser during development — before the PR is even opened.

Layer 2: PR-gating (CI smoke suite)

A fast smoke suite (under 5 minutes) runs on every PR against staging. Blocks merges when flows break.
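Wiring the smoke suite into CI is a short workflow file. A minimal GitHub Actions sketch, assuming a hypothetical `run` subcommand on the Shiplight package and a `STAGING_URL` secret (check the Shiplight docs for the actual CLI invocation):

```yaml
# .github/workflows/smoke.yml -- hypothetical PR-gating smoke suite
name: smoke
on: pull_request
jobs:
  smoke:
    runs-on: ubuntu-latest
    timeout-minutes: 5          # keep the suite fast enough to gate PRs
    steps:
      - uses: actions/checkout@v4
      - name: Run smoke tests against staging
        # hypothetical command and flags; see Shiplight docs for the real CLI
        run: npx -y @shiplightai/mcp@latest run tests/smoke/*.test.yaml --base-url "$STAGING_URL"
        env:
          STAGING_URL: ${{ secrets.STAGING_URL }}
```

Because the job runs on `pull_request` and has a required-status check, a broken flow blocks the merge rather than surfacing after it.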

Layer 3: Full regression (post-merge)

The complete test suite runs on merge to main. Catches regressions across the full product surface.

Layer 4: Self-healing maintenance

Tests use intent-based locators that self-heal when the UI changes — essential when agent-generated code changes the UI constantly.
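Intent-based locators describe what the user is doing rather than how to find the element, so a renamed class or restructured DOM doesn't invalidate the step. A sketch of the contrast, using the same YAML shape as the example test above (the CSS selector is illustrative, not from any real app):

```yaml
# Brittle, selector-based targeting (typical Playwright/Selenium style):
#   page.click("#checkout > div.summary > button.btn-primary")
# One class rename or wrapper div breaks this step.

# Intent-based: the runner re-resolves the element from the described intent,
# so the step survives cosmetic UI changes.
- intent: Apply discount
  action: click
```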

Agent-First Testing vs Traditional QA

|  | Traditional QA | Agent-First Testing |
| --- | --- | --- |
| When tests are written | After code ships | During development |
| Who writes tests | QA engineers | The coding agent |
| Verification timing | Hours to days after PR | Before the PR is opened |
| Test format | Playwright/Selenium scripts | YAML (human-readable) |
| Maintenance | Manual selector updates | AI self-healing |
| Velocity impact | Slows release cadence | Scales with agent speed |

Getting Started

Step 1: Install the Shiplight Plugin (free, no account required):

```shell
claude mcp add shiplight -- npx -y @shiplightai/mcp@latest
```

Step 2: On your next code change, ask the agent:

"Verify that the change you just made works correctly in a real browser and save it as a test."

Step 3: Review the generated .test.yaml file in the PR diff.

Step 4: Add the test to your CI smoke suite.

Step 5: Expand coverage incrementally — one test per meaningful feature change.


Originally published at shiplight.ai/blog/agent-first-testing
