charlieww

How I Stopped Writing Fragile E2E Tests and Let AI Handle It

Last month I spent 4 hours debugging a Playwright test that broke because someone renamed a CSS class. Sound familiar?

I decided to try a different approach: what if the test framework could see the app like a human does, instead of relying on brittle selectors?

What I Built

An MCP (Model Context Protocol) server that gives AI agents — Claude, GPT, Cursor, Copilot — direct access to running applications. The AI can:

  • Launch and connect to apps via CDP
  • Tap elements, fill forms, scroll, navigate
  • Take screenshots and analyze UI snapshots
  • Run assertions in natural language
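Under the hood, each of those capabilities is exposed as an MCP tool, and the agent drives them via JSON-RPC `tools/call` requests. Here is a minimal sketch of what such a request looks like — the tool name `tap` and its `label` argument are illustrative, not necessarily the exact flutter-skill API:

```typescript
// Build a JSON-RPC 2.0 "tools/call" request, the shape MCP clients use
// to invoke a server-side tool. Tool name and arguments are illustrative.
function toolCall(id: number, name: string, args: Record<string, unknown>) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// The agent decides what to do ("tap the Log in button") and emits:
const req = toolCall(1, "tap", { label: "Log in" });
```

The agent never sees selectors; it names a target semantically and the server resolves it against the live UI.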

The Key Trick: Semantic Snapshots

Instead of sending full screenshots (expensive in tokens), I built a snapshot system that extracts the UI's semantic structure — interactive elements, their positions, labels, states. The AI gets a complete picture of the UI in ~2ms and a few hundred tokens.

Compare that to a screenshot: ~100KB of base64, thousands of tokens, and the AI still has to "guess" where buttons are.
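To make that concrete, here is a rough sketch of what a semantic snapshot might contain and how an agent could resolve a natural-language target against it. The field names (`role`, `label`, `rect`) and the `findByLabel` helper are my illustration of the idea, not the actual flutter-skill format:

```typescript
// Hypothetical shape of one element in a semantic UI snapshot.
interface SnapshotElement {
  id: string;      // stable semantic id, not a CSS class
  role: string;    // "button", "textfield", ...
  label: string;   // accessible label the AI reasons about
  rect: { x: number; y: number; w: number; h: number };
  enabled: boolean;
}

// A few hundred tokens of structured data instead of ~100KB of pixels.
const snapshot: SnapshotElement[] = [
  { id: "email", role: "textfield", label: "Email",
    rect: { x: 40, y: 200, w: 360, h: 48 }, enabled: true },
  { id: "login-btn", role: "button", label: "Log in",
    rect: { x: 120, y: 400, w: 200, h: 48 }, enabled: true },
];

// Resolve a target by meaning, not by a brittle selector
// like ".btn-primary > span".
function findByLabel(els: SnapshotElement[], label: string) {
  return els.find((e) => e.label.toLowerCase() === label.toLowerCase());
}

console.log(findByLabel(snapshot, "log in")?.id); // → "login-btn"
```

Rename a CSS class and this still works; the accessible label is the contract, and labels change far less often than markup.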

Real Numbers

| Metric | Traditional | AI-Driven |
| --- | --- | --- |
| Tap latency | 50–200 ms | 1 ms |
| UI analysis | 500 ms–2 s (screenshot) | 2 ms (snapshot) |
| Test brittleness | High (selector-dependent) | Low (semantic) |
| Platforms supported | Usually 1–2 | 10 |

Supported Platforms

Flutter, React Native, iOS, Android, Web (Chrome/Firefox/Safari), Electron, Tauri, KMP, .NET MAUI.

253 MCP tools total. Video recording, API testing, mock responses, parallel multi-device — it's all there.

Getting Started

```bash
npx flutter-skill@latest
```

Add to your MCP config and your AI assistant can start testing immediately.
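For Claude Desktop-style clients, the config entry would look something like this (the `flutter-skill` server key is just a name you choose; the command mirrors the npx invocation above):

```json
{
  "mcpServers": {
    "flutter-skill": {
      "command": "npx",
      "args": ["flutter-skill@latest"]
    }
  }
}
```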

GitHub: ai-dashboad/flutter-skill
npm: flutter-skill

Would love to hear from anyone doing E2E testing — what's the most annoying part of your current setup?
