Your E2E tests passed. CI is green. Deploy went through.
And now checkout is broken because a new banner covers the "Pay" button on mobile.
## The Problem
Traditional testing catches functional breaks. Button doesn't click? Test fails.
But what about:
- New hero section pushes products below the fold
- Chat widget overlaps "Add to Cart" on tablet
- CMS update breaks grid layout
- A/B test variant hides critical CTA
- Third-party script (analytics, ads) covers checkout form
- Responsive layout works on desktop but breaks on mobile
These aren't functional bugs. Tests pass. The page just doesn't work anymore.
## Why Existing Tools Fail
Screenshot diff: Every pixel change = alert. Designer tweaks padding? 500 false positives. Team ignores alerts. Real issues slip through.
E2E tests: Check if button exists and clicks. Don't check if button is visible, accessible, not covered by promo banner.
Manual QA: Doesn't scale. Misses edge cases. "Works on my machine."
## Semantic State Monitoring
Instead of comparing pixels or running click tests, compare what an LLM understands about your page:
```
Deploy 1: "E-commerce PDP. Product image, price, 'Add to Cart' button prominent. Checkout accessible."
Deploy 2: "E-commerce PDP. Product image, price, 'Add to Cart' button prominent. Checkout accessible."
Deploy 3: "E-commerce PDP. Large promo banner. 'Add to Cart' partially hidden. Checkout requires scroll."
   ↑
   REGRESSION: Primary action degraded
```
The LLM doesn't check pixels. It checks whether the page still does its job.
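The simplest version compares just the last two snapshots. A minimal sketch, assuming the same `llm.chat` placeholder client used throughout this post:

```javascript
// Naive check: compare only the previous and current snapshots.
// Brittle in practice; see the non-determinism section below.
async function naiveCheck(previousState, currentState) {
  return await llm.chat({
    prompt: `Snapshot A (previous): ${previousState}
Snapshot B (current): ${currentState}

Did the page's functional state regress from A to B?
Reply with strict JSON: {"regression": true|false, "reason": "..."}`
  });
}
```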
## Handling LLM Non-Determinism
LLMs aren't deterministic. Same page, slightly different wording. "12 products" vs "showing 12 items."
Solution: Moving window context.
Instead of comparing current vs previous, feed the LLM recent history:
```javascript
const WINDOW_SIZE = 4;

// The caller owns stateWindow (one array per page/viewport), so histories
// for different viewports never get mixed together.
async function checkRegression(stateWindow, currentState) {
  stateWindow.push(currentState);
  if (stateWindow.length > WINDOW_SIZE) stateWindow.shift();
  if (stateWindow.length < 2) return null; // Not enough history to compare yet

  return await llm.chat({
    prompt: `You're monitoring a web page for UX regressions.

Recent semantic snapshots (oldest to newest):
${stateWindow.map((s, i) => `[${i + 1}]: ${s}`).join('\n')}

Questions:
1. Is the latest snapshot a regression from the established baseline?
2. Are primary actions (CTAs, forms, checkout) still accessible and prominent?
3. Is any critical UI element hidden, pushed off-screen, or covered?

Reply with strict JSON: {"regression": true|false, "severity": "critical"|"warning"|"none", "reason": "..."}`
  });
}
```
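This assumes `llm.chat` hands back the reply already parsed into that shape. If your client returns raw text, extract the JSON defensively; a minimal sketch:

```javascript
// Pull the JSON object out of a raw-text reply. Treat unparseable replies
// as "needs review" rather than silently passing.
function parseRegressionReply(replyText) {
  const match = replyText.match(/\{[\s\S]*\}/); // tolerate prose around the JSON
  if (!match) {
    return { regression: true, severity: 'warning', reason: 'Unparseable LLM reply' };
  }
  try {
    return JSON.parse(match[0]);
  } catch {
    return { regression: true, severity: 'warning', reason: 'Malformed LLM reply' };
  }
}
```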
Now the LLM sees the pattern. Minor wording variations dissolve. Real regressions stand out.
## Why Salience Changes Everything
Most "AI monitoring" solutions do this:

```
Page → LLM → "figure it out"
```

We do this:

```
Page → SiFR (structure + relations + salience) → LLM (interpretation, not discovery)
```
The LLM doesn't weigh every DOM node equally. SiFR assigns salience scores to elements before the LLM sees them. High-salience elements (CTAs, forms, primary content) dominate the semantic state. Low-salience elements (footers, decorations, cookie banners) are effectively ignored.
This is why CSS tweaks don't trigger alerts, but "button covered by banner" does.
| Element | Salience | LLM Treatment |
|---|---|---|
| Checkout button | 95% | Critical — visibility change = regression |
| Product grid | 88% | Important — pushed off-screen = warning |
| Promo banner | 70% | Monitor — if it occludes high-salience = alert |
| Footer links | 15% | Ignored |
| Cookie consent | 12% | Ignored |
We don't ask the model what matters — we tell it.
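For intuition, a salience-annotated element in the capture might look roughly like this (field names are illustrative, not the exact SiFR schema; earlier posts in the series cover the real format):

```javascript
// Illustrative only: field names are hypothetical, not the SiFR spec.
const capturedElement = {
  el: 'button#checkout',
  role: 'button',
  text: 'Pay',
  salience: 0.95,                 // high salience: visibility changes are critical
  visible: true,
  occludedBy: 'div.promo-banner'  // exactly the condition that trips an alert
};
```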
## What This Catches
| Issue | E2E Tests | Visual Diff | Semantic |
|---|---|---|---|
| Button covered by new banner | ❌ Pass | ⚠️ Alert (among 50 others) | ✅ "CTA occluded" |
| Products pushed below fold | ❌ Pass | ❌ Pass | ✅ "Primary content degraded" |
| Mobile layout broken | ❌ Pass (if desktop-only) | ⚠️ Noise | ✅ "Responsive regression" |
| Third-party widget overlap | ❌ Pass | ⚠️ Noise | ✅ "External element occludes checkout" |
| CMS broke grid | ❌ Pass | ⚠️ Alert flood | ✅ "Layout structure changed" |
| A/B test hides CTA | ❌ Pass | ❌ Different baseline | ✅ "Variant missing primary action" |
## Bonus: Security Layer
Same approach catches malicious changes:
- Defacement: High-salience content replaced → instant alert
- Phishing overlay: New high-salience form over login → "Anomaly: duplicate auth form"
- Content injection: Suspicious iframe/script in critical area → flagged
Because the LLM reads a projection of the page (pre-weighted by salience), not raw HTML:
- Injected instructions in low-salience areas = ignored
- Prompt injection surface = minimal
Security monitoring as a free addon to your QA pipeline.
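One way to wire this in: append anomaly-focused questions to the same `checkRegression` prompt. The wording below is illustrative, not a vetted detection ruleset:

```javascript
// Appended to the regression prompt; illustrative wording only.
const SECURITY_QUESTIONS = `
4. Has any high-salience content been replaced wholesale (possible defacement)?
5. Is there a new high-salience form requesting credentials (possible phishing overlay)?
6. Has a new iframe or injected element appeared near checkout or login?`;
```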
## Implementation
```javascript
// Playwright + Element-to-LLM
async function getSemanticState(page, viewport) {
  // Playwright's setViewportSize takes only width/height
  await page.setViewportSize({ width: viewport.width, height: viewport.height });

  // Ask the Element-to-LLM extension for a SiFR capture of the page
  const sifr = await page.evaluate(() => {
    return new Promise((resolve) => {
      document.addEventListener('e2llm-capture-response', (e) => {
        resolve(e.detail.data);
      }, { once: true });
      document.dispatchEvent(new CustomEvent('e2llm-capture-request', {
        detail: { selector: 'body', options: { preset: 'minimal' } }
      }));
    });
  });

  return await llm.chat({
    prompt: `Describe this page's functional state:
1. Primary actions available (buttons, forms, CTAs)
2. Content hierarchy (what's prominent vs hidden)
3. Any UI issues (overlaps, off-screen elements, broken layout)
Be consistent. Same functional state = same description.`,
    context: JSON.stringify(sifr)
  });
}
```
```javascript
// Check critical viewports
const viewports = [
  { width: 1920, height: 1080, name: 'desktop' },
  { width: 768, height: 1024, name: 'tablet' },
  { width: 375, height: 667, name: 'mobile' }
];

// One history window per viewport, so desktop and mobile snapshots are
// never compared against each other. Persist these between runs
// (e.g. as CI artifacts) so the window survives across deploys.
const stateWindows = new Map(viewports.map((vp) => [vp.name, []]));

for (const vp of viewports) {
  const state = await getSemanticState(page, vp);
  const result = await checkRegression(stateWindows.get(vp.name), state);
  if (result?.regression) {
    // alert() is browser-only; report however your pipeline expects
    console.error(`[${vp.name}] ${result.severity}: ${result.reason}`);
  }
}
```
## When To Run
| Trigger | Use Case |
|---|---|
| Post-deploy | Catch regressions before users |
| Scheduled (hourly) | Third-party script changes, CMS updates |
| Pre-merge (staging) | PR review with semantic diff |
| Multi-viewport | Responsive regression detection |
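For the scheduled trigger, a minimal sketch using node-cron (an assumption; any scheduler works). It presumes the Implementation loop is wrapped in a hypothetical `runViewportChecks(page)` helper:

```javascript
import cron from 'node-cron';           // assumption: swap in any scheduler
import { chromium } from 'playwright';

// Hourly check against production. runViewportChecks(page) is the viewport
// loop from the Implementation section, wrapped in a function.
cron.schedule('0 * * * *', async () => {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto(process.env.TARGET_URL); // your critical page
  await runViewportChecks(page);
  await browser.close();
});
```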
## Try It
- Install Element-to-LLM extension
- Integrate with Playwright
- Add your LLM
- Run on critical pages post-deploy
Your tests check if code works. This checks if users can use it.
Series Index:
- #1 — What Can an LLM Actually See?
- #2 — Capturing What Matters
- #3 — The Devil's in the Diffs
- #4 — Building a Robust Selection Model
- #5 — Taming the Token Budget
- #6 — Representing Relations Without the Hierarchy
- #7 — Shipping: Extension, Automation, and What's Next
- #8 — From Bug Reports to Automated Regression: A QA Pipeline
- #9 — Semantic Regression Detection (you are here)
Running this in your pipeline? Share your experience in the comments.
Tags: #webdev #frontend #testing #qa #devops