Most AI coding agents can generate a full page of UI in seconds. None of them can tell you whether the result is actually usable.
Missing alt text. Broken tab order. Contrast ratios that fail WCAG. Focus traps that don't trap. These are the things that ship when accessibility is an afterthought, and AI makes the problem worse because it ships faster.
I built VertaaUX to close that gap. It runs a deep UX and accessibility audit on any URL and returns scored, actionable findings across seven categories: usability, clarity, information architecture, accessibility, conversion, semantic markup, and keyboard navigation.
Today I want to show you two ways to use it: the CLI for your terminal workflow, and the MCP server for AI agents like Claude, Cursor, and Copilot.
## The CLI: One Command, Full Audit
Install globally or run with npx:
```shell
npx @vertaaux/cli audit https://your-site.com
```
That's it. You get a scored report in your terminal with severity-ranked findings.
Here's a real run against vertaaux.ai itself:
```
$ vertaa audit https://vertaaux.ai --mode basic
[21:01:24] Running audit... (1/3) | 0 issues
[21:01:37] Audit Complete score=72 issues=36 (12s)

Scores
──────────────────────────────────────────────
Overall: 72/100

Category        Score
──────────────────────────────────────────────
clarity            95
semantic           98
keyboard           82
usability          73
ia                 66
conversion         63
accessibility      62
```
Seven categories, scored in 12 seconds. We eat our own dogfood — and yes, we have work to do on our own accessibility score.
## Audit Modes
Three levels of depth depending on your needs:
```shell
# Fast broad check
vertaa audit https://your-site.com --mode basic

# Standard analysis (default)
vertaa audit https://your-site.com --mode standard

# Deep WCAG-focused audit
vertaa audit https://your-site.com --mode deep
```
## JSON Output for CI/CD

The `--format json` flag gives you structured output for piping:
```shell
vertaa audit https://your-site.com --format json | jq '.data.scores'
```

```json
{
  "ia": 66,
  "clarity": 95,
  "keyboard": 82,
  "semantic": 98,
  "usability": 73,
  "conversion": 63,
  "accessibility": 62
}
```
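If you're wiring this into your own script rather than the GitHub Action, the scores object is easy to threshold. A minimal sketch in Python, assuming the `.data.scores` shape shown above; the `THRESHOLDS` values and the `check_scores` helper are illustrative, not part of the CLI:

```python
# Hypothetical gate script: flag any category score that drops below
# your own minimums. Assumes the .data.scores JSON shape shown above.
THRESHOLDS = {"accessibility": 70, "keyboard": 80}  # illustrative policy


def check_scores(scores: dict) -> list[str]:
    """Return a human-readable failure for every category below threshold."""
    return [
        f"{category}: {scores[category]} < {minimum}"
        for category, minimum in THRESHOLDS.items()
        if scores.get(category, 0) < minimum
    ]
```

Feed it the parsed output of `vertaa audit ... --format json | jq '.data.scores'` and exit nonzero when the returned list is non-empty.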
## Drilling Into Findings
Each issue comes with severity, business impact, and a recommended fix:
```shell
vertaa audit https://your-site.com --format json | jq '.data.issues[0]'
```

```json
{
  "title": "Too many navigation links",
  "severity": "warning",
  "description": "Found 17 links in navigation. Consider grouping or reducing to improve scanability.",
  "businessImpact": "Cognitive overload reduces navigation efficiency by 40%",
  "recommendedFix": "<!-- Group into dropdown menus -->...",
  "estimatedEffort": "medium"
}
```
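Because every issue carries a `severity`, triage is a one-liner away. A sketch, assuming the issues array uses the shape shown above; the `critical` and `info` severity values are assumed alongside the `warning` in the sample:

```python
from collections import Counter

# Worst-first ordering; "critical" and "info" are assumed severity
# values alongside the "warning" seen in the sample issue above.
SEVERITY_ORDER = ["critical", "warning", "info"]


def triage(issues: list[dict]) -> dict[str, int]:
    """Count issues per severity, worst first, skipping empty buckets."""
    counts = Counter(issue.get("severity", "info") for issue in issues)
    return {sev: counts[sev] for sev in SEVERITY_ORDER if counts[sev]}
```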
Not just "this is broken" — but why it matters and how to fix it.
## GitHub Actions

Drop this into `.github/workflows/a11y-gate.yml`:
```yaml
name: Accessibility Gate
on:
  pull_request:
    branches: [main]
jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: vertaaux/audit-action@v1
        with:
          url: ${{ vars.STAGING_URL }}
          api-key: ${{ secrets.VERTAAUX_API_KEY }}
          fail-on: critical
```
PRs with critical accessibility violations don't merge. No manual review needed for the things machines can catch.
## The MCP Server: Give Your AI Agent Eyes
The CLI covers your terminal and CI. But what about the AI agents you're already using to write code?
This is where MCP (Model Context Protocol) comes in. MCP lets AI agents call external tools. The VertaaUX MCP server exposes 38 tools that turn any MCP-compatible agent into a UX auditor.
### Install

```shell
npm install -g @vertaaux/mcp-server
```
### Claude Desktop

Add to `~/Library/Application Support/Claude/claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "vertaaux": {
      "command": "npx",
      "args": ["-y", "@vertaaux/mcp-server"],
      "env": {
        "VERTAAUX_API_KEY": "vx_live_..."
      }
    }
  }
}
```
### Cursor / VS Code

Same config pattern — add the MCP server to your editor's MCP settings.
## What Can the Agent Do?
Once connected, your AI agent can:
- Audit a URL — "Audit staging.myapp.com and tell me the top 5 issues"
- Explain findings — "Why does this contrast ratio fail? Show me the fix"
- Generate patches — "Create a PR that fixes the critical accessibility issues"
- Compare competitors — "Audit our site and competitor.com side by side"
- Track regressions — "Compare this audit to last week's baseline"
- Gate deployments — "Does this pass our `vertaa.policy.yml`?"
The agent doesn't hallucinate these capabilities. It calls real tools with real browser-based analysis.
## Real Workflow: Audit → Fix → Verify
Here's what a typical session looks like in Claude Desktop:
**You:** "Audit staging.myapp.com for accessibility issues"

The agent calls `audit_url`, waits for the browser-based audit to complete, and returns scored findings.

**You:** "Fix the critical issues in the signup form"

The agent calls `suggest_fix` for each finding, generates framework-aware patches (it detects React/Vue/Svelte from your `package.json`), and shows you the diffs.

**You:** "Open a draft PR with those fixes"

The agent calls `generate_pr`, which applies the patches atomically via the Git Trees API. Draft PR — human review before merge.

**You:** "Verify the fixes landed on staging"

The agent calls `verify_fixes` against the baseline audit. Fixed issues, still-broken issues, and new regressions — all categorized.
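That fixed / still-broken / regression split is plain set arithmetic. A sketch of the idea, assuming issues can be keyed by a stable identifier such as their title; how `verify_fixes` actually matches issues between runs isn't documented here, so treat this as illustrative:

```python
def diff_audits(baseline: set[str], current: set[str]) -> dict[str, set[str]]:
    """Categorize issue identifiers between a baseline audit and a re-audit."""
    return {
        "fixed": baseline - current,         # present before, gone now
        "still_broken": baseline & current,  # present in both runs
        "regressions": current - baseline,   # newly introduced
    }
```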
One conversation. No context switching. No separate tooling.
## CLI + MCP: Better Together
The CLI and MCP server share the same engine and scoring. Use them together:
| Scenario | Surface |
|---|---|
| Local dev — quick check before pushing | CLI |
| CI/CD — automated quality gate | CLI + GitHub Action |
| Code review — "is this accessible?" | MCP in your editor |
| Bug triage — investigate a reported issue | MCP in Claude Desktop |
| Competitor analysis — how do we compare? | MCP or CLI |
| Sprint planning — what's our UX debt? | CLI with --format json output |
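For that sprint-planning row, tracking score movement between two JSON snapshots is enough to see whether UX debt is growing or shrinking. A minimal sketch, assuming two `.data.scores` objects captured on different dates; `score_deltas` is a hypothetical helper, not a CLI feature:

```python
def score_deltas(previous: dict, current: dict) -> dict[str, int]:
    """Per-category score change between two audit snapshots.

    Positive values mean improvement; negative values flag regressions.
    """
    return {
        category: current[category] - previous[category]
        for category in previous
        if category in current
    }
```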
## Getting Started
1. Get an API key at vertaaux.ai/settings/api
2. Run your first audit: `npx @vertaaux/cli audit https://your-site.com`
3. Connect the MCP server to your AI agent of choice
4. Set up a CI gate to catch regressions automatically
The free tier gives you enough audits to evaluate. Pro unlocks all seven scoring categories and advanced fix generation.
## Ready to Take It to the Next Level? Agent Skills.
The CLI and MCP server give your agent the tools. But tools without context produce generic results. What if your agent knew which audit profile to pick, how to interpret the scores, which CI thresholds to set, and how to chain audits into fix plans — without you spelling it out every time?
That's what VertaaUX Agent Skills do. They're published in the open Agent Skills format and work with Claude Code, Cursor, Codex, GitHub Copilot, Gemini CLI, and any host that supports the format.
Install with one command:

```shell
npx skills add VertaaUX/agent-skills
```

Or install just the VertaaUX skill:

```shell
npx skills add VertaaUX/agent-skills --skill vertaaux
```
What the skill gives your agent:

- Audit profile selection — built-in profiles like `quick-ux`, `wcag-aa`, and `ci-gate`, with a decision tree so the agent picks the right one for the task
- Deterministic task recipes — step-by-step sequences for accessibility investigations, competitive reviews, CI setup, and remediation workflows
- Guardrails — prevents the agent from hallucinating CLI flags or API parameters that don't exist
- Skill composition contracts — explicit handoff conventions so the `vertaaux` skill chains cleanly into `a11y-review`, `create-analyzer`, and `architecture-review`
The difference: without the skill, your agent runs `audit_url` and dumps raw findings. With the skill, it picks the right profile, runs the audit, triages by severity, generates a fix plan, and sets up a CI gate — all in one conversation.

Browse the skill on Smithery: petri-lahdelma/vertaaux — or just run `npx skills add VertaaUX/agent-skills` and start auditing.
If you're shipping UI with AI assistance, you need something checking whether that UI actually works for everyone. Lighthouse covers performance. VertaaUX covers the rest.
## Top comments (1)
Superpower is the right framing.
The onboarding model that seems to work best for agent tooling is capability-first, not connector-first. “One key, many superpowers” is easier for both the operator and the agent to reason about than a long list of named tools.
If one key lets the agent suddenly audit, extract, summarize, or generate useful output, you get to first value fast. Then you bring in the operator’s own systems only when the workflow actually needs them.
That is also why “38 tools” can be true without being the real product surface. The compounding value is usually one visible capability with clear structured output, not a tool graveyard the model has to rediscover every run.