The Validation Server: Test AI Claims Against Reality Before Your Users Do

#ai #llm #mcp #testing

The Validation Server: Test AI Claims Against Reality Before Your Users Do

There's a hard lesson in deploying AI agents in production: confidence and accuracy are completely uncorrelated.

An LLM can tell you something with absolute certainty and be completely wrong. It will cite a commit that doesn't exist. It will claim an API is up when it's been down for hours. It will give you a price that changed last week. This isn't a bug you fix by prompting better. It's a structural property of how these models generate text — they produce plausible output, not verified facts.

The fix we built is a Validation Server: a FastMCP server that tests challenged claims against reality before they can cause damage.

The Insight

Consensus catches disagreements. Multiple agents reviewing a finding can spot logical gaps, conflicting claims, and missing context. But consensus doesn't catch confabulation — the case where every agent is confidently wrong.

Example: a research agent reports that a GitHub commit a3f9b2c added user authentication on March 15. The Auditor reviews it and says "looks plausible." The Scout confirms the repo exists. Consensus score: 0.8 — confirmed.

But the commit doesn't exist. The date is wrong. The feature isn't in that commit. Every agent was confident and every agent was wrong.

You need a reality check. That's what the Validation Server does.

What It Tests

The Validation Server has scenarios for different types of claims:

http_endpoint  → curl the URL, check status code
network_reachability  → TCP connect to host:port
api_json  → fetch JSON API, check field value
price_check  → search web for corroboration
git_claim  → search git log for commit
web_claim  → Brave search to verify facts
shell_command  → run arbitrary command

Each scenario has a defined validation protocol. The Validation Server doesn't use the same LLM to check itself — it uses actual external systems.

How It Works

When the Consensus Server flags a finding as challenged (score 0.3–0.6), it calls:

validate_challenged_findings(consensus_round_id)

This loops through each challenged finding, picks the right scenario type, runs the test, and submits the result back to the Consensus Server:

{
  finding_id: "x402-pricing",
  scenario: "price_check",
  result: {
    claimed: "$0.03 per request",
    actual: "price not found on x402 website",
    corroborated: false,
    severity: "high"  # pricing claim without evidence
  },
  validated: true,
  passed: false
}

If the finding fails validation, it's either rejected or flagged for human review. No confident lie gets to become a product recommendation.

The Full Flow

Research Agent makes claim
        ↓
Consensus Server: Scout + Auditor + Dev vote
        ↓
Score ≥ 0.6 → confirmed (goes to synthesis)
Score 0.3–0.6 → challenged → Validation Server
Score < 0.3 → rejected
        ↓
Validation Server tests reality
        ↓
Passes → confirmed (back to consensus)
Fails → rejected or human review
        ↓
Synthesis (Hemingway) → final report with confidence levels

Real Example: x402 Endpoint Research

We ran this on the x402 ecosystem. The gap dig reported:

"14 endpoints deployed across the x402 marketplace"
"wallet 0xf404... has received no transactions"
"Pricing: $0.001–$0.05 per request depending on endpoint"

Consensus scores: all above 0.6. But the validation phase caught:

The wallet actually had received one small transaction (from a test we forgot about)
One endpoint was returning 403, not 200
The pricing for meeting-notes-summary was actually $0.001 in the deployed code, not $0.03 as claimed

All three were small errors. But in a product decision context — "should I build on x402 or use traditional APIs" — small pricing errors compound.

Implementation

The Validation Server is a FastMCP server at agents/servers/validation_server.py, registered as validation-server in openClaw.json:

quick_validate(claim, scenario_type, context)
  # One-shot validation, no consensus loop needed

define_validation(find_id, claim, scenario, ctx)
  # Define a validation to run later

run_and_submit_to_consensus(val_id, round_id)
  # Full loop: run validation, submit result to consensus

Why This Is Structural, Not Promptable

You might think: "why not just prompt the agent to be more careful?" The answer is that confabulation is not a confidence problem — it's a knowledge problem. The model genuinely doesn't know the price changed, the commit doesn't exist, the API is down. Telling it to be more confident doesn't fix that. Telling it to check reality does.

The Validation Server is how you operationalize "check reality before stating it as fact."

Source: agents/servers/validation_server.py

Get free AI automation guides and weekly tips: mrclaws-ai-automation-for-small-business.kit.com/b0fcff2c50