The Consensus Server Pattern: How to Catch AI Confabulation Before It Reaches Your Users

#ai #architecture #llm #testing

LLMs are great at sounding confident. That's the problem.

An LLM will tell you that the commit a3f9b2c added user authentication last Tuesday, that the /api/v2/users endpoint returns a 200 OK, and that the price of a Pro subscription is $19/month — all with complete certainty, all potentially wrong. This isn't a bug. It's a feature of how these models work: they generate plausible text, not verified facts.

We call this confabulation — the model filling gaps with confident-sounding nonsense. And in production AI systems, it can damage trust, break integrations, or send your users down blind alleys.

The classic answer is "add validation." But validation against what, exactly? You can't hand every finding to a human. And a single additional LLM call just trades one model for another — same confabulation risk, doubled latency.

We built something different: a Consensus Server where multiple agents vote on each finding before it goes anywhere.

The Core Idea

Instead of one agent making a claim, run three agents with distinct roles:

Scout — the researcher. Gathers facts, checks sources, builds the case.
Auditor — the skeptic. Challenges assumptions, looks for gaps, pokes holes.
Dev — the implementation checker. Verifies whether findings actually work in code.

Each agent independently evaluates a finding, then submits a vote. The votes are weighted by confidence and aggregated. If the consensus score clears a threshold, the finding is confirmed. If not, it's flagged for human review or re-research.

How the Voting Works

Every vote carries two values: a direction and a confidence.

Vote Type	Direction	Weight
Confirm	+1	× confidence (0.0–1.0)
Challenge	−1	× confidence (0.0–1.0)
Uncertain	0	0 (no influence)

The confidence score is the agent's self-reported certainty. An agent that's 90% sure it's right contributes 0.9 to the tally. One that's 60% sure contributes 0.6.

The consensus score is the weighted sum, normalized by the number of voting agents:

consensus_score = sum(weight_i × direction_i) / num_voting_agents

A score ≥ 0.6 means confirmed. Below 0.6 means challenged. The exact threshold is tunable — lower it for higher recall (catch more edge cases), raise it if you want fewer false positives.

The Critical Insight

Here's the part most people miss: an agent can be highly confident AND wrong.

A single model saying "I'm 95% sure this commit exists" sounds reassuring. But confidence is about the model's internal consistency, not about ground truth. When three agents with different prompts and roles look at the same claim, their disagreements surface the uncertainty that raw confidence scores hide.

This is why consensus beats validation:

"Consensus catches disagreements. Validation catches confabulation."

Validation asks "is this true?" Consensus asks "do multiple independent agents believe this?" The second question is answerable without a ground-truth oracle. The first one isn't.

A Real Example

Here's a concrete scenario: your agent claims that git log --oneline in the auth-service repo shows a commit e8f2a91 that implements OAuth2 login.

Before surfacing this to the user, you route it through the consensus server:

# agents/servers/consensus_server.py

from dataclasses import dataclass
from enum import Enum
from typing import Optional

class VoteType(Enum):
    CONFIRM = "confirm"
    CHALLENGE = "challenge"
    UNCERTAIN = "uncertain"

@dataclass
class Vote:
    agent: str
    vote_type: VoteType
    confidence: float  # 0.0 to 1.0
    reason: str

@dataclass
class Finding:
    id: str
    claim: str
    context: dict
    votes: list[Vote]
    consensus_score: Optional[float] = None
    status: Optional[str] = None  # "confirmed" | "challenged"

def submit_vote(finding_id: str, vote: Vote) -> Finding:
    """
    Submit a vote from an agent for a specific finding.
    Recalculates consensus score after applying the vote.
    """
    finding = get_finding(finding_id)
    finding.votes.append(vote)
    finding.consensus_score = calculate_score(finding.votes)
    finding.status = (
        "confirmed" if finding.consensus_score >= 0.6
        else "challenged"
    )
    save_finding(finding)
    return finding

def get_consensus_results(finding_id: str) -> Finding:
    """Return the current state of a finding after all votes."""
    return get_finding(finding_id)

def get_challenged_findings() -> list[Finding]:
    """Return all findings with status 'challenged' for review."""
    return [f for f in get_all_findings() if f.status == "challenged"]

Each agent calls submit_vote() independently:

# Scout confirms (confident)
submit_vote(finding_id, Vote(
    agent="scout",
    vote_type=VoteType.CONFIRM,
    confidence=0.85,
    reason="Found commit e8f2a91 in git log with OAuth2 message"
))

# Auditor challenges (medium confidence — GitHub may be stale)
submit_vote(finding_id, Vote(
    agent="auditor",
    vote_type=VoteType.CHALLENGE,
    confidence=0.65,
    reason="GitHub commit list was 3 days stale; local git disagreed"
))

# Dev confirms (high confidence — ran the command)
submit_vote(finding_id, Vote(
    agent="dev",
    vote_type=VoteType.CONFIRM,
    confidence=0.95,
    reason="Executed git log locally; commit exists and touches auth files"
))

Score: (0.85 + (-0.65) + 0.95) / 3 = 0.38 — challenged. The finding doesn't go to the user until someone resolves why the Auditor found a discrepancy.

The MCP Server Interface

The consensus server registers as an MCP tool server — consensus-server — in OpenClaw. That means any agent can call it through the standard MCP tool interface without you wiring up custom HTTP endpoints or message queues.

# agents/servers/consensus_server.py (MCP registration)

TOOLS = [
    "submit_vote",
    "get_consensus_results",
    "get_challenged_findings",
]

async def serve(self, tool_name: str, arguments: dict):
    match tool_name:
        case "submit_vote":
            return submit_vote(**arguments)
        case "get_consensus_results":
            return get_consensus_results(**arguments)
        case "get_challenged_findings":
            return get_challenged_findings()

Once registered, your Scout, Auditor, and Dev agents call it like any other tool — the friction of adding a new verification step is near zero.

When to Use It

Consensus is most valuable when:

The cost of being wrong is high — database writes, external API calls, financial data
Facts are time-sensitive — prices, API statuses, availability windows
The domain invites confident fabrication — git history, large codebases, vague product specs

It's overkill for "what's the weather in Toronto" or "translate this paragraph." Save it for the findings that travel downstream to humans or critical systems.

Wrapping Up

Confabulation isn't going away. The models will keep generating confident lies. But you can catch most of them before they hit your users — not with a single validator, but with a system of distributed skepticism.

Three agents. Three votes. One threshold. That's the Consensus Server pattern.

Source code: agents/servers/consensus_server.py — registered as consensus-server MCP tool server in OpenClaw.

Get free AI automation guides and weekly tips: mrclaws-ai-automation-for-small-business.kit.com/b0fcff2c50