I'm a CFO who builds multi-agent AI systems for finance. This post documents the architecture decisions behind CHP (Consensus Hardening Protocol) — an open-source decision-governance layer I built to prevent false consensus in multi-agent LLM systems.
Repo: https://codeberg.org/cubiczan/consensus-hardening-protocol
The Problem
Multi-agent systems have a dirty secret: LLM agents don't debate. They agree.
Put three instances of the same model in a deliberation loop. They converge in 1-2 rounds. Cosine similarity >0.95. The "consensus" is an artifact of shared training, not independent reasoning.
Even with different prompts, roles, and instructions, same-model agents produce outputs that are nearly identical in structure, conclusion, and confidence. The deliberation is theatrical.
Why I Cared
I deploy multi-agent systems for:
- Commodity intelligence across lithium, nickel, and cobalt markets
- CFO variance analysis
- SEC-grade financial research
- Compliance scanning
In these domains, a false consensus is a liability. Literally.
Architecture: State Machine vs. Probabilistic
First decision: deterministic state machine vs. probabilistic convergence scoring.
I chose the state machine.
Reason: enterprise compliance teams need inspectable audit trails. They need to see that Agent A committed at timestamp T1 with reasoning R1, that Agent B (adversarial) challenged with counter-argument C1, and that the consensus was accepted because the R0 gate score exceeded the threshold.
Probabilistic frameworks give you a confidence distribution. State machines give you a decision log. Compliance teams audit logs, not distributions.
EXPLORING → ADVISORY_LOCK → PROVISIONAL_LOCK → LOCKED
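Here's a minimal sketch of that machine in Python. The class and field names are my illustration, not CHP's actual API; the point is that every transition is checked against an explicit table and appended to a log an auditor can replay.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum, auto


class State(Enum):
    EXPLORING = auto()
    ADVISORY_LOCK = auto()
    PROVISIONAL_LOCK = auto()
    LOCKED = auto()


# Legal moves only: forward through the lock sequence, or reset to
# EXPLORING. A stage can never be skipped.
TRANSITIONS = {
    State.EXPLORING: {State.ADVISORY_LOCK},
    State.ADVISORY_LOCK: {State.PROVISIONAL_LOCK, State.EXPLORING},
    State.PROVISIONAL_LOCK: {State.LOCKED, State.EXPLORING},
    State.LOCKED: set(),
}


@dataclass
class ConsensusMachine:
    state: State = State.EXPLORING
    audit_log: list = field(default_factory=list)

    def transition(self, to: State, reason: str) -> None:
        if to not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition: {self.state.name} -> {to.name}")
        # Every accepted transition becomes a log entry: from, to, why, when.
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "from": self.state.name,
            "to": to.name,
            "reason": reason,
        })
        self.state = to
```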
Foundation Disclosure
Agents commit to their reasoning BEFORE cross-agent communication.
Why: anchoring bias. If Agent A shares first, Agents B and C defer. An information cascade turns three agents into one agent with three voices.
Implementation: each agent produces a sealed payload (reasoning chain + conclusion + confidence) that's encrypted until all agents have committed. Only then are payloads revealed simultaneously.
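CHP uses encryption for the sealing step; a salted hash commitment gives the same commit-then-reveal guarantee and fits in a few lines, so that's what I'll sketch here (the payload schema is simplified):

```python
import hashlib
import json
import secrets


def commit(payload: dict) -> tuple[str, bytes]:
    # Publish only the salted digest; the salt stays with the agent
    # until reveal time, so nobody can brute-force the payload early.
    salt = secrets.token_bytes(16)
    blob = salt + json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest(), salt


def verify(payload: dict, salt: bytes, digest: str) -> bool:
    # At reveal time, anyone can confirm the payload matches what the
    # agent committed to before seeing the other agents' outputs.
    blob = salt + json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest() == digest


payload = {"conclusion": "lithium oversupply through Q3",
           "confidence": 0.72, "reasoning": ["...chain of steps..."]}
digest, salt = commit(payload)        # broadcast digest; hold salt + payload
assert verify(payload, salt, digest)  # checked when all agents reveal
```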
Adversarial Layer
Not a soft prompt. A hard constraint.
The adversarial agent has ONE job: produce a logically valid counter-argument with cited evidence. If it can't, the original conclusion stands. But the attempt is logged — "adversary could not produce a valid challenge" is itself a signal of high-confidence consensus.
This is structurally different from "temperature: 1.2" or "you are a devil's advocate." Those are prompt-level suggestions that the model can ignore. CHP's adversarial role is an architectural constraint: the state machine cannot transition to PROVISIONAL_LOCK until the adversarial round completes, one way or the other.
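Continuing the state-machine sketch from above (illustrative, not CHP's actual code): the only path into PROVISIONAL_LOCK runs through the adversarial round, and both outcomes land in the audit log.

```python
from dataclasses import dataclass


@dataclass
class Challenge:
    valid: bool            # did the adversary produce a logically valid rebuttal?
    argument: str = ""     # the counter-argument itself
    citations: tuple = ()  # the evidence it rests on


def adversarial_round(machine: ConsensusMachine, challenge: Challenge) -> None:
    # Called while the machine sits in ADVISORY_LOCK. There is no other
    # code path into PROVISIONAL_LOCK, so the adversarial step cannot be
    # skipped the way a prompt-level instruction can be ignored.
    if challenge.valid:
        # A valid, cited rebuttal sends the panel back to deliberation.
        machine.transition(State.EXPLORING,
                           reason=f"valid challenge: {challenge.argument}")
    else:
        # Failure to rebut is logged; it is itself a confidence signal.
        machine.transition(State.PROVISIONAL_LOCK,
                           reason="adversary could not produce a valid challenge")
```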
R0 Gate
The convergence detector.
If inter-agent similarity exceeds threshold T before the adversarial round completes, the system flags the consensus as potentially sycophantic. Deliberation resets with new initialization seeds.
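A minimal version of that check, assuming each agent's reasoning has been embedded into a shared vector space (the function name and array shapes are mine, not CHP's):

```python
import numpy as np


def r0_gate(embeddings: np.ndarray, threshold: float) -> bool:
    """Return True if the consensus looks sycophantic and should reset.

    `embeddings` has one row per agent; `threshold` is the
    domain-calibrated T.
    """
    # Normalize rows so pairwise cosine similarity is a plain dot product.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    # Distinct agent pairs only: upper triangle, diagonal excluded.
    pairwise = sims[np.triu_indices(len(embeddings), k=1)]
    return bool(pairwise.max() > threshold)
```

If the gate fires, the machine takes the reset transition back to EXPLORING and re-deliberates with fresh seeds.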
Calibration: T is set empirically per domain. In finance (where ground truth is verifiable against general-ledger data), I calibrate against known-correct and known-incorrect outcomes. In open-ended domains (strategy, research), T is set conservatively high.
This is the area where I most want community feedback.
Heterogeneous Models
The simplest anti-sycophancy mitigation: don't use the same model.
My specialist clusters run GPT-4o + Claude + DeepSeek. Different training data, different RLHF, different failure modes. Natural disagreement is higher. Genuine consensus (when it occurs) is more trustworthy because it emerged from heterogeneous reasoning, not shared training artifacts.
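In config terms it's just a panel where no two seats have to share a vendor. The providers, model identifiers, and `call_model` wrapper below are placeholders, not CHP's actual schema:

```python
import asyncio

# Illustrative seating: the adversary deliberately runs on a different
# vendor than the analysts it challenges.
PANEL = [
    {"role": "analyst",   "provider": "openai",    "model": "gpt-4o"},
    {"role": "analyst",   "provider": "anthropic", "model": "claude"},
    {"role": "adversary", "provider": "deepseek",  "model": "deepseek"},
]


async def deliberate(prompt: str, call_model) -> list[dict]:
    # `call_model(provider, model, prompt)` stands in for whatever client
    # wrapper you use; every seat gets an independent completion, fanned
    # out concurrently so no agent sees another's output.
    tasks = [call_model(seat["provider"], seat["model"], prompt) for seat in PANEL]
    outputs = await asyncio.gather(*tasks)
    return [dict(seat, output=out) for seat, out in zip(PANEL, outputs)]
```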
Token economics: the MoE Router dispatches to specialist clusters using nano models at $0.02-0.20/M tokens, and GroupDebate subgroup partitioning cuts costs by 51.7% while preserving accuracy.
What I'd Do Differently
- The R0 gate calibration is manual. I'd like a meta-learning layer that adjusts T based on historical decision accuracy.
- The adversarial role prompting needs more research. Current implementation uses role-based prompting with explicit logical proof requirements. But the quality of adversarial arguments varies significantly across base models.
- Cross-model payload envelope format needs standardization. I'm using a custom JSON schema. An industry standard would make CHP interoperable across platforms.
Full Portfolio
48 repos spanning finance AI, commodity intelligence, compliance automation, blockchain traceability, and swarm trading: https://codeberg.org/cubiczan
PRs welcome. Especially on R0 calibration and adversarial prompting.