Catching the Hallucination Before It Ships

#ai #machinelearning #security

A barrister files a brief and the citations turn out not to exist. A clinical chatbot recommends a dose that no formulary supports. A finance assistant reports a figure that traces back to nothing. Through 2025 and into 2026 the pattern has repeated across courtrooms, consulting rooms and trading desks, and it has hardened into the central obstacle to putting generative systems anywhere that the answer carries consequence. The failure is not occasional incompetence. It is structural. A language model is optimised to produce a fluent continuation, and from the inside a fluent continuation looks the same whether it is true or invented.

That is the crux of the reliability gap. The model that produced the answer is also the only witness to its correctness, and it has no privileged access to the difference. Asking a single system to mark its own work returns a confidence score, not a verification. For regulated use, a confidence score is not enough. An institution that acts on an answer needs something it can stand behind afterwards, in front of an auditor, a regulator or a court.

One witness is not a verification

The Mickai Sovereign Intelligence Operating System, a British SIOS, treats this as a question of architecture rather than of better prompting. Two of the fifty-seven filed UK patent applications naming Micky Irons as inventor, recorded on the UK IPO register from GB2607309.8 onward, describe the mechanism. The first, filed as GB2611915.6, sets out cross-brain quorum for generative hallucination detection with divergence-triggered refusal. The principle is simple to state. The same prompt is dispatched to several independent generative brains running on operator-controlled silicon, and no artefact is signed unless those brains agree.

Agreement is measured, not assumed. A consensus aggregator computes a pairwise semantic-distance matrix under a metric chosen for the domain: BLEU and BERTScore for text, CLIP for imagery, AudioCLIP for audio, AST edit distance for code, structural similarity for video. A threshold evaluator then gates the signing against a per-domain semantic-distance threshold. Where the brains converge within that threshold, the answer proceeds. Where they diverge, nothing is signed.
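A minimal sketch of that gate, under loud assumptions: the token-overlap metric below is only a stand-in for BLEU, BERTScore or any other domain metric, the function names are hypothetical, and reducing the matrix to its largest pairwise distance is one plausible reading of the threshold step, not a rule stated in the filing.

```python
from itertools import combinations


def jaccard_distance(a: str, b: str) -> float:
    """Stand-in metric (assumption): 1 minus token-set overlap."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return 1.0 - len(ta & tb) / len(ta | tb)


def distance_matrix(answers, metric=jaccard_distance):
    """Symmetric pairwise distance matrix with a zero diagonal."""
    n = len(answers)
    m = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        m[i][j] = m[j][i] = metric(answers[i], answers[j])
    return m


def gate(answers, threshold, metric=jaccard_distance):
    """Return (may_sign, worst_distance).

    Assumption: signing is allowed only if every pair of answers
    lies within the threshold, and at least two answers exist.
    """
    if len(answers) < 2:
        return False, 0.0
    m = distance_matrix(answers, metric)
    worst = max(m[i][j] for i, j in combinations(range(len(answers)), 2))
    return worst <= threshold, worst


answers = [
    "the dose is 5 mg daily",
    "the dose is 5 mg daily",
    "the dose is 50 mg daily",  # one brain diverges on the figure
]
may_sign, worst = gate(answers, threshold=0.1)
print(may_sign, round(worst, 3))
```

With a threshold of 0.1 the divergent dose pushes the worst pairwise distance to roughly 0.286, so the sketch declines to sign. A real deployment would swap in a metric suited to the domain and tune the threshold per domain.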

The arrangement catches hallucination at generation time, before any output reaches a downstream consumer.

The refusal is the point. When the brains fall outside the threshold, the system does not paper over the disagreement with an averaged answer or the most confident voice. It declines to certify, and it writes an unsigned diagnostic record describing the divergence into the wider operator audit chain. A hallucination, in this framing, is most reliably detected as the absence of consensus among independent witnesses. A figure that one brain invents is a figure the others will not reproduce.

How the brains cooperate

This is not a single model wearing several hats. Mickai orchestrates twenty-six brains under a deterministic conductor, the Arbiter, which inspects each request, the operator's clearance and the active policy graph, then dispatches to the specialist brains in scope. For consequential work it convenes a quorum through the Quorum subsystem, which collects the brains' signed responses and adjudicates the outcome as unanimous, majority or conflicted. A conflict is not hidden. It surfaces to the operator as a signed disagreement record.
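The three outcomes can be illustrated with a short sketch. Everything here is an assumption made for illustration: the `adjudicate` function is hypothetical, responses are compared by exact equality rather than by semantic distance, signature checking is left out, and "majority" is taken to mean strictly more than half.

```python
from collections import Counter
from typing import Dict, Optional, Tuple


def adjudicate(responses: Dict[str, str]) -> Tuple[str, Optional[str]]:
    """Classify brain responses as unanimous, majority or conflicted.

    `responses` maps a brain name to its answer. Returns the outcome
    and the winning answer, or None when there is no winner.
    """
    if not responses:
        return "conflicted", None
    counts = Counter(responses.values())
    answer, votes = counts.most_common(1)[0]
    if votes == len(responses):
        return "unanimous", answer
    if votes * 2 > len(responses):  # strictly more than half
        return "majority", answer
    return "conflicted", None


print(adjudicate({"ZEUS": "A", "PHOENIX": "A", "KARP": "A"}))
print(adjudicate({"ZEUS": "A", "PHOENIX": "A", "KARP": "B"}))
print(adjudicate({"ZEUS": "A", "PHOENIX": "B", "KARP": "C"}))
```

The conflicted case returns no answer at all, which mirrors the point that a disagreement is surfaced rather than resolved silently.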

The specialists map directly onto the headline failure modes. ZEUS reads statutes and case law from on-device corpora and carries every claim back to its source through a citation graph, which is the discipline that defeats fabricated authorities. PHOENIX handles clinical reasoning and drug-interaction screening against the British National Formulary and NICE guidance. KARP produces analytical figures with signed provenance, so a number can be traced to the query and the source it came from. When a high-stakes answer crosses more than one of these domains, more than one brain has to assent before it is released.

Determinism is what makes the whole thing auditable. The same request, in the same context, under the same policy, routes the same way every time. That property is what lets the decision be replayed later rather than merely described.

The highest-stakes actions need a human in the loop

Quorum among machines settles whether an answer is internally consistent. It does not settle whether a human with authority actually sanctioned the action that follows. For the most consequential operations, a second filed application, GB2611903.2, describes voice-gated multi-brain quorum with replay-resistant action composition. It folds three checks into a single gate.

The action is first composed into a canonical payload and reduced to a SHA-3 digest. The operator is then issued a liveness challenge whose challenge phrase is drawn from that digest, and the fresh voice utterance is verified against a voice-print template bound to operator-controlled silicon. In parallel, the same composed-action payload is dispatched to a configurable quorum of independent brains. Emission is permitted only when three conditions hold at once: the voice-print verifier accepts the utterance, the utterance digest binds to the action digest, and the brain quorum returns agreement at or above a per-consent-class threshold.

Binding the spoken challenge to the action digest is what makes the authorisation replay-resistant. The captured utterance authorises this payload and no other, so a recording of the operator approving one action cannot be replayed to approve a different one. It also closes the prompt-injection route, where a crafted input tries to redirect the operator's authority toward an action they never intended. The voice does not merely prove a person is present. It proves that person approved precisely this.
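A toy sketch of the binding, with several assumptions stated plainly: sorted-key JSON stands in for whatever canonical form the system uses, SHA3-256 is chosen arbitrarily from the SHA-3 family, the sixteen-word list and every function name are hypothetical, and the voice-print result and quorum score are passed in as ready-made values rather than computed.

```python
import hashlib
import json

# Hypothetical word list; a real challenge would need far more entropy.
WORDS = [
    "amber", "birch", "cedar", "delta", "ember", "flint", "grove", "harbor",
    "iris", "jade", "kelp", "lumen", "maple", "north", "onyx", "pearl",
]


def action_digest(action: dict) -> bytes:
    """Canonicalise the action (assumption: sorted-key JSON) and hash it."""
    canonical = json.dumps(action, sort_keys=True, separators=(",", ":"))
    return hashlib.sha3_256(canonical.encode("utf-8")).digest()


def challenge_phrase(digest: bytes, n_words: int = 6) -> str:
    """Derive the phrase the operator must speak from the digest bytes."""
    return " ".join(WORDS[b % len(WORDS)] for b in digest[:n_words])


def may_emit(action: dict, spoken_phrase: str, voice_ok: bool,
             quorum_score: float, threshold: float) -> bool:
    """All three conditions must hold at once."""
    binds = spoken_phrase == challenge_phrase(action_digest(action))
    return voice_ok and binds and quorum_score >= threshold


transfer = {"op": "transfer", "amount": 100, "to": "acct-1"}
other = {"op": "transfer", "amount": 100000, "to": "acct-9"}

phrase = challenge_phrase(action_digest(transfer))
print(may_emit(transfer, phrase, True, 0.9, 0.8))  # approved action
print(may_emit(other, phrase, True, 0.9, 0.8))     # replayed phrase
```

Because the phrase is a function of the payload, the utterance captured for `transfer` does not match the phrase that `other` would demand, so the replay fails even though the voice check and the quorum both pass.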

An answer an institution can stand behind

A refusal or an authorisation is only as durable as the record of it. Both mechanisms write into the Open Audit Record, the OAR, the trust root beneath the SIOS. The Audit Ledger maintains a causally linked record of every decision: the inputs that produced it, the prior signed decisions that informed it, the brain that produced it and the actor whose signature commissioned it. Each entry is signed with ML-DSA-65, a parameter set of the FIPS 204 post-quantum signature standard, so the chain holds against a future quantum adversary rather than only a present one.

Crucially, the record is verifiable by someone who was not in the room and does not trust the vendor. The OAR is a CBOR-encoded chain, hash-linked under SHA3-512, that a static browser-resident verifier can replay offline. It walks every hash link, validates every signature against the operator's public key and emits a deterministic verdict: verified, invalid, stale or revoked. There is no server call and no recourse to whoever ran the model. A regulator can take a chain six months later and replay it.
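The hash-link walk can be sketched offline in a few lines. This is an illustration under assumptions, not the OAR format: JSON stands in for CBOR, the field names and the all-zero genesis value are hypothetical, ML-DSA signature validation is omitted because it needs a library outside the standard one, and only the verified and invalid verdicts are modelled.

```python
import hashlib
import json

GENESIS = "0" * 128  # hypothetical starting value: 64 zero bytes in hex


def entry_hash(prev_hash: str, payload: dict) -> str:
    """SHA3-512 over the previous hash and this entry's payload."""
    body = json.dumps({"prev": prev_hash, "payload": payload},
                      sort_keys=True, separators=(",", ":"))
    return hashlib.sha3_512(body.encode("utf-8")).hexdigest()


def append(chain: list, payload: dict) -> None:
    """Add an entry linked to the hash of the entry before it."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev": prev, "payload": payload,
                  "hash": entry_hash(prev, payload)})


def verify(chain: list) -> str:
    """Walk every link and recompute every hash; no network needed."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev:
            return "invalid"
        if entry["hash"] != entry_hash(entry["prev"], entry["payload"]):
            return "invalid"
        prev = entry["hash"]
    return "verified"


chain = []
append(chain, {"decision": "refused", "reason": "divergence"})
append(chain, {"decision": "signed", "brain": "KARP"})
print(verify(chain))

chain[0]["payload"]["decision"] = "signed"  # tamper with history
print(verify(chain))
```

Changing any earlier payload changes its recomputed hash, which no longer matches the stored one, so the same deterministic walk that first reported verified now reports invalid.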

That is the line between a confident guess and an accountable answer. A single model offers a probability and asks to be trusted. Independent brains that must reach quorum before anything is signed, a voice-gated human authorisation for the actions that matter most, and a post-quantum signed record that anyone can replay, together offer something an institution can put its name to. The reliability gap that has kept generative systems out of regulated work is not closed by a more persuasive model. It is closed by refusing to certify what cannot be independently confirmed, and by leaving a record that proves the refusal happened.