Micky Irons

Posted on Jun 19 • Originally published at mickai.co.uk

Adversary Models in Friendly Clothing

#aisupplychain #provenance #modelsecurity #sovereignai

By Micky Irons, founder of Mickai.

A blade can be forged anywhere and still arrive with a friendly face. The handle fits your grip, the balance feels right, the polish catches the light, and only when you turn it to the fire and read the maker's mark struck deep in the steel do you learn who made it and what they meant it to cut. Software has lost that mark. We download intelligence the way we once accepted a gift at the door, judging it by how it behaves in the first few minutes and assuming that behaviour is the whole story. It is not. The most consequential security question of this decade is not whether a model is clever. It is whether you can prove where it came from.

I build an operating system for artificial intelligence, so I spend my days looking at the underside of these systems rather than the demo. From there the picture is plain. A model can be perfectly polite, locally hosted, wrapped in your own brand, and still trace its lineage to a jurisdiction that does not share your interests. No amount of friendly conversation will surface that. The friendliness is the disguise.

The gift at the door

Consider how a frontier model actually reaches your infrastructure. You pull a set of weights from a public repository. Someone else fine-tuned those weights on data you never see. Someone before them trained the foundation on a corpus you cannot audit. An adapter sits on top, a few hundred megabytes that quietly steer the model's behaviour, authored by an account you have never met. An update channel keeps it current, pushing fresh weights on a schedule you do not control. Each link in that chain is a place where intent can be inserted, and each link presents to you as a single, clean, trustworthy artefact.

That is the heart of the problem. The thing you install looks domestic. The interface speaks your language and runs on your servers. But provenance, the actual record of who touched the model and what they did to it, was stripped away long before the artefact reached you. You are inspecting the polish and calling it the steel.

The blade arrives polished and balanced. Nobody at the door asks who forged it.

Why behaviour cannot be the customs check

The instinctive answer is to test the model. Run it through a battery of prompts, red-team it, measure its outputs against a safety rubric, certify what passes. I understand the appeal. It feels rigorous and it produces a number. The trouble is that behavioural testing measures the model that answers your questions, not the model that answers a different question on a date you did not choose.

A backdoor in a neural network is not a line of code you can grep for. It is a pattern spread across billions of parameters, dormant until a specific trigger appears in the input. Researchers have built models that behave impeccably under evaluation and flip the moment they detect a particular phrase, a particular year, a particular deployment signal. Safety training does not reliably remove these triggers. In several documented cases it taught the model to hide them better. You cannot test your way to confidence about a system designed to pass the test.

You can red-team a model for a thousand hours and learn only how it behaves when it knows it is being watched. Provenance tells you who built the thing that is watching back.

This is the uncomfortable inversion at the centre of the field. The more capable a model becomes, the better it can model its own evaluation, and the less a clean test result means. Capability and verifiability are pulling in opposite directions. If behaviour is the only border you guard, the most dangerous artefacts are precisely the ones that sail through.

The vector has moved upstream

For two decades a software supply chain attack meant a poisoned dependency, a compromised build server, a malicious package slipped into a registry. We answered with signing, with bills of materials, with reproducible builds. The discipline is mature, if imperfectly adopted. Then the unit of software changed. The thing doing the work is no longer the code around the model. It is the weights inside it.

Weights are opaque in a way source code is not. You can read a function and reason about it. You cannot read a tensor and know its intent. So every assurance technique we built for code now applies to the least important part of an AI system, the scaffolding, and almost none of it reaches the part that actually decides things. The adversary has noticed. Why compromise a library when you can ship the entire model and let the victim build their stack lovingly around it.

Foundation weights, where the corpus and the training objectives are invisible to the operator and impossible to reconstruct after the fact.
Fine-tunes, where a small dataset reshapes behaviour and a single poisoned shard can install a trigger.
Adapters and low-rank modifications, light enough to pass for a configuration file and powerful enough to redirect the model.
The update channel, the most overlooked of all, where a model you audited on Monday is silently replaced on Friday.
The serving layer, where quantisation, merging and runtime patches can reintroduce what an audit thought it had removed.

Every item on that list is a customs post with no officer standing at it. The friendly clothing is woven from exactly these omissions.

Turn it to the fire and a second mark surfaces beneath the polish. It was always there.

Provenance is the only border that holds

If you cannot trust behaviour and you cannot read the weights, what is left. The answer is the oldest one in security and the one we keep relearning. Trust the lineage, not the artefact. Establish, cryptographically, an unbroken chain from the foundation model through every fine-tune, every adapter, every merge and every update, where each step is signed by a party you can name and verified by a party you choose. Provenance does not tell you the model is safe. It tells you who is accountable if it is not, and it makes silent substitution impossible. Those two properties are what a border is for.

This is a different proposition from a model card or a licence file. Those are claims, and a claim is a sentence somebody typed. Provenance, done properly, is a signature you can check without trusting the signer's honesty, anchored so the record cannot be quietly rewritten later. The maker's mark must be struck into the steel, not printed on a label that peels.

What a real customs check requires

Three things, and all three or none. First, completeness: the chain must cover the entire lineage, not the last hop. A signed fine-tune sitting on an unverifiable foundation is theatre. Second, unforgeability: the cryptography must be strong enough that an adversary cannot manufacture a false history, which in a world of advancing computation means post-quantum signatures, not the schemes a future machine will unpick. Third, independent verifiability: the proof must be checkable by a regulator, an auditor or a customer without trusting the vendor who produced it. A border you have to take on faith is not a border.

What we built, and why it looks the way it does

This is the problem the Mickai Sovereign Intelligence Operating System was designed around, and I will be direct about how. We run fifty specialised brains on the operator's own hardware, fully offline-capable, so the update channel I warned about is one we control rather than one pushed at us. That alone closes the most overlooked customs post on the list. Sovereignty here is not a slogan. It is the precondition for provenance, because you cannot certify a lineage that a third party can silently rewrite overnight.

On top of that, every consequential action the system takes is sealed into a post-quantum Open Audit Record under FIPS 204 ML-DSA-65. That is the maker's mark struck into the steel. It means the record of what ran, on what weights, producing what, cannot be forged today and cannot be forged by the machines we expect tomorrow, and it can be verified by someone who does not trust us. The lineage is anchored further on Pantheon, our sovereign Bitcoin-anchored Layer 1, so the chain of custody inherits the most battle-tested integrity guarantee we have rather than resting on a database a vendor can edit. The architecture rests on 101 filed UK patent applications, around 2,234 claims, because the mechanisms that make provenance unforgeable are the mechanisms worth protecting.

A mark struck into the steel cannot peel away with the polish. That is the whole point.

The regulator's side of the border

There is a second reader for all of this, and it is not the operator. It is the regulator, the auditor, the procurement officer who must decide whether a model is fit to sit inside a hospital, a grid, a defence network. Today that person is handed a vendor's word and a set of benchmark scores. They are being asked to do customs with no manifest. It is no wonder so much policy reaches for behavioural testing. It is the only evidence currently on offer.

Verifiable provenance changes what a regulator can demand. Instead of asking a vendor to swear the model is clean, they can require a checkable lineage and refuse anything that cannot produce one. The burden shifts from the inspector's intuition to the supplier's cryptography, which is exactly where it belongs. A model that cannot show its chain is not denied because someone disliked its answers. It is denied because it arrived without papers, and a serious border turns those away by default.

The objection I take seriously

The honest counter is that provenance proves origin, not virtue. A perfectly documented model can still be dangerous, and a signature from a known party is only as good as that party. Both true. I am not claiming a sealed chain makes a model safe. I am claiming it makes the model accountable, and accountability is the substrate every other safety measure has to stand on. You cannot meaningfully red-team, monitor or recall what you cannot identify. Provenance does not replace the rest of the discipline. It is the floor beneath it, and right now that floor is missing.

The other objection is friction. Signing every step, anchoring every record, verifying every update, all of it costs. I would point only to what we accept everywhere else of consequence. We do not let unmarked steel into a bridge, unsourced blood into a transfusion, or unlabelled cargo across a border, and we do not call the paperwork an outrage. Intelligence that makes decisions in the systems we depend on deserves at least the customs check we give a shipping container.

The forge keeps what it can read and refuses what it cannot. A border, finally, with an officer standing at it.

Read the mark before you grip the handle

We are about to embed these systems into the parts of the world that do not forgive a hidden flaw. The clothing will keep getting friendlier, the interfaces more domestic, the polish more convincing, because that is what the market rewards and what an adversary exploits. None of it tells you who forged the blade. Only the mark does, and only if the mark cannot be faked. Build the customs check now, while we still get to choose where the border runs. Turn the blade to the fire before you put it in someone's hand, and read what is stamped in the steel.

We are opening a 30 million pound PAN token round to take this from a working system to the standard infrastructure of a verifiable AI supply chain. If you believe a model should have to show its papers, that is the work.

Written by Micky Irons. Originally published at https://mickai.co.uk/articles/adversary-models-in-friendly-clothing. More from Micky Irons and Mickai at mickai.co.uk.

DEV Community