DEV Community

Micky Irons

Originally published at mickai.co.uk

Containing Agents That Act on Their Own

The agent that does not wait for you

The thing that changed in the last year is not that artificial intelligence got smarter. It is that we started letting it act. An agent now calls tools, moves data, sends messages, and increasingly hands work to other agents, without a person approving each step. Gartner expects a substantial share of enterprise applications to embed task-specific agents by the end of 2026. That is a productivity story and a containment problem at the same time, and most deployments have built the first without the second.

The risk that keeps defenders awake is not a single bad answer. It is reach. An agent that can invoke other agents has turned one compromised instruction into a chain. Prompt injection is not a defect you can patch out; it is a structural property of systems that mix instructions and data in natural language, so you have to assume that sooner or later an agent will be told to do something it should not. The question is not whether that happens. It is how far it gets before you see it, whether you can stop it, and whether you can prove afterwards what it did.
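
To make "reach" concrete: if you write down which agents each agent is permitted to invoke, the blast radius of one compromised instruction is everything reachable from the agent that received it. The minimal Python sketch below computes that set with a breadth-first search. The agent names and the `invokes` map are hypothetical, chosen only to illustrate the idea.

```python
from collections import deque


def blast_radius(invokes: dict[str, set[str]], start: str) -> set[str]:
    """Return every agent reachable from `start` by following invoke edges.

    `invokes` maps an agent to the agents it is permitted to call.
    The starting agent is included, since it is compromised too.
    """
    reached = {start}
    queue = deque([start])
    while queue:
        agent = queue.popleft()
        for callee in invokes.get(agent, set()):
            if callee not in reached:
                reached.add(callee)
                queue.append(callee)
    return reached


# Hypothetical topology: a triage agent that can hand work to two others.
invokes = {
    "triage": {"email", "billing"},
    "billing": {"payments"},
    "email": set(),
}
print(sorted(blast_radius(invokes, "triage")))  # ['billing', 'email', 'payments', 'triage']
print(sorted(blast_radius(invokes, "email")))   # ['email']
```

The same instruction lands very differently depending on where it enters: through `email` it reaches nothing else, through `triage` it reaches payments.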

Containment, not just alignment

Most of the public conversation is about aligning models, training them to behave. Alignment matters, but it is probabilistic, and a probabilistic control is the wrong foundation for a system that can act in the world. Even a perfectly aligned agent can be hijacked at runtime. So the defence cannot rest on the agent behaving. It has to rest on the system around the agent, which you can make deterministic.

Containment is that system. It is the discipline of designing the blast radius in advance: deciding what an agent is allowed to touch, making every action it takes visible, and keeping a hand on a switch that stops it cold. Containment is not a property of the model. It is an engineering property of the operating environment, and unlike alignment it does not degrade when the model is fooled.
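
Deciding in advance what an agent may touch can be as simple as a deny-by-default allowlist checked by ordinary code outside the model. The sketch below is minimal and uses illustrative names (`Policy`, `check` and `PolicyViolation` are hypothetical, not a real library's API). Because the check is plain deterministic code, a fooled model cannot talk its way past it. The most it can do is be refused.

```python
from dataclasses import dataclass


class PolicyViolation(Exception):
    """Raised when an agent asks for something outside its allowlist."""


@dataclass(frozen=True)
class Policy:
    # Hypothetical per-agent policy, fixed before the agent runs.
    allowed_tools: frozenset[str]
    allowed_targets: frozenset[str]


def check(policy: Policy, tool: str, target: str) -> None:
    """Deny by default: pass only if both the tool and the target are allowlisted."""
    if tool not in policy.allowed_tools:
        raise PolicyViolation(f"tool {tool!r} is not permitted")
    if target not in policy.allowed_targets:
        raise PolicyViolation(f"target {target!r} is not permitted")


# Example: a reporting agent that may only read one directory.
reporter = Policy(
    allowed_tools=frozenset({"read_file"}),
    allowed_targets=frozenset({"/srv/reports"}),
)
check(reporter, "read_file", "/srv/reports")  # permitted: returns None
try:
    check(reporter, "send_email", "ceo@example.com")
except PolicyViolation as err:
    print(err)  # tool 'send_email' is not permitted
```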

Containment is the discipline of designing the blast radius before the agent acts.

You cannot contain what you cannot see

The first primitive of containment is observability. Every action an agent takes has to be recorded completely, at the moment it happens: every tool call, every data access, every message it passes to another agent. This is where most systems already fail. Application logs are written after the fact, they sit on infrastructure the operator controls, and they can be incomplete or quietly edited. If an agent does something that was never recorded, you cannot detect it, and a chain of agent-to-agent activity that leaves no trace is invisible by construction.

So the bar is higher than logging. You need a complete, tamper-evident account of what the system did, captured as it acts rather than reconstructed later. Without that, anomaly detection has nothing to compare against, and incident response is guesswork.
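
One way to get a record that is complete by construction is to make a single gateway the only path from an agent to its tools, and to have that gateway write the record before the tool runs. The sketch below assumes hypothetical `Recorder` and `Gateway` classes backed by an in-memory list. A real recorder would persist durably and, as the following sections argue, be tamper-evident. What the sketch shows is the ordering: if writing the record fails, the action never happens.

```python
import time
from typing import Any, Callable


class Recorder:
    """Hypothetical append-only recorder (in memory here; durable in practice)."""

    def __init__(self) -> None:
        self._entries: list[dict[str, Any]] = []

    def append(self, entry: dict[str, Any]) -> None:
        # Copy and timestamp the entry; the recorder never modifies an entry once written.
        self._entries.append({**entry, "ts": time.time()})

    @property
    def entries(self) -> tuple[dict[str, Any], ...]:
        return tuple(self._entries)


class Gateway:
    """The only path from an agent to its tools; it records before it acts."""

    def __init__(self, recorder: Recorder, tools: dict[str, Callable[..., Any]]) -> None:
        self.recorder = recorder
        self.tools = tools

    def call(self, agent: str, tool: str, **kwargs: Any) -> Any:
        # Written first: if this raises, the tool below never runs.
        self.recorder.append({"agent": agent, "tool": tool, "args": kwargs, "event": "started"})
        try:
            result = self.tools[tool](**kwargs)
        except Exception as exc:
            self.recorder.append({"agent": agent, "tool": tool, "event": "failed", "error": repr(exc)})
            raise
        self.recorder.append({"agent": agent, "tool": tool, "event": "completed"})
        return result


recorder = Recorder()
gateway = Gateway(recorder, {"add": lambda a, b: a + b})
print(gateway.call("calc-agent", "add", a=2, b=3))  # 5
print([e["event"] for e in recorder.entries])       # ['started', 'completed']
```

Failures are recorded too, so an attempted action that errors out still leaves a trace.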

You must be able to pull the cable

The second primitive is interruptibility. A contained agent is one you can stop instantly and completely, including cutting it off from the network and from the other agents it might otherwise reach. This is the old air-gap discipline applied to autonomy: the operator holds a kill switch that does not depend on the agent's cooperation, and that switch works even if the agent is mid-task and unwilling. An agent you cannot interrupt on demand is not contained. It is loose, and you are hoping.

Interruptibility is also why sovereignty matters here in a concrete, unglamorous way. If the intelligence runs on infrastructure someone else owns, the switch is theirs, not yours. Containment requires that the operator, the party who carries the liability, holds the off switch on their own hardware.
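
Here is a minimal sketch of a switch that does not depend on the agent's cooperation. It assumes the agent runs as a child process on hardware the operator controls. The stand-in agent deliberately ignores SIGTERM and loops forever. `Popen.kill()` sends SIGKILL on POSIX and calls `TerminateProcess` on Windows, and the child can neither catch nor ignore either one. `KillSwitch` and `UNCOOPERATIVE_AGENT` are hypothetical names. Cutting the network and peer-agent connections would happen at the operating-system or container layer, which this sketch does not show.

```python
import subprocess
import sys

# Stand-in for an agent that is mid-task and unwilling to stop:
# it ignores SIGTERM and loops forever.
UNCOOPERATIVE_AGENT = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "while True:\n"
    "    time.sleep(0.1)\n"
)


class KillSwitch:
    """Hypothetical operator-held switch over an agent running as a child process."""

    def __init__(self, agent_code: str) -> None:
        self.proc = subprocess.Popen([sys.executable, "-c", agent_code])

    def is_running(self) -> bool:
        return self.proc.poll() is None

    def pull(self, timeout: float = 5.0) -> int:
        # kill() sends SIGKILL on POSIX and calls TerminateProcess on Windows;
        # the child cannot catch or ignore either.
        self.proc.kill()
        return self.proc.wait(timeout=timeout)


switch = KillSwitch(UNCOOPERATIVE_AGENT)
print(switch.is_running())  # True
switch.pull()
print(switch.is_running())  # False
```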

Interruptibility means the operator holds the switch, and it works without the agent's cooperation.

Detection and proof need a record that cannot be rewritten

The third primitive is provability. To detect that an agent has started doing more than it should, for example triggering far more activity than expected or reaching agents and systems outside its remit, you have to compare its live behaviour against a trustworthy record. And after an incident, when you need to show a regulator, an insurer, or a court what actually happened, that record has to be one the other party can verify without trusting you. A log you are able to edit proves nothing, because opposing counsel only has to argue that you could have changed it.
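
Detection, as described above, is a comparison between recorded behaviour and an expectation. The minimal sketch below makes two assumptions. The first is a hypothetical per-agent `Baseline`, holding an action budget for some window plus the set of agents and systems in the agent's remit. The second is a record reduced to `(agent, target)` pairs. A real baseline would be richer, but the shape of the check is the same.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Baseline:
    # Hypothetical expectation for one agent over one time window.
    max_actions: int
    remit: frozenset[str]  # agents and systems it is expected to reach


def anomalies(agent: str, actions: list[tuple[str, str]], baseline: Baseline) -> list[str]:
    """Compare recorded (agent, target) actions for `agent` against its baseline."""
    targets = [target for who, target in actions if who == agent]
    findings = []
    if len(targets) > baseline.max_actions:
        findings.append(f"{agent}: {len(targets)} actions, budget is {baseline.max_actions}")
    outside = sorted(set(targets) - baseline.remit)
    if outside:
        findings.append(f"{agent}: reached outside remit: {', '.join(outside)}")
    return findings


sales = Baseline(max_actions=3, remit=frozenset({"crm", "email"}))
record = [("sales", "crm"), ("sales", "email"), ("sales", "crm"), ("sales", "payments")]
for finding in anomalies("sales", record, sales):
    print(finding)
# sales: 4 actions, budget is 3
# sales: reached outside remit: payments
```

The check is only as trustworthy as the record it reads. If that record can be edited, so can the findings.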

Independent verifiability is a hard requirement, and it rules out almost everything in common use. The record has to be signed at the moment of the act, sealed so that any later tampering is detectable, and verifiable by someone who trusts nothing about the vendor. Containment without provable evidence is containment you cannot stand behind when it is tested.
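
Tamper evidence is usually built by hash-chaining. Each entry commits to the hash of the one before it, so editing, deleting or reordering any entry changes every later hash. The sketch below uses SHA-256 from Python's standard library, and its function names are hypothetical. It deliberately omits the per-entry signature a real ledger would carry, because the standard library has no post-quantum signature scheme. The verifier needs only the ledger and a head hash published somewhere the operator cannot rewrite, which is what makes it independent.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    # Canonical JSON so any verifier recomputes exactly the same bytes.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(ledger: list[dict], record: dict) -> str:
    """Add a record that commits to the previous entry; return the new head hash."""
    prev = ledger[-1]["hash"] if ledger else GENESIS
    head = entry_hash(prev, record)
    ledger.append({"record": record, "prev": prev, "hash": head})
    return head


def verify(ledger: list[dict], published_head: str) -> bool:
    """Recompute the whole chain; needs only the ledger and an externally published head."""
    prev = GENESIS
    for entry in ledger:
        if entry["prev"] != prev or entry_hash(prev, entry["record"]) != entry["hash"]:
            return False
        prev = entry["hash"]
    return prev == published_head


ledger: list[dict] = []
append(ledger, {"agent": "a1", "tool": "read", "target": "crm"})
head = append(ledger, {"agent": "a1", "tool": "send", "target": "email"})
print(verify(ledger, head))  # True
ledger[0]["record"]["target"] = "payments"  # an after-the-fact edit
print(verify(ledger, head))  # False
```

Someone able to rewrite the whole chain can recompute every hash. That rewrite goes undetected only if they can also change the published head, which is why the head has to live outside their control.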

How I built containment into Mickai

This is the problem I built Mickai to solve, and the three primitives are the design. Mickai is a Sovereign Intelligence Operating System: the intelligence runs on hardware the operator owns, so the kill switch is theirs. Every action an agent takes is signed before it executes, not after, and written to an append-only, hash-chained ledger called the Open Audit Record. The signatures are post-quantum, using ML-DSA-65 from FIPS 204, the signature standard published by the United States National Institute of Standards and Technology, and the record can be verified offline by a verifier that runs inside an ordinary web browser, with no network connection and no call home to me. The audit root anchors to the Pantheon sovereign Layer 1 blockchain and onward to Bitcoin, so the timeline cannot be quietly rewritten after an incident. That is observability, interruptibility, and provability delivered as engineering rather than as a promise. The design is protected by 101 filed United Kingdom patent applications owned by Mickai LTD.

The three questions to ask of any agent

Before you let an agent act on its own, ask three questions about it. Can I see every action it takes, completely and as it happens? Can I stop it instantly, on my own hardware, without its cooperation? Can I prove to someone who does not trust me exactly what it did? If the answer to any of the three is no, the agent is not contained, however well it has behaved so far.

Autonomy is not the same as capability. An agent you cannot see, cannot stop, and cannot account for is not a more capable system. It is a larger exposure wearing the costume of one. Containment is what turns autonomy back into something you can actually own.
