Why my AI agents can write code but can't ship it

#ai #agents #machinelearning #devops

Last month an agent finished a content update at 2am, wrote the diff, ran the pre-deploy checks, and then stopped. It filed a request and went idle. The deploy didn't happen until morning, when the Librarian process ran its scheduled verification and shipped it.

That pause was not a bug. I built it.

The capability I withheld

Every agent in my operation at aienterprise.dk has file write access to its workspace. It can read the database, call external APIs, and generate and modify content. What it cannot do is push to production. The pm2 reload command, the deploy script, and the snapshot promoter are out of scope for every agent. Only the one process I have designated as deploy authority can run them.
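One way to picture that scoping is a deny-by-default allowlist per role. The sketch below is illustrative only: the role names, the capability labels, and the canPerform helper are my assumptions for this post, not the actual configuration.

```javascript
// Hypothetical capability map. Role and capability names are illustrative.
const CAPABILITIES = new Map([
  ["agent", new Set(["workspace:write", "db:read", "api:call", "deploy:request"])],
  ["librarian", new Set(["deploy:verify", "deploy:ship"])],
]);

// Deny by default: unknown roles and unlisted capabilities are rejected.
function canPerform(role, capability) {
  const granted = CAPABILITIES.get(role);
  return granted !== undefined && granted.has(capability);
}

console.log(canPerform("agent", "workspace:write")); // true
console.log(canPerform("agent", "deploy:ship"));     // false
console.log(canPerform("librarian", "deploy:ship")); // true
```

The point of the shape is that an agent can ask for a deploy but has no entry that lets it perform one.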

This is not about distrust. The agents' code is usually fine. The issue is risk asymmetry.

A wrong file write gets caught in the next review cycle. A wrong production deploy is live the moment it runs. Those are not the same failure mode and they should not have the same access model.

I closed that gap after an agent shipped a schema migration to the wrong site instance because it matched on a name prefix instead of the full identifier. Nobody was harmed and rollback took four minutes, but the path was clearly wrong. An agent that builds a thing should not also be the one that decides when the thing ships.
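For readers who have not hit this class of bug, here is a minimal reconstruction of how prefix matching picks the wrong target. The site identifiers and both function names are made up for illustration. This is not the code that actually ran.

```javascript
// Hypothetical site registry. The identifiers are invented for this example.
const SITES = ["shop-dk-staging", "shop-dk", "shop-se"];

// Fragile: returns the first site whose identifier starts with the query.
function resolveByPrefix(query) {
  return SITES.find((id) => id.startsWith(query));
}

// Safer: requires the full identifier and fails loudly on anything else.
function resolveExact(id) {
  if (!SITES.includes(id)) {
    throw new Error(`Unknown site identifier: ${id}`);
  }
  return id;
}

console.log(resolveByPrefix("shop-dk")); // shop-dk-staging (wrong instance)
console.log(resolveExact("shop-dk"));    // shop-dk
```

The prefix version depends on registry order and never fails, which is exactly why the mistake stays silent until something lands in the wrong place.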

What the gate actually looks like

The mechanism is simple on purpose. When an agent finishes deployable work, it calls a single script: request-deploy.mjs. The script takes a surface, an intent string, and an artifact ID. That's it. The agent's job is done.
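As a rough sketch of what a script like request-deploy.mjs could validate before filing a request: the three inputs come from the description above, while the function name, the field names, the status value, and the timestamp are assumptions I am adding for illustration.

```javascript
// Hypothetical sketch of the request a script like request-deploy.mjs might build.
// Field names, "pending" status, and requestedAt are assumptions.
function buildDeployRequest({ surface, intent, artifactId }, now = new Date()) {
  // All three inputs are required and must be non-empty strings.
  for (const [name, value] of Object.entries({ surface, intent, artifactId })) {
    if (typeof value !== "string" || value.trim() === "") {
      throw new TypeError(`${name} must be a non-empty string`);
    }
  }
  return {
    surface: surface.trim(),
    intent: intent.trim(),
    artifactId: artifactId.trim(),
    requestedAt: now.toISOString(),
    status: "pending",
  };
}

// Example values are invented.
const request = buildDeployRequest({
  surface: "docs",
  intent: "Fix pricing table typo",
  artifactId: "artifact-123",
});
console.log(request.status); // pending
```

Note what the record does not have in this sketch: no priority, force, or override field. If the request format has no place to claim urgency, the agent has nothing to claim it with.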

A separate process, the Librarian, holds the actual deploy token. It runs on a 15-minute heartbeat plus an autorun trigger when a request lands. It checks whether the new snapshot conflicts with anything else in flight, runs pre-deploy verification, ships all six sites in lockstep, bumps the version with the intent string as the changelog bullet, and records the deploy to the runtime log.

The agent never interacts with the Librarian directly. The separation is not there to create friction. It's there to ensure the thing that builds is never the thing that ships, with no exceptions baked in at the agent layer.

If a deploy is genuinely blocked, the Librarian escalates. I get an alert. I resolve it. But the system does not let an agent work around the gate by claiming urgency.
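The Librarian's cycle, including the escalation path, can be sketched roughly like this. Everything here is an assumption layered on the steps described above: the processRequest name, the injected deps object, treating "conflict" as two requests on the same surface, an integer version, and rollback on partial failure as one way to approximate lockstep.

```javascript
// Hypothetical sketch of one Librarian pass over a single request.
// All names and the shape of `state` and `deps` are assumptions.
function processRequest(request, state, deps) {
  // 1. Conflict check (assumed rule: another in-flight request on the same surface).
  const conflict = state.inFlight.find((r) => r.surface === request.surface);
  if (conflict) {
    return deps.escalate(request, `conflicts with ${conflict.artifactId}`);
  }

  // 2. Pre-deploy verification.
  const check = deps.verify(request);
  if (!check.ok) {
    return deps.escalate(request, check.reason);
  }

  // 3. Ship every site, or undo the ones already shipped and escalate.
  const shipped = [];
  try {
    for (const site of state.sites) {
      deps.ship(site, request.artifactId);
      shipped.push(site);
    }
  } catch (err) {
    shipped.forEach((site) => deps.rollback(site));
    return deps.escalate(request, `ship failed: ${err.message}`);
  }

  // 4. Bump the version, using the intent string as the changelog bullet.
  state.version += 1;
  state.changelog.push(`v${state.version}: ${request.intent}`);

  // 5. Record the deploy to the runtime log.
  state.log.push({ version: state.version, artifactId: request.artifactId, sites: shipped });
  return { status: "shipped", version: state.version };
}

// Stubbed dependencies and invented values, for illustration only.
const state = {
  sites: ["site-1", "site-2", "site-3", "site-4", "site-5", "site-6"],
  inFlight: [],
  version: 41,
  changelog: [],
  log: [],
};
const deps = {
  verify: () => ({ ok: true }),
  ship: () => {},
  rollback: () => {},
  escalate: (req, reason) => ({ status: "escalated", reason }),
};
const result = processRequest(
  { surface: "docs", intent: "Fix pricing table typo", artifactId: "artifact-123" },
  state,
  deps
);
console.log(result.status, result.version); // shipped 42
```

Every exit that is not a clean ship goes through escalate, which is where a human gets the alert. There is no branch an agent's request can take to skip verification.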

Why this is worth thinking about now

The EU AI Act's December 2027 deadline is fixed. Danish operators running agentic systems in employment, critical infrastructure, or migration contexts have a planning horizon they can work backward from. The draft Article 6 guidelines define what high-risk means in practice, and the answer is broader than most builders expect.

But the reason to build approval gates isn't the regulation. It's that production systems break in ways demos don't. An agent that can both write and ship is a system where the blast radius of a wrong decision is unbounded on one axis. That is the axis you want to control before something goes wrong, not after.

I withheld deploy capability from my agents because the boundary between built and shipped is where accountability lives. If an agent builds and ships and something breaks, the audit trail is harder to read. If an agent builds and a separate process ships after verification, every step is logged and attributable.

That's the governance mechanism. It's not a policy document. It's an architecture decision that makes the policy enforceable.