Second in a series on building an autonomous AI organism that operates real multi-tenant infrastructure under a constitutional safety model. The first part was about two gates — a conscience and a council. This one is about the wall behind them.
My agent runs infrastructure for more than one organization. That sentence should make a security person uncomfortable, because the failure mode isn't subtle. The nightmare isn't the agent doing something clever and wrong. It's the agent doing something mundane and right (writing a ticket, rotating a secret, posting a status) to the wrong tenant.
Customer A's data ending up in Customer B's system isn't a bug you patch. It's a breach you disclose.
So the first question I had to answer wasn't "how do I make the agent capable across tenants?" It was: how do I make crossing a tenant boundary something the agent can't get wrong, because it can't do it at all?
Permission is the weak version. Absence is the strong one.
The instinct everyone reaches for first is permissions. Give the agent a list of what it's allowed to touch, check every action against it, deny the rest. Role-based access, a policy file, a gate.
Permission gates fail in one specific, fatal way: they assume the thing being asked for exists and you just have to say no. The agent forms an intention to touch Customer B, the gate evaluates it, the gate denies it. That works right up until the gate has a bug, a stale rule, a missing case — and then the intention sails through, because the resource was right there, reachable, waiting for a yes.
The stronger model is that Customer B's resources are structurally absent, not forbidden. In a session scoped to Customer A, the agent doesn't have a denied path to Customer B. It has no path. The credentials aren't loaded. The endpoints aren't in its map. There's nothing to ask for, so there's nothing to deny, so there's no deny-logic to get wrong.
Forbidden is a fact about a rule. Absent is a fact about the world. Rules have bugs; the world doesn't.
Concretely: capabilities are minted per session, scoped to the active organization, and they simply don't include anyone else. The boundary isn't enforced at decision time. It's enforced at existence time.
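A minimal sketch of what "enforced at existence time" could look like, in Python. Everything named here (`_REGISTRY`, `Session`, `mint_session`, the `example` hostnames) is hypothetical and only illustrates the idea; in a real deployment the registry would live outside the agent's process rather than in the same module.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping

# Hypothetical registry of everything the platform knows, keyed by tenant.
# Assumption: the agent never receives this object, only a minted session.
_REGISTRY = {
    "customer-a": {"tickets": "https://tickets.a.example"},
    "customer-b": {"tickets": "https://tickets.b.example"},
}


@dataclass(frozen=True)
class Session:
    tenant: str
    resources: Mapping[str, str]  # holds the active tenant's entries only


def mint_session(tenant: str) -> Session:
    """Build a session that contains only the active tenant's resources."""
    if tenant not in _REGISTRY:
        raise LookupError(f"unknown tenant: {tenant!r}")
    # Copy the tenant's entries, then wrap them read-only so nothing
    # can be added to the session after it has been minted.
    return Session(tenant, MappingProxyType(dict(_REGISTRY[tenant])))
```

A session minted for `customer-a` has no key, denied or otherwise, that resolves to Customer B. There is no lookup to refuse because there is nothing to look up.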
The trap: secrets aren't the boundary. Endpoints are.
Here's where I was wrong for longer than I'd like to admit, and where I think a lot of people are quietly wrong.
I had a secrets manager. Per-org tokens, policies denying cross-org paths, the whole thing. I told myself: secrets are isolated, therefore tenants are isolated. Clean. Done.
It isn't done. When I put this design through the idea-gate — the council from part one — one of the models put a finger exactly on the gap, and it was sharp enough that I still quote it:
A secrets manager isolates secrets. It does not isolate endpoints.
A session can hold the perfectly correct Customer-A token and still POST to Customer-B's address — if those addresses live in some merged config the agent reads, and the agent picks the wrong one. The credential was right. The destination was wrong. Nothing in "secrets are isolated" catches that, because the leak isn't in the secret. It's in the routing.
And it gets worse, because the routing metadata is itself sensitive. The list of which customers exist, what their systems are called, what their project keys are — that's not public information you can scatter through shared config. The map is part of the secret.
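To make the gap concrete, here is a small sketch of the failure, not of any real system. `MERGED_ENDPOINTS`, `secret_for`, and `plan_request` are hypothetical stand-ins: a secrets manager that correctly denies cross-tenant reads, sitting next to a shared endpoint map that nothing ties to a tenant.

```python
# Hypothetical merged config: every tenant's endpoint in one shared map.
MERGED_ENDPOINTS = {
    "a-tickets": "https://tickets.a.example",
    "b-tickets": "https://tickets.b.example",
}


def secret_for(session_tenant: str, requested_tenant: str) -> str:
    """Stand-in for a secrets manager that denies cross-tenant reads."""
    if session_tenant != requested_tenant:
        raise PermissionError("cross-tenant secret access denied")
    return f"token-for-{requested_tenant}"


def plan_request(session_tenant: str, endpoint_key: str) -> dict:
    """Build a request the naive way: token and URL are looked up separately."""
    # The secret check passes: the session asks for its own tenant's token.
    token = secret_for(session_tenant, session_tenant)
    # The URL lookup has no idea which tenant the session belongs to.
    url = MERGED_ENDPOINTS[endpoint_key]
    return {"url": url, "token": token}
```

Calling `plan_request("customer-a", "b-tickets")` raises nothing. The secrets policy held, and the request is still addressed to Customer B.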
The fix is an invariant, not a daemon
My first instinct for the fix was a central dispatcher — one privileged service all actions funnel through, that checks tenant alignment. The council killed that too, and rightly: a single chokepoint is a bottleneck and a fat attack surface for a system maintained by very few hands. (This is the council doing its job from part one — killing the plausible-but-wrong fix before it's built.)
What survived was smaller and meaner. An invariant, not a service:
Every external resource is bound as one inseparable record: resource → (endpoint, credential, owning-tenant). You cannot get the address without getting the owner in the same breath. And the one library that performs any outbound action refuses if the record's tenant doesn't match the session's tenant.
You can't hardcode your way around it, because there's no loose endpoint to hardcode — the address only exists welded to its owner. The wrong-tenant write isn't denied. It's unrepresentable.
That's the whole philosophy in one move: don't add a check that says no. Remove the shape that would have needed checking.
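One way the bound record and the refusing library might be sketched in Python. The names `BoundResource`, `TenantMismatch`, and `send` are assumptions for illustration, and the function returns the request it would make instead of performing real network I/O.

```python
from dataclasses import dataclass, field


class TenantMismatch(Exception):
    """Raised when a record's owner is not the session's tenant."""


@dataclass(frozen=True)
class BoundResource:
    # Endpoint, credential, and owner travel together as one immutable record.
    endpoint: str
    credential: str = field(repr=False)  # keep the secret out of logs and reprs
    tenant: str


def send(session_tenant: str, resource: BoundResource, payload: dict) -> dict:
    """The single outbound path in this sketch: accepts bound records only."""
    if not isinstance(resource, BoundResource):
        # A bare URL string has no owner, so it is not a valid destination.
        raise TypeError("outbound calls require a BoundResource, not a raw address")
    if not session_tenant or resource.tenant != session_tenant:
        raise TenantMismatch("record is not owned by the session's tenant")
    # A real library would perform the HTTP call here; the sketch only
    # returns the request it would have made.
    return {"url": resource.endpoint, "auth": resource.credential, "json": payload}
```

The design choice worth noticing is the signature: `send` has no parameter that takes an address on its own, so a hardcoded URL has nowhere to go.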
Two more layers sit behind it, because one wall is never a wall:
- No bypass at the tool level. A pre-execution hook blocks raw outbound calls, so the agent can't shell out to a generic HTTP tool and route around the outbound library. The safe path isn't the polite default; it's the only one wired up.
- Egress on a leash. Each session can only talk to the addresses its tenant allows. A hardcoded address from the wrong tenant doesn't get a connection refused at the application layer — it gets no route at all.
Structural isolation, then a bypass block, then egress scoping. Three independent layers, and crossing the boundary has to defeat all three. No single bug opens the door.
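The two outer layers can be sketched roughly like this. It is an application-level illustration only: the tool list, the allowlist, and both function names are assumptions, the token check is deliberately crude (it over-blocks, and it would miss a tool name glued to a pipe character), and real egress scoping belongs in the network layer, where a disallowed address has no route at all.

```python
import shlex
from urllib.parse import urlsplit

# Assumption: an example list of tools that can make raw outbound calls.
RAW_NETWORK_TOOLS = frozenset({"curl", "wget", "nc", "ssh"})

# Assumption: hostnames each tenant's sessions may reach.
EGRESS_ALLOWLIST = {
    "customer-a": frozenset({"tickets.a.example"}),
    "customer-b": frozenset({"tickets.b.example"}),
}


def pre_exec_hook(command: str) -> bool:
    """Return True if a shell command may run; refuse raw network tools."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return False  # unparseable command: refuse instead of guessing
    # Compare each token's basename so /usr/bin/curl is caught as well.
    return not any(tok.rsplit("/", 1)[-1] in RAW_NETWORK_TOOLS for tok in tokens)


def egress_allowed(session_tenant: str, url: str) -> bool:
    """Return True only if the URL's host is on the session tenant's allowlist."""
    host = urlsplit(url).hostname
    allowed = EGRESS_ALLOWLIST.get(session_tenant, frozenset())
    return host is not None and host in allowed
```

Both functions resolve every unknown toward refusal: an unparseable command, an unknown tenant, or a URL with no host all come back `False`.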
The plot twist: this wall fails closed — and that contradicts everything I said last time
If you read part one, you caught me insisting the agent's conscience is fail-open: when the safety reflex is unsure, it lets the action through, because a system that freezes on every doubt gets ripped out. Viability before safety.
So why, here, am I building walls that fail closed — where if the organism can't positively confirm which tenant it's acting for, it does nothing at all? An unscoped session gets zero external writes. Not "probably fine, proceed." Zero.
That looks like a flat contradiction. It isn't — and untangling it is the actual lesson of this piece.
They're different axes, and they get opposite defaults.
- For actions — is this command safe to run? — the default is yes, proceed. Uncertainty resolves toward motion, because an organism that can't act isn't an organism.
- For tenant boundaries — whose data is this? — the default is no, stop. Uncertainty resolves toward stillness, because acting on the wrong tenant is the one mistake with no undo.
Fail-open keeps the organism alive. Fail-closed keeps it from killing someone else. A mature system isn't uniformly cautious or uniformly bold — it knows which dimension it's standing on and picks the default that dimension demands.
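The two defaults fit in a few lines of Python. This is a sketch with hypothetical names (`Verdict`, `action_gate`, `tenant_gate`, `may_write`), not the conscience from part one; the point is only that "unsure" lands on opposite sides of the two gates.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    SAFE = "safe"
    UNSAFE = "unsafe"
    UNSURE = "unsure"


def action_gate(verdict: Verdict) -> bool:
    """Fail-open: only a positive 'unsafe' stops the action."""
    return verdict is not Verdict.UNSAFE


def tenant_gate(session_tenant: Optional[str], resource_tenant: Optional[str]) -> bool:
    """Fail-closed: only a positively confirmed match lets the write through."""
    return bool(session_tenant) and session_tenant == resource_tenant


def may_write(verdict: Verdict,
              session_tenant: Optional[str],
              resource_tenant: Optional[str]) -> bool:
    """An external write has to clear both gates."""
    return action_gate(verdict) and tenant_gate(session_tenant, resource_tenant)
```

An unsure safety verdict with a confirmed tenant proceeds. A safe verdict with an unconfirmed tenant does not.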
The newest place this showed up: I'm prototyping a layer that lets the agent run code over its own knowledge base to answer questions plain retrieval can't. Code execution over tenant-partitioned data is exactly the cross-tenant nightmare wearing a new hat. The non-negotiable constraint, before a line was written: the code runs under an unforgeable, tenant-scoped, read-only capability that fails closed. The generated code cannot name a tenant, an ID, or a credential — those are bound server-side and never taken from anything the model typed. Same wall. New room.
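A rough sketch of that constraint in Python, with hypothetical names and data (`_KNOWLEDGE`, `ReadCapability`). One caveat up front: an in-process Python object is not an unforgeable boundary, so a real version would need the capability to sit behind a process or service boundary. The sketch only shows the shape of the interface the generated code would see.

```python
from types import MappingProxyType

# Hypothetical tenant-partitioned knowledge base.
_KNOWLEDGE = {
    "customer-a": [{"id": 1, "text": "rotate keys quarterly"}],
    "customer-b": [{"id": 2, "text": "status page lives on vendor X"}],
}


class ReadCapability:
    """Read-only view of one tenant's rows.

    Assumption: the server builds this from the session's tenant,
    never from anything in the model's output.
    """

    __slots__ = ("_rows",)

    def __init__(self, tenant):
        if tenant not in _KNOWLEDGE:
            # Fail closed: no confirmed tenant means no capability at all.
            raise PermissionError("no confirmed tenant; refusing to build capability")
        # Copy each row and wrap it read-only so callers cannot mutate the data.
        self._rows = tuple(MappingProxyType(dict(r)) for r in _KNOWLEDGE[tenant])

    def search(self, needle: str) -> list:
        """Substring search over the bound tenant's rows only."""
        return [r for r in self._rows if needle in r["text"]]
```

`search` takes a query and nothing else. There is no tenant, ID, or credential parameter for generated code to fill in, which is the property the constraint asks for.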
Why this matters beyond my setup
Multi-tenant is the default shape of real infrastructure work. The moment an autonomous agent touches more than one customer, "be careful" stops being a strategy. Careful is a property of decisions, and decisions have bugs.
The questions worth asking about any agent let loose on multi-tenant systems aren't about capability:
- When it acts on the wrong tenant, what stops it — a rule that has to fire correctly, or a wall that was never bridged?
- Are your boundaries forbidden (a check you maintain) or absent (a shape that doesn't exist)?
- Does the system know the difference between "unsure if this is safe" (proceed) and "unsure whose data this is" (stop cold)?
Capability is the part everyone races to build. Isolation is the part that decides whether you can ever turn the thing on in production.
The safest boundary isn't the one the agent is told not to cross. It's the one it can't.
Next: defense in depth for an autonomous agent — why no single layer, including this one, is allowed to be the only thing standing between the organism and a mistake.