Artem Matviychuk

Posted on Jun 18 • Edited on Jul 10 • Originally published at Medium

The Safest Boundary Is the One the Agent Can't Reach Across

#ai #security #devops #machinelearning

Second in a series on building an autonomous AI organism that operates real multi-tenant infrastructure under a constitutional safety model. The first part was about two gates — a conscience and a council. This one is about the wall behind them.

My agent runs infrastructure for more than one organization. That sentence should make a security person uncomfortable, because the failure mode isn't subtle. The nightmare isn't the agent doing something clever and wrong. It's the agent doing something mundane and right (writing a ticket, rotating a secret, posting a status) to the wrong tenant.

Customer A's data ending up in Customer B's system isn't a bug you patch. It's a breach you disclose.

So the first question I had to answer wasn't "how do I make the agent capable across tenants." It was: how do I make crossing a tenant boundary something the agent can't get wrong, because it can't do it at all?

Permission is the weak version. Absence is the strong one.

The instinct everyone reaches for first is permissions. Give the agent a list of what it's allowed to touch, check every action against it, deny the rest. Role-based access, a policy file, a gate.

Permission gates fail in one specific, fatal way: they assume the thing being asked for exists and you just have to say no. The agent forms an intention to touch Customer B, the gate evaluates it, the gate denies it. That works right up until the gate has a bug, a stale rule, a missing case — and then the intention sails through, because the resource was right there, reachable, waiting for a yes.

The stronger model is that Customer B's resources are structurally absent, not forbidden. In a session scoped to Customer A, the agent doesn't have a denied path to Customer B. It has no path. The credentials aren't loaded. The endpoints aren't in its map. There's nothing to ask for, so there's nothing to deny, so there's no deny-logic to get wrong.

Forbidden is a fact about a rule. Absent is a fact about the world. Rules have bugs; the world doesn't.

Concretely: capabilities are minted per session, scoped to the active organization, and they simply don't include anyone else. The boundary isn't enforced at decision time. It's enforced at existence time.
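A minimal sketch of what "enforced at existence time" could look like, under stated assumptions. The names `Session`, `mint_session`, and `_REGISTRY` are hypothetical, and the single in-memory registry is only there to keep the example runnable; it is not a description of the actual system.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping

# Hypothetical registry. A real system would keep each tenant's entries in
# separate storage; one dict is used here only so the sketch runs offline.
_REGISTRY = {
    "tenant-a": {"tickets": "https://tickets.tenant-a.example"},
    "tenant-b": {"tickets": "https://tickets.tenant-b.example"},
}


@dataclass(frozen=True)
class Session:
    tenant: str
    capabilities: Mapping[str, str]  # resource name -> endpoint, one tenant only


def mint_session(tenant: str) -> Session:
    """Build a session whose capability map holds only one tenant's resources."""
    if tenant not in _REGISTRY:
        raise LookupError(f"unknown tenant: {tenant!r}")
    # Copy, then wrap in a read-only view so nothing can be added after minting.
    return Session(tenant, MappingProxyType(dict(_REGISTRY[tenant])))


session = mint_session("tenant-a")
print(sorted(session.capabilities))
```

Inside the session there is no key, however spelled, that resolves to the other tenant. A lookup for it fails the same way a lookup for a resource that never existed would.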

The trap: secrets aren't the boundary. Endpoints are.

Here's where I was wrong for longer than I'd like to admit, and where I think a lot of people are quietly wrong.

I had a secrets manager. Per-org tokens, policies denying cross-org paths, the whole thing. I told myself: secrets are isolated, therefore tenants are isolated. Clean. Done.

It isn't done. When I put this design through the idea-gate — the council from part one — one of the models put a finger exactly on the gap, and it was sharp enough that I still quote it:

A secrets manager isolates secrets. It does not isolate endpoints.

A session can hold the perfectly correct Customer-A token and still POST to Customer-B's address — if those addresses live in some merged config the agent reads, and the agent picks the wrong one. The credential was right. The destination was wrong. Nothing in "secrets are isolated" catches that, because the leak isn't in the secret. It's in the routing.

And it gets worse, because the routing metadata is itself sensitive. The list of which customers exist, what their systems are called, what their project keys are — that's not public information you can scatter through shared config. The map is part of the secret.

A war story added since publishing, because reality made this point better than I did. A whole cluster of services authenticated through a redundant set of directory servers: several of them, deployed precisely so that no single one going down could take auth with it. The redundancy was real. What wasn't real was the use of it. Over time, the endpoint each service pointed at had drifted across a scatter of separate configs until, quietly, most of them named the same single server. Nobody decided that. No config said "depend on exactly one." The dependency assembled itself out of five locally reasonable choices.

Then that one server went dark behind a provider outage, and half the estate lost authentication at once, while its healthy redundant peers sat there, unused. The credentials were flawless the whole time; every service held the right token. The break was entirely in where the token was sent, and in the fact that the endpoint could drift independently of everything built to make it safe. That is the thesis of this section, delivered by an outage instead of a diagram: a correct secret pointed at a drifted endpoint is a breach, or an outage, waiting for a bad day. Secrets were never the boundary. The binding of credential to endpoint is.

The fix is an invariant, not a daemon

My first instinct for the fix was a central dispatcher — one privileged service all actions funnel through, that checks tenant alignment. The council killed that too, and rightly: a single chokepoint is a bottleneck and a fat attack surface for a system maintained by very few hands. (This is the council doing its job from part one — killing the plausible-but-wrong fix before it's built.)

What survived was smaller and meaner. An invariant, not a service:

Every external resource is bound as one inseparable record: resource → (endpoint, credential, owning-tenant). You cannot get the address without getting the owner in the same breath. And the one library that performs any outbound action refuses if the record's tenant doesn't match the session's tenant.

You can't hardcode your way around it, because there's no loose endpoint to hardcode — the address only exists welded to its owner. The wrong-tenant write isn't denied. It's unrepresentable.

That's the whole philosophy in one move: don't add a check that says no. Remove the shape that would have needed checking.
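The invariant can be sketched in a few lines. This is an illustration under assumptions, not the real library: `Binding`, `Session`, `send`, and `TenantMismatch` are hypothetical names, and the function returns the request it would have made instead of touching the network.

```python
from dataclasses import dataclass


class TenantMismatch(Exception):
    """Raised when a binding's owner differs from the session's tenant."""


@dataclass(frozen=True)
class Binding:
    # The three parts travel as one immutable record; there is no API in this
    # sketch that hands out an endpoint without its owner attached.
    endpoint: str
    credential: str
    owner: str


@dataclass(frozen=True)
class Session:
    tenant: str


def send(session: Session, binding: Binding, payload: dict) -> dict:
    """The only outbound path: it accepts a whole Binding, never a bare URL."""
    if binding.owner != session.tenant:
        raise TenantMismatch(f"binding owned by {binding.owner!r}, "
                             f"session scoped to {session.tenant!r}")
    # A real implementation would perform the HTTP call here. The sketch
    # returns the request description so it can run offline.
    return {"url": binding.endpoint, "auth": binding.credential, "body": payload}


tickets_a = Binding("https://tickets.tenant-a.example", "token-a", "tenant-a")
print(send(Session("tenant-a"), tickets_a, {"title": "rotate key"})["url"])
```

The design choice worth noticing is the signature of `send`: because it takes a `Binding` and not a string, a caller holding only an address has nothing it can pass in.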

A reader of the first version pushed the invariant one turn further, and the refinement is worth stealing. The record isn't just (endpoint, credential, owner). The property that actually protects you is that none of its parts can drift independently of the others. The moment any one of them can move on its own (the endpoint lives in a shared config, the allowed operation gets widened by a flag, the owner is inferred instead of bound), the wall silently degrades back into a preference, and you are one drift away from the directory-server outage described above. Bind the operation in too: (endpoint, credential, owner, allowed-operation), all four or nothing.

The mental model that finally made it click is one every engineer already trusts: a signed URL. A signed URL welds a resource, an operation, a credential, and an expiry into a single artifact you cannot take apart — you can't keep the signature and swap the object, or hold the link past its expiry. Nobody re-checks a permission table at request time; the capability is the permission, unforgeable and self-expiring by construction. What an autonomous agent needs is signed-URL semantics for every action it can take — not just object storage — so authority always arrives as one inseparable, expiring bundle instead of a constellation of separately-managed configs that must all stay in sync forever. They will not stay in sync. Drift is the default state of infrastructure. Build the boundary so drift is impossible, not merely discouraged.
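For readers who want the signed-URL idea in concrete form, here is a small sketch using an HMAC from Python's standard library. It is not how any particular cloud provider signs URLs, and it is not the author's implementation. `SECRET`, `mint`, and `verify` are hypothetical, and the signature stands in for the credential part of the record.

```python
import hashlib
import hmac
import time

SECRET = b"demo-signing-key"  # assumption: known only to the issuer and verifier


def _mac(fields):
    # Newline-joined fields are fine for a sketch; a real format would use an
    # unambiguous encoding so field boundaries cannot be shifted.
    return hmac.new(SECRET, "\n".join(fields).encode(), hashlib.sha256).hexdigest()


def mint(endpoint, owner, operation, ttl_seconds, now=None):
    """Issue a capability whose parts are covered by a single signature."""
    issued = time.time() if now is None else now
    expires = int(issued + ttl_seconds)
    sig = _mac((endpoint, owner, operation, str(expires)))
    return {"endpoint": endpoint, "owner": owner, "operation": operation,
            "expires": expires, "sig": sig}


def verify(cap, now=None):
    """Accept only if no field was changed and the expiry has not passed."""
    expected = _mac((cap["endpoint"], cap["owner"], cap["operation"],
                     str(cap["expires"])))
    if not hmac.compare_digest(cap["sig"], expected):
        return False
    current = time.time() if now is None else now
    return current < cap["expires"]


cap = mint("https://tickets.tenant-a.example", "tenant-a", "POST", ttl_seconds=60)
print(verify(cap))
```

Changing the endpoint, the owner, the operation, or the expiry invalidates the signature, so no single field can drift while the rest stay valid. Verification happens at use time, which also means the capability stops working once it expires without anyone revoking it.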

Two more layers sit behind it, because one wall is never a wall:

No bypass at the tool level. A pre-execution hook blocks raw outbound calls, so the agent can't shell out to a generic HTTP tool and route around the one library that enforces the binding. The safe path isn't the polite default; it's the only one wired up.
Egress on a leash. Each session can only talk to the addresses its tenant allows. A hardcoded address from the wrong tenant doesn't get a connection refused at the application layer — it gets no route at all.
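The two layers above can be modeled as two small predicates. This is a sketch of the decisions only. The tool list in `RAW_NETWORK_TOOLS` is a hypothetical example, and real egress scoping would live in the network layer (firewall rules or similar), where a disallowed address has no route, instead of in application code.

```python
import shlex
from urllib.parse import urlsplit

# Hypothetical list of tools that can open arbitrary outbound connections.
RAW_NETWORK_TOOLS = frozenset({"curl", "wget", "nc", "ssh"})


def pre_exec_hook(command: str) -> bool:
    """Return True if a shell command may run; block raw outbound tools."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unparseable input (for example an unterminated quote) is refused.
        return False
    # Deliberately conservative: any token naming a raw tool blocks the
    # command, even when the token is only an argument.
    return not any(tok.rsplit("/", 1)[-1] in RAW_NETWORK_TOOLS for tok in tokens)


def egress_allowed(url: str, allowed_hosts: frozenset) -> bool:
    """Return True only if the URL's host is on the session's allowlist."""
    host = urlsplit(url).hostname
    return host is not None and host in allowed_hosts


tenant_a_hosts = frozenset({"tickets.tenant-a.example"})
print(pre_exec_hook("ls -la"), pre_exec_hook("curl https://tickets.tenant-b.example"))
print(egress_allowed("https://tickets.tenant-b.example/api", tenant_a_hosts))
```

A name-based list like this is easy to evade on its own, which is the argument for having the egress layer behind it: even a tool the hook misses still has nowhere to connect.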

Structural isolation, then a bypass block, then egress scoping. Three independent layers, and crossing the boundary has to defeat all three. No single bug opens the door.

The plot twist: this wall fails closed — and that contradicts everything I said last time

If you read part one, you caught me insisting the agent's conscience is fail-open: when the safety reflex is unsure, it lets the action through, because a system that freezes on every doubt gets ripped out. Viability before safety.

So why, here, am I building walls that fail closed — where if the organism can't positively confirm which tenant it's acting for, it does nothing at all? An unscoped session gets zero external writes. Not "probably fine, proceed." Zero.

That looks like a flat contradiction. It isn't — and untangling it is the actual lesson of this piece.

They're different axes, and they get opposite defaults.

For actions — is this command safe to run? — the default is yes, proceed. Uncertainty resolves toward motion, because an organism that can't act isn't an organism.
For tenant boundaries — whose data is this? — the default is no, stop. Uncertainty resolves toward stillness, because acting on the wrong tenant is the one mistake with no undo.
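The two defaults can be written down as one decision function. This is a sketch under assumptions: `Verdict` and `may_run` are hypothetical names, and the real conscience from part one is certainly richer than a three-valued enum.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    SAFE = "safe"
    UNSAFE = "unsafe"
    UNSURE = "unsure"


def may_run(action: Verdict,
            session_tenant: Optional[str],
            target_tenant: Optional[str]) -> bool:
    """Combine a fail-closed tenant check with a fail-open action check."""
    # Tenant axis: anything short of a positive match means stop.
    if session_tenant is None or target_tenant is None:
        return False
    if session_tenant != target_tenant:
        return False
    # Action axis: only a definite UNSAFE blocks; UNSURE proceeds.
    return action is not Verdict.UNSAFE


print(may_run(Verdict.UNSURE, "tenant-a", "tenant-a"))  # unsure action, known tenant
print(may_run(Verdict.SAFE, None, "tenant-a"))          # safe action, unknown tenant
```

The tenant check runs first on purpose. An action judged perfectly safe still does not run if nobody can say whose data it touches.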

Fail-open keeps the organism alive. Fail-closed keeps it from killing someone else. A mature system isn't uniformly cautious or uniformly bold — it knows which dimension it's standing on and picks the default that dimension demands.

The newest place this showed up: I'm prototyping a layer that lets the agent run code over its own knowledge base to answer questions plain retrieval can't. Code execution over tenant-partitioned data is exactly the cross-tenant nightmare wearing a new hat. The non-negotiable constraint, before a line was written: the code runs under an unforgeable, tenant-scoped, read-only capability that fails closed. The generated code cannot name a tenant, an ID, or a credential — those are bound server-side and never taken from anything the model typed. Same wall. New room.
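A sketch of the data-flow shape this constraint implies, assuming hypothetical names (`_KNOWLEDGE`, `run_generated`). It shows only where the tenant comes from and what the generated code is handed. It is not a sandbox, and plain Python offers no real isolation for untrusted code.

```python
from types import MappingProxyType
from typing import Callable, Mapping, Optional

# Hypothetical tenant-partitioned knowledge base.
_KNOWLEDGE = {
    "tenant-a": {"runbook": "restart the queue worker"},
    "tenant-b": {"runbook": "fail over the database"},
}


def run_generated(session_tenant: Optional[str],
                  generated: Callable[[Mapping[str, str]], object]) -> object:
    """Run model-written code against a read-only view of one tenant's data."""
    # The tenant comes from the session, never from the generated code.
    if session_tenant is None or session_tenant not in _KNOWLEDGE:
        raise PermissionError("no confirmed tenant scope; refusing to run")
    view = MappingProxyType(_KNOWLEDGE[session_tenant])
    return generated(view)


def generated_code(docs):
    # Stand-in for model output: it sees documents but no tenant names or IDs.
    return [name for name, text in docs.items() if "queue" in text]


print(run_generated("tenant-a", generated_code))
```

The generated function has no parameter through which it could ask for a tenant, so there is nothing for the model to get wrong on that axis.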

Why this matters beyond my setup

Multi-tenant is the default shape of real infrastructure work. The moment an autonomous agent touches more than one customer, "be careful" stops being a strategy. Careful is a property of decisions, and decisions have bugs.

The questions worth asking about any agent let loose on multi-tenant systems aren't about capability:

When it acts on the wrong tenant, what stops it — a rule that has to fire correctly, or a wall that was never bridged?
Are your boundaries forbidden (a check you maintain) or absent (a shape that doesn't exist)?
Does the system know the difference between "unsure if this is safe" (proceed) and "unsure whose data this is" (stop cold)?

Capability is the part everyone races to build. Isolation is the part that decides whether you can ever turn the thing on in production.

The safest boundary isn't the one the agent is told not to cross. It's the one it can't.

Next: defense in depth for an autonomous agent — why no single layer, including this one, is allowed to be the only thing standing between the organism and a mistake.

Top comments (5)

Alex Shev • Jun 19

Tenant isolation has to be a wall, not a preference.

The agent can have a conscience, policies, and review steps, but the infrastructure should still make cross-tenant action unreachable by default. If the safest boundary depends on the agent remembering the rule, it is not really the safest boundary.

Artem Matviychuk • Jun 19

Exactly. "A wall, not a preference" is a sharper one-liner than anything in the post; I might steal it. That's the whole "structurally absent, not forbidden" idea: a forbidden action is a fact about a rule, and rules have bugs; an absent one is a fact about the world, because the path was never wired. The agent doesn't have to remember not to cross a boundary when no path across it exists.

One trap I'd add on top: even with the wall, watch the endpoints, not just the secrets. A perfectly correct per-tenant credential still lets you POST to the wrong tenant's address if those addresses live in shared config. The wall has to bind credential + endpoint + owner as one inseparable record — otherwise "isolated secrets" quietly isn't "isolated tenants."

Alex Shev • Jun 21

That endpoint point is the part people usually miss. Tenant isolation is not just "the secret is scoped correctly"; the destination has to be scoped with it.

I like thinking of it as one bound capability: credential + endpoint + owner + allowed operation. If any of those can drift independently, the wall quietly turns back into a preference.

Artem Matviychuk • Jul 9

"If any of those can drift independently, the wall quietly turns back into a preference" — that's exactly it, and drift is the operative word. A capability that was correct at issue time isn't automatically correct at use time; something has to re-verify the binding when it's exercised, not just when it's minted. Otherwise you're trusting a snapshot.

One more dimension I'd add to the tuple after the last few weeks: how much, and for how long. Credential + endpoint + owner + operation bounds where an action can land, but not how many times or at what cost. A perfectly scoped capability still lets a stuck loop hammer a legitimate endpoint all day. Attaching expiry and budget to the grant itself (a self-expiring capability, not a separate quota system someone has to remember to configure) turned out to matter as much as the scoping. That failure mode grew into all of part 4, published today: dev.to/artemmatviychuk/the-most-da...

The closest mainstream analogues are GCS/S3 signed URLs and short-lived tokens (STS, OAuth access tokens, Vault leases): scope, allowed operation and expiry welded into a single artifact. Nothing in it can drift independently — there's nothing to configure separately, and revocation is the default outcome, not an action someone has to remember. In a sense, agent infrastructure needs signed-URL semantics for every action, not just object storage.

Curious how you'd slice it: does rate/budget belong inside the capability record, or in a separate enforcement layer?

Alex Shev • Jul 9

I would put budget/rate inside the capability record, with enforcement in the runtime. If it lives only in a separate quota layer, it becomes another thing that can drift away from the permission grant. The grant should say what, where, who, how long, and how much.