I have been thinking a lot about coding agents lately.
Not really about whether they can write good code, because usually they can, sometimes they can't. That part is obvious. But the risk is shifting from wrong answers to wrong outcomes.
The part that feels more important to me is this:
should the agent actually own the write authority?
We already don't trust humans without roles, limits, reviews, and accountability. Developers use PRs, pilots use checklists, bank clerks have transfer limits. Capable agents need the same structure, but machine-readable.
Right now a lot of setups still look roughly like this:
- agent reads the repo
- agent decides what to change
- agent has a GitHub token
- agent creates commits, branches, or PRs
I don't think this is the right default.
The agent can reason.
The agent can inspect files.
The agent can propose changes.
But the moment it can directly create external impact, the problem changes.
It is no longer just:
did the agent say something wrong?
It becomes:
did the agent create the wrong outcome?
That is a much more expensive failure mode.
Intent is not authority
The pattern I like more is simple:
- agent reads directly
- agent proposes intent
- a boundary decides
- an adapter materializes only admitted work
So the agent does not get the write credentials.
It submits a structured intent instead, which could look like:
{
"operation": "write",
"target": {
"repo": "example/app",
"branch": "main",
"path": "docs/config/agent-policy.md"
},
"source_state": {
"blob_sha": "8f31c2..."
},
"requested_effect_hash": "sha256:..."
}
This is then not a command anymore, it is a suggestion, or an intent.
The system still has to decide whether this proposed outcome should exist.
That decision layer can check things like:
- is this actor allowed?
- is this repo allowed?
- is this path in scope?
- does the source state still match?
- is this operation allowed?
- was the same effect already created?
- should this become a reviewable PR?
Only after that should there be an outcome.
For example:
{
"decision": "admitted",
"checks": {
"scope": "pass",
"source_state": "pass",
"policy": "pass",
"idempotency": "pass"
},
"outcome": {
"type": "pull_request",
"status": "created",
"reviewable": true
}
}
The core rule is:
No impact without admission.
The flow would look like this:
This is not the same as a sandbox
A sandbox is useful.
But I think it solves a different problem.
A sandbox asks:
- where can the agent run?
- can it use the network?
- can it execute commands?
- which files can it access?
- can it escape the environment?
A gateway asks:
should this concrete proposed outcome exist?
That difference matters because a sandbox can stop escape, it does not decide whether a proposed outcome should exist.
If the agent has a valid GitHub token inside the allowed environment, it can still use allowed tools to create an unwanted result.
The action can be technically allowed and still be the wrong outcome.
That is why I think the boundary should sit between intent and impact, not only around execution.
Sandbox isolates execution.
Gateway isolates impact.
Why GitHub is a good first target
GitHub already has a good human pattern:
a change proposal is not a merge.
Pull Requests are familiar because they are reviewable and they fit how developers already work.
But with agents there is one step before the PR that also matters:
An agent proposal should not automatically become PR impact.
A PR is already a real side effect.
- It creates a branch.
- It creates commits.
- It creates review work.
- It changes the state of the repository.
So the agent should not directly create it with its own write token.
The flow I want is more like this:
- agent reads repository
- agent submits structured intent
- gateway checks state, scope, policy, and idempotency
- GitHub adapter creates a reviewable PR only after admission
- PR contains evidence about the decision
The adapter is not the authority, it only materializes admitted work.
And the agent never receives the GitHub write credentials.
This does not make the code correct
This is important:
- A boundary like this does not prove that the generated code is good.
- It does not replace CI.
- It does not replace human review.
- It does not prove semantic correctness.
- It only controls the transition from proposed work to external impact.
That narrower claim is the whole point. I think many agent systems mix three things together:
- reasoning
- decision
- impact
But these should be separated:
- The agent owns reasoning.
- The boundary owns the decision.
- The adapter owns controlled materialization.
- The target system should only receive admitted impact.
Why I care about this
I don't think production agent systems will be trusted just because the models get smarter. They will be trusted when the path from agent work to external change becomes explicit.
For every real outcome, I want to be able to ask:
- what did the agent propose?
- what state did it read?
- which rules were checked?
- why was it admitted or blocked?
- what outcome was created?
- can a human review it?
- can we audit it later?
That is the layer I have been working on with Impact Boundary Labs.
The first implementation is GitHub-first:
agents can read repositories directly, but write intents go through a deterministic gateway that creates reviewable Pull Requests with evidence.
GitHub is not the whole idea, it is just the first concrete place to prove the pattern, because repositories have clear state, branches, commits, PRs, and review.
The broader principle is:
Let agents reason.
Stop them at intent.
Control what becomes outcome.
Project: Impact Boundary Labs
This is my very first article here on dev.to! Iād love to hear your thoughts on this architecture. How are you currently securing your agent workflows?
Since I'm new here, I'm highly open to feedback - let me know in the comments what I can improve or what we should talk about in Part 2!

Top comments (5)
Strong principle and it's the right hill to die on - the agent should produce a proposal (a diff, a PR, a plan), and a separate trusted system with the write credential applies it after a gate. The moment a coding agent holds the push/deploy/db credential directly, a hallucination or an injected instruction becomes a destructive write with no checkpoint, and you've handed god-mode to the least predictable component in the stack. Separating "who proposes the change" from "who has the authority to commit it" is exactly the privilege boundary that keeps an autonomous agent from being a liability.
This is core to how I build - propose with the model, commit with a gated trusted layer, never let the generator hold the keys. It's the spine of Moonshift, the thing I work on: a multi-agent pipeline that takes a prompt to a deployed SaaS, where agents emit changes and a verify layer (with the actual credentials) decides what gets applied, so a bad generation can't directly mutate prod. Same separation you're arguing for. Multi-model routing keeps a build ~$3 flat, first run free no card. This take needs to be louder. Where do you put the boundary in practice - agent opens a PR and CI/human merges, or a broker service that holds creds and applies only approved diffs? The broker pattern is the cleanest I've found but it's more to build.
Thanks Harjot, yes. In practice we put the boundary on the broker side, before the PR exists.
I would not want the agent to open the PR directly and then rely only on CI or a human merge. CI and human review are still useful, but they are later gates. If the agent can already create branches, push commits, or spam PRs, it still has more write power than I want it to have.
So the model is closer to your second thought: the agent proposes the change, but the broker/core holds the credentials. The Core checks policy, scope, current state, and drift. If the proposal is admitted, an adapter materializes the real effect, for example by creating a PR.
The values we try to keep are pretty simple:
A blocked action should not just disappear as a failed tool call. It should tell the agent what was missing, why it was blocked, or what needs to happen next.
So yes, the broker approach is more work, but I think it is the cleaner place to put responsibility once agents start touching real systems.
Broker-side is the cleanest place for it, the credential lives with the trusted applier, the agent only ever emits a proposal, so even a fully-compromised agent can't push directly. That's the exact separation I landed on too: generator never holds the keys, a gated layer with the creds decides what actually applies. Broker is more to build than "agent opens a PR" but it's the version that holds up when the agent goes off the rails. Good design, and good of you to spell it out publicly, this pattern is badly under-discussed relative to how fast people are handing agents write access.
That's the right place for it. The whole point is the agent never holds a credential it could misuse, it asks a broker to perform a scoped action and the broker enforces policy, so a compromised or confused agent can't do more than the broker allows. Putting the boundary at the agent ("please be careful with these keys") is hope, not a control. Broker-mediated, least-privilege, every action attributable is the only model that survives an agent going off the rails. I care about this a lot in Moonshift since agents touch real repos and deploys: the agent proposes, a gated layer that actually holds the permissions executes. Are you scoping the broker per-action, or per-session with a capability token that expires?
Jup, exactly. I think this should be per intent / per action, not a broad agent session.
A session can be useful for auth or communication, but it should not become a temporary impact window.
The agent proposes a certain effect, the core/policy decides if that effect is allowed, and only then the adapter executes the actual action.
We are also thinking about the read side here: scoped reads, so the intent is bound to the repo state the agent was actually allowed to observe
I will address this more clearly in the follow up article, because this distinction is the most important one. Thanks for pushing on this, really appreciate the input!