I have been thinking a lot about coding agents lately.
Not so much about whether they can write good code: usually they can, sometimes they can't. That part is obvious. What is changing is the risk, which is shifting from wrong answers to wrong outcomes.
The part that feels more important to me is this:
should the agent actually own the write authority?
We already don't trust humans without roles, limits, reviews, and accountability. Developers use PRs, pilots use checklists, bank clerks have transfer limits. Capable agents need the same structure, but machine-readable.
Right now a lot of setups still look roughly like this:
- agent reads the repo
- agent decides what to change
- agent has a GitHub token
- agent creates commits, branches, or PRs
I don't think this is the right default.
The agent can reason.
The agent can inspect files.
The agent can propose changes.
But the moment it can directly create external impact, the problem changes.
It is no longer just:
did the agent say something wrong?
It becomes:
did the agent create the wrong outcome?
That is a much more expensive failure mode.
Intent is not authority
The pattern I like more is simple:
- agent reads directly
- agent proposes intent
- a boundary decides
- an adapter materializes only admitted work
So the agent does not get the write credentials.
It submits a structured intent instead, which could look like:
{
  "operation": "write",
  "target": {
    "repo": "example/app",
    "branch": "main",
    "path": "docs/config/agent-policy.md"
  },
  "source_state": {
    "blob_sha": "8f31c2..."
  },
  "requested_effect_hash": "sha256:..."
}
At that point it is no longer a command. It is a suggestion, an intent.
The system still has to decide whether this proposed outcome should exist.
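To make the intent shape concrete, here is a minimal Python sketch of how an agent-side helper might assemble one. How `requested_effect_hash` is actually computed is not specified here, so the scheme below (SHA-256 over canonical JSON of the operation, the target, and a digest of the proposed content) is purely an assumption, and the function names `effect_hash` and `build_intent` are hypothetical.

```python
import hashlib
import json


def effect_hash(operation: str, target: dict, new_content: bytes) -> str:
    """Return a stable identifier for one requested effect (assumed scheme)."""
    # Canonical JSON (sorted keys, fixed separators) so that the key order of
    # the input dict does not change the resulting hash.
    canonical = json.dumps(
        {
            "operation": operation,
            "target": target,
            "content_sha256": hashlib.sha256(new_content).hexdigest(),
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_intent(operation: str, target: dict, blob_sha: str,
                 new_content: bytes) -> dict:
    """Assemble an intent with the same top-level fields as the example."""
    return {
        "operation": operation,
        "target": target,
        # The state the agent read, so a later check can detect drift.
        "source_state": {"blob_sha": blob_sha},
        "requested_effect_hash": effect_hash(operation, target, new_content),
    }
```

Because the hash depends only on what is being requested, two identical proposals map to the same identifier, which is what makes a later "was the same effect already created?" check possible.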
That decision layer can check things like:
- is this actor allowed?
- is this repo allowed?
- is this path in scope?
- does the source state still match?
- is this operation allowed?
- was the same effect already created?
- should this become a reviewable PR?
Only after that should there be an outcome.
For example:
{
  "decision": "admitted",
  "checks": {
    "scope": "pass",
    "source_state": "pass",
    "policy": "pass",
    "idempotency": "pass"
  },
  "outcome": {
    "type": "pull_request",
    "status": "created",
    "reviewable": true
  }
}
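A decision like this can come from a small, deterministic function. The sketch below is one possible shape, not the actual gateway: the `Policy` fields, the `evaluate` signature, and the `"fail"` value are assumptions made for illustration, and the path check is a naive prefix match that skips path normalization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy shape; every field name here is an assumption."""
    allowed_actors: frozenset
    allowed_repos: frozenset
    allowed_operations: frozenset
    allowed_path_prefixes: tuple  # e.g. ("docs/",)


def evaluate(intent: dict, actor: str, policy: Policy,
             current_blob_sha: str, seen_effects: set) -> dict:
    """Run the four checks and return an admitted or blocked decision."""
    target = intent["target"]
    passed = {
        # Is the repo allowed and the path inside an allowed prefix?
        "scope": (target["repo"] in policy.allowed_repos
                  and target["path"].startswith(policy.allowed_path_prefixes)),
        # Does the state the agent read still match the current state?
        "source_state": intent["source_state"]["blob_sha"] == current_blob_sha,
        # Is this actor allowed to request this operation?
        "policy": (actor in policy.allowed_actors
                   and intent["operation"] in policy.allowed_operations),
        # Was the same effect already admitted before?
        "idempotency": intent["requested_effect_hash"] not in seen_effects,
    }
    admitted = all(passed.values())
    if admitted:
        # Remember the effect so a repeated intent is blocked next time.
        seen_effects.add(intent["requested_effect_hash"])
    return {
        "decision": "admitted" if admitted else "blocked",
        "checks": {name: "pass" if ok else "fail" for name, ok in passed.items()},
    }
```

Every check is a plain comparison against known state, so the same intent evaluated against the same state always gets the same answer.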
The core rule is:
No impact without admission.
This is not the same as a sandbox
A sandbox is useful.
But I think it solves a different problem.
A sandbox asks:
- where can the agent run?
- can it use the network?
- can it execute commands?
- which files can it access?
- can it escape the environment?
A gateway asks:
should this concrete proposed outcome exist?
That difference matters: a sandbox can stop escape, but it does not decide whether a proposed outcome should exist.
If the agent has a valid GitHub token inside the allowed environment, it can still use allowed tools to create an unwanted result.
The action can be technically allowed and still be the wrong outcome.
That is why I think the boundary should sit between intent and impact, not only around execution.
Sandbox isolates execution.
Gateway isolates impact.
Why GitHub is a good first target
GitHub already has a good human pattern:
a change proposal is not a merge.
Pull Requests are familiar because they are reviewable and they fit how developers already work.
But with agents there is one step before the PR that also matters:
An agent proposal should not automatically become PR impact.
A PR is already a real side effect.
- It creates a branch.
- It creates commits.
- It creates review work.
- It changes the state of the repository.
So the agent should not directly create it with its own write token.
The flow I want is more like this:
- agent reads repository
- agent submits structured intent
- gateway checks state, scope, policy, and idempotency
- GitHub adapter creates a reviewable PR only after admission
- PR contains evidence about the decision
The adapter is not the authority. It only materializes admitted work.
And the agent never receives the GitHub write credentials.
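The split between gateway and adapter can be sketched in a few lines of Python. This is an illustration under assumptions: `InMemoryAdapter` is a hypothetical stand-in that makes no GitHub calls, `handle_intent` and the `decide` callable are invented names, and the token is a placeholder string.

```python
class InMemoryAdapter:
    """Stand-in for a GitHub adapter; it records PRs instead of creating them."""

    def __init__(self, write_token: str):
        # The write credential lives only inside the adapter.
        self._write_token = write_token
        self.pull_requests = []

    def materialize(self, intent: dict, decision: dict) -> dict:
        # The adapter refuses anything the boundary did not admit.
        if decision.get("decision") != "admitted":
            raise PermissionError("no impact without admission")
        pr = {
            "type": "pull_request",
            "status": "created",
            "reviewable": True,
            "target": intent["target"],
            # Evidence about the decision travels with the PR.
            "evidence": decision["checks"],
        }
        self.pull_requests.append(pr)
        return pr


def handle_intent(intent: dict, decide, adapter: InMemoryAdapter) -> dict:
    """Route an intent through the decision step, then to the adapter."""
    decision = decide(intent)
    if decision["decision"] != "admitted":
        # Blocked intents never reach the adapter, so nothing is created.
        return {**decision, "outcome": None}
    return {**decision, "outcome": adapter.materialize(intent, decision)}
```

The agent only ever calls something like `handle_intent`. It holds no reference to the token, and the adapter's own guard means that even a caller that skips the decision step cannot produce a PR from a blocked decision.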
This does not make the code correct
This is important:
- A boundary like this does not prove that the generated code is good.
- It does not replace CI.
- It does not replace human review.
- It does not prove semantic correctness.
- It only controls the transition from proposed work to external impact.
That narrower claim is the whole point. I think many agent systems mix three things together:
- reasoning
- decision
- impact
But these should be separated:
- The agent owns reasoning.
- The boundary owns the decision.
- The adapter owns controlled materialization.
- The target system should only receive admitted impact.
Why I care about this
I don't think production agent systems will be trusted just because the models get smarter. They will be trusted when the path from agent work to external change becomes explicit.
For every real outcome, I want to be able to ask:
- what did the agent propose?
- what state did it read?
- which rules were checked?
- why was it admitted or blocked?
- what outcome was created?
- can a human review it?
- can we audit it later?
That is the layer I have been working on with Impact Boundary Labs.
The first implementation is GitHub-first:
agents can read repositories directly, but write intents go through a deterministic gateway that creates reviewable Pull Requests with evidence.
GitHub is not the whole idea. It is just the first concrete place to prove the pattern, because repositories have clear state, branches, commits, PRs, and review.
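The audit questions above map fairly naturally onto a single evidence record per decision. The Python sketch below shows one possible shape. The `EvidenceRecord` fields and both helper functions are assumptions for illustration, not the format the project actually uses.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class EvidenceRecord:
    """One record per decision; each field answers one audit question."""
    intent: dict             # what did the agent propose?
    source_state: dict       # what state did it read?
    checks: dict             # which rules were checked?
    decision: str            # was it admitted or blocked?
    reason: str              # why?
    outcome: Optional[dict]  # what outcome was created, if any?


def to_audit_line(record: EvidenceRecord) -> str:
    """Serialize a record as one JSON line for an append-only audit log."""
    return json.dumps(asdict(record), sort_keys=True)


def render_for_review(record: EvidenceRecord) -> str:
    """Render a short plain-text summary a human reviewer could read in a PR."""
    lines = [f"Decision: {record.decision} ({record.reason})"]
    lines += [f"- {name}: {result}"
              for name, result in sorted(record.checks.items())]
    return "\n".join(lines)
```

Keeping the record immutable and serializing it with sorted keys means the same decision always produces the same log line, which makes later audits and diffs simpler.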
The broader principle is:
Let agents reason.
Stop them at intent.
Control what becomes outcome.
Project: Impact Boundary Labs
This is my very first article here on dev.to! I'd love to hear your thoughts on this architecture. How are you currently securing your agent workflows?
Since I'm new here, I'm highly open to feedback - let me know in the comments what I can improve or what we should talk about in Part 2!
