David Loibner

Posted on May 30 • Edited on Jul 9

Coding agents should not hold write credentials.

#ai #github #security #architecture

I have been thinking a lot about coding agents lately.

Not really about whether they can write good code. Usually they can, sometimes they can't. That part is obvious. What matters more is that the risk is shifting from wrong answers to wrong outcomes.

The part that feels more important to me is this:
should the agent actually own the write authority?

We already don't trust humans without roles, limits, reviews, and accountability. Developers use PRs, pilots use checklists, bank clerks have transfer limits. Capable agents need the same structure, but machine-readable.

Right now a lot of setups still look roughly like this: the agent reads the repo, decides what to change, holds a GitHub token, and then creates commits, branches, or PRs.

I don't think this is the right default.

The agent can reason.
The agent can inspect files.
The agent can propose changes.

But the moment it can directly create external impact, the problem changes.

It is no longer just:

did the agent say something wrong?

It becomes:

did the agent create the wrong outcome?

That is a much more expensive failure mode.

Intent is not authority

The pattern I like more is simple: the agent reads directly, proposes intent, and a boundary decides before an adapter materializes admitted work.

So the agent does not get the write credentials.
It submits a structured intent instead, which could look like:

{
  "operation": "write",
  "target": {
    "repo": "example/app",
    "branch": "main",
    "path": "docs/config/agent-policy.md"
  },
  "source_state": {
    "blob_sha": "8f31c2..."
  },
  "requested_effect_hash": "sha256:..."
}

This is no longer a command. It is a suggestion, an intent.
The system still has to decide whether this proposed outcome should exist.
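As a rough sketch, building such an intent could look like the Python below. The names `WriteIntent`, `Target`, `effect_hash`, and `build_intent` are made up for illustration, and I am assuming here that `requested_effect_hash` is a SHA-256 over the proposed file content. A real gateway might hash a diff or a canonical payload instead.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Target:
    repo: str
    branch: str
    path: str


@dataclass(frozen=True)
class WriteIntent:
    operation: str
    target: Target
    source_blob_sha: str        # the file state the agent actually read
    requested_effect_hash: str  # binds the intent to one exact proposed result


def effect_hash(new_content: bytes) -> str:
    """Hash the proposed content (assumed meaning of requested_effect_hash)."""
    return "sha256:" + hashlib.sha256(new_content).hexdigest()


def build_intent(repo: str, branch: str, path: str,
                 source_blob_sha: str, new_content: bytes) -> WriteIntent:
    # No token is involved anywhere: the agent only describes what it wants.
    return WriteIntent(
        operation="write",
        target=Target(repo, branch, path),
        source_blob_sha=source_blob_sha,
        requested_effect_hash=effect_hash(new_content),
    )


intent = build_intent(
    "example/app", "main", "docs/config/agent-policy.md",
    "8f31c2...",  # placeholder, same truncated value as in the JSON above
    b"# Agent policy\n",
)
print(json.dumps(asdict(intent), indent=2))
```

The point of the sketch is that the agent side contains no credential at all. It can only produce data that something else has to accept.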

That decision layer can check whether the actor, repo, path, source state, operation, and requested effect are valid for this situation, and whether the result should become a reviewable PR.

Only after that should there be an outcome.

For example:

{
  "decision": "admitted",
  "checks": {
    "scope": "pass",
    "source_state": "pass",
    "policy": "pass",
    "idempotency": "pass"
  },
  "outcome": {
    "type": "pull_request",
    "status": "created",
    "reviewable": true
  }
}
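A minimal sketch of how such a decision could be computed, in Python. `Policy`, `admit`, and `seen_effects` are hypothetical names, and the four checks are deliberately simplified: scope is an allow-list, source state is a blob SHA comparison, and idempotency just remembers effect hashes that were already admitted. It is an illustration of the idea, not the actual gateway.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    allowed_repos: set
    allowed_path_prefixes: tuple
    allowed_operations: set = field(default_factory=lambda: {"write"})


def admit(intent: dict, current_blob_sha: str,
          policy: Policy, seen_effects: set) -> dict:
    """Run deterministic checks and return an admitted/blocked decision."""
    target = intent["target"]
    checks = {
        # Is this repo and path something the agent may touch at all?
        "scope": target["repo"] in policy.allowed_repos
        and target["path"].startswith(policy.allowed_path_prefixes),
        # Has the file drifted since the agent read it?
        "source_state": intent["source_state"]["blob_sha"] == current_blob_sha,
        # Is the operation itself permitted?
        "policy": intent["operation"] in policy.allowed_operations,
        # Was this exact effect already admitted before?
        "idempotency": intent["requested_effect_hash"] not in seen_effects,
    }
    admitted = all(checks.values())
    if admitted:
        seen_effects.add(intent["requested_effect_hash"])
    return {
        "decision": "admitted" if admitted else "blocked",
        "checks": {name: "pass" if ok else "fail" for name, ok in checks.items()},
    }


policy = Policy(allowed_repos={"example/app"}, allowed_path_prefixes=("docs/",))
seen = set()
intent = {
    "operation": "write",
    "target": {"repo": "example/app", "branch": "main",
               "path": "docs/config/agent-policy.md"},
    "source_state": {"blob_sha": "8f31c2..."},
    "requested_effect_hash": "sha256:...",
}
first = admit(intent, "8f31c2...", policy, seen)
second = admit(intent, "8f31c2...", policy, seen)
print(first["decision"], second["decision"])  # admitted blocked
```

The second submission is blocked only because of the idempotency check, and the per-check result tells the agent exactly why.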

The core rule is:

No impact without admission.

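The flow would look like this: the agent reads and proposes intent, the boundary decides, and the adapter materializes only admitted work.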

This is not the same as a sandbox

A sandbox is useful.
But I think it solves a different problem.

A sandbox asks where the agent can run, whether it can use the network, whether it can execute commands, which files it can access, and whether it can escape the environment.

A gateway asks:

should this concrete proposed outcome exist?

That difference matters because a sandbox can stop escape, but it does not decide whether a proposed outcome should exist.
If the agent has a valid GitHub token inside the allowed environment, it can still use allowed tools to create an unwanted result.
The action can be technically allowed and still be the wrong outcome.

That is why I think the boundary should sit between intent and impact, not only around execution.

Sandbox isolates execution.
Gateway isolates impact.

Why GitHub is a good first target

GitHub already has a good human pattern:

a change proposal is not a merge.

Pull Requests are familiar because they are reviewable and they fit how developers already work.

But with agents there is one step before the PR that also matters:

An agent proposal should not automatically become PR impact.
A PR is already a real side effect. It creates a branch, commits, review work, and changes the state of the repository.

So the agent should not directly create it with its own write token.

The flow I want is more like this: the agent reads the repository, submits structured intent, the gateway checks state, scope, policy, and idempotency, and the GitHub adapter creates a reviewable PR only after admission. The PR should contain evidence about the decision.

The adapter is not the authority. It only materializes admitted work.
And the agent never receives the GitHub write credentials.
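To make the split concrete, here is a small Python sketch of an adapter that holds the token and refuses anything that was not admitted. `PullRequestAdapter` and the `create_pr` callable are hypothetical. `create_pr` stands in for whatever real GitHub client the adapter would use, and no real GitHub API is called here.

```python
from typing import Callable, Optional


class PullRequestAdapter:
    """Holds the write credential. The agent never gets a reference to it."""

    def __init__(self, token: str, create_pr: Callable[[str, dict], str]):
        self._token = token
        self._create_pr = create_pr  # hypothetical stand-in for a GitHub client

    def materialize(self, intent: dict, decision: dict) -> Optional[str]:
        # The adapter makes no decision of its own. It only acts on admission.
        if decision.get("decision") != "admitted":
            return None
        # The check results travel with the PR as evidence.
        payload = {"intent": intent, "evidence": decision["checks"]}
        return self._create_pr(self._token, payload)


calls = []


def fake_create_pr(token: str, payload: dict) -> str:
    # Test double: records the request instead of touching GitHub.
    calls.append(payload)
    return "pr-1"


adapter = PullRequestAdapter("secret-token", fake_create_pr)
blocked = adapter.materialize({"operation": "write"},
                              {"decision": "blocked", "checks": {}})
created = adapter.materialize({"operation": "write"},
                              {"decision": "admitted", "checks": {"scope": "pass"}})
print(blocked, created)  # None pr-1
```

In this sketch the adapter trusts the decision dict it is handed. In a real setup the decision would have to come from the gateway over a path the agent cannot write to, otherwise the agent could simply forge an admission.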

This does not make the code correct

This is important:

A boundary like this does not prove that the generated code is good. It does not replace CI, human review, or semantic correctness. It only controls the transition from proposed work to external impact.

That narrower claim is the whole point. I think many agent systems mix reasoning, decision, and impact together.

But these should be separated. The agent owns reasoning. The boundary owns the decision. The adapter owns controlled materialization. The target system should only receive admitted impact.

Why I care about this

I don't think production agent systems will be trusted just because the models get smarter. They will be trusted when the path from agent work to external change becomes explicit.

For every real outcome, I want to know what the agent proposed, what state it read, which rules were checked, why it was admitted or blocked, what outcome was created, whether a human can review it, and whether we can audit it later.
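One way to picture this is a single evidence record per decision. The Python sketch below is only an assumption about what such a record could contain. `EvidenceRecord` and `record_evidence` are illustrative names, and the fields simply mirror the questions above.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class EvidenceRecord:
    proposed_intent: dict  # what the agent proposed
    observed_state: dict   # what state it read
    checks: dict           # which rules were checked
    decision: str          # admitted or blocked
    reason: str            # why
    outcome: dict          # what was created, if anything
    reviewable: bool       # can a human review it?
    recorded_at: str       # when, for later audit


def record_evidence(intent: dict, decision: dict,
                    outcome: dict, reason: str) -> str:
    """Serialize one decision into a JSON line that can be stored and audited."""
    record = EvidenceRecord(
        proposed_intent=intent,
        observed_state=intent.get("source_state", {}),
        checks=decision["checks"],
        decision=decision["decision"],
        reason=reason,
        outcome=outcome,
        reviewable=bool(outcome.get("reviewable", False)),
        recorded_at=datetime.now(timezone.utc).isoformat(),
    )
    return json.dumps(asdict(record), sort_keys=True)


line = record_evidence(
    intent={"operation": "write", "source_state": {"blob_sha": "8f31c2..."}},
    decision={"decision": "admitted", "checks": {"scope": "pass"}},
    outcome={"type": "pull_request", "status": "created", "reviewable": True},
    reason="all checks passed",
)
print(line)
```

Blocked intents would get the same kind of record with an empty outcome, so the audit trail covers what did not happen as well as what did.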

That is the layer I have been working on with Impact Boundary Labs.
The first implementation is GitHub-first:

agents can read repositories directly, but write intents go through a deterministic gateway that creates reviewable Pull Requests with evidence.

GitHub is not the whole idea. It is just the first concrete place to prove the pattern, because repositories have clear state, branches, commits, PRs, and review.

The broader principle is:

Let agents reason.
Stop them at intent.
Control what becomes outcome.

Project: Impact Boundary Labs

This is my very first article here on dev.to! I’d love to hear your thoughts on this architecture. How are you currently securing your agent workflows?

Since I'm new here, I'm very open to feedback. Let me know in the comments what I can improve or what we should talk about in Part 2!

Top comments (5)

Harjot Singh • May 31

Strong principle and it's the right hill to die on - the agent should produce a proposal (a diff, a PR, a plan), and a separate trusted system with the write credential applies it after a gate. The moment a coding agent holds the push/deploy/db credential directly, a hallucination or an injected instruction becomes a destructive write with no checkpoint, and you've handed god-mode to the least predictable component in the stack. Separating "who proposes the change" from "who has the authority to commit it" is exactly the privilege boundary that keeps an autonomous agent from being a liability.

This is core to how I build - propose with the model, commit with a gated trusted layer, never let the generator hold the keys. It's the spine of Moonshift, the thing I work on: a multi-agent pipeline that takes a prompt to a deployed SaaS, where agents emit changes and a verify layer (with the actual credentials) decides what gets applied, so a bad generation can't directly mutate prod. Same separation you're arguing for. Multi-model routing keeps a build ~$3 flat, first run free no card. This take needs to be louder. Where do you put the boundary in practice - agent opens a PR and CI/human merges, or a broker service that holds creds and applies only approved diffs? The broker pattern is the cleanest I've found but it's more to build.

David Loibner • May 31

Thanks Harjot, yes. In practice we put the boundary on the broker side, before the PR exists.

I would not want the agent to open the PR directly and then rely only on CI or a human merge. CI and human review are still useful, but they are later gates. If the agent can already create branches, push commits, or spam PRs, it still has more write power than I want it to have.

So the model is closer to your second option: the agent proposes the change, but the broker/core holds the credentials. The core checks policy, scope, current state, and drift. If the proposal is admitted, an adapter materializes the real effect, for example by creating a PR.

The values we try to keep are pretty simple:

No hidden write power inside the agent loop.
Explicit admission before impact.
Deterministic checks where possible.
Useful feedback when something gets blocked.

A blocked action should not just disappear as a failed tool call. It should tell the agent what was missing, why it was blocked, or what needs to happen next.

So yes, the broker approach is more work, but I think it is the cleaner place to put responsibility once agents start touching real systems.

Harjot Singh • May 31

Broker-side is the cleanest place for it, the credential lives with the trusted applier, the agent only ever emits a proposal, so even a fully-compromised agent can't push directly. That's the exact separation I landed on too: generator never holds the keys, a gated layer with the creds decides what actually applies. Broker is more to build than "agent opens a PR" but it's the version that holds up when the agent goes off the rails. Good design, and good of you to spell it out publicly, this pattern is badly under-discussed relative to how fast people are handing agents write access.

Harjot Singh • May 31

That's the right place for it. The whole point is the agent never holds a credential it could misuse, it asks a broker to perform a scoped action and the broker enforces policy, so a compromised or confused agent can't do more than the broker allows. Putting the boundary at the agent ("please be careful with these keys") is hope, not a control. Broker-mediated, least-privilege, every action attributable is the only model that survives an agent going off the rails. I care about this a lot in Moonshift since agents touch real repos and deploys: the agent proposes, a gated layer that actually holds the permissions executes. Are you scoping the broker per-action, or per-session with a capability token that expires?

David Loibner • May 31

Yup, exactly. I think this should be per intent / per action, not a broad agent session.

A session can be useful for auth or communication, but it should not become a temporary impact window.

The agent proposes a certain effect, the core/policy decides if that effect is allowed, and only then the adapter executes the actual action.

We are also thinking about the read side here: scoped reads, so the intent is bound to the repo state the agent was actually allowed to observe.

I will address this more clearly in the follow up article, because this distinction is the most important one. Thanks for pushing on this, really appreciate the input!