Paulo Victor Leite Lima Gomes

Posted on Jun 10

coding agents made repositories the security boundary

#ai #security #github #codingagents

GitHub shipped a small changelog entry this week that says more about the future of coding agents than most of the launch demos.

Security validation for third-party coding agents is now generally available. Not just for GitHub's own Copilot cloud agent. For third-party agents too, including Claude and OpenAI Codex.

The feature sounds boring in the best possible way.

When an agent creates code, GitHub can run CodeQL, check new dependencies against the GitHub Advisory Database, and use secret scanning to detect tokens, API keys, and other sensitive material. If the analysis finds a problem, the agent tries to fix it before finalizing the pull request.

That is not the flashy part of agentic coding.

It is the important part.

Because once agents are allowed to act inside repos, the question stops being "which model wrote this diff?" and becomes "can the repository apply the same policy to every automation actor?"

authorship is the wrong abstraction

We still talk about generated code as if authorship were the primary thing that matters.

Was this written by Copilot? Claude? Codex? A human with tab completion? A human who pasted something from a chat window and cleaned it up? A junior engineer following a Stack Overflow answer from 2018?

Those distinctions matter for procurement and product marketing. They matter less for the repository.

The repository has a simpler problem: a change is trying to enter the system. It may introduce a vulnerability, add a risky dependency, leak a secret, violate an internal rule, or be perfectly fine.

That is why the GitHub change is interesting. It moves the useful boundary from "our approved coding assistant" to "any coding agent operating in this repository."

the agent is now an actor

For years, repository automation was mostly boring and legible.

CI ran tests. Dependabot opened dependency updates. Release bots bumped versions. Linters complained. Security scanners commented. Humans reviewed. The automation could be annoying, but its shape was predictable.

Coding agents are different.

They do not merely report on the repository. They edit it.

They read issues, inspect files, modify code, add tests, update dependencies, and open pull requests. That makes them closer to a contractor than a linter.

And like every contractor with write access, they need boundaries.

Not vibes. Boundaries.

What tools can they call? What data can they read? Which secrets are visible? Which checks must pass? Who approves the final merge? How do we audit what happened later?

These are not philosophical questions. They are repository administration questions.

The funny part is that this makes agentic coding feel less futuristic and more like boring enterprise software: identity, authorization, audit logs, policy inheritance, defaults, exceptions, and enforcement.

As usual, the boring part is where production starts.

security checks should not depend on vendor loyalty

The worst version of agent governance is vendor-specific policy.

Copilot-generated code gets one security path. Claude-generated code gets another. Codex-generated code depends on whatever the team remembered to configure. A local agent running from a developer machine is treated as a human diff because nobody knows what else to do with it.

That does not scale. It also creates the wrong incentive: teams argue about which agent is safe instead of making the repository resilient to generated work in general.

The security posture should not be:

"We trust this model."

It should be:

"We do not allow unvalidated changes through this boundary."

Those are very different statements.

The first one turns model selection into a security control. That is weak, because models change, vendors change defaults, prompts drift, and generated output is still output.

The second statement is boring and robust. Every change gets checked. Every actor hits the same gates. The repo enforces policy regardless of whether the diff came from a human, a first-party agent, or a third-party one.

That is the direction this needs to go.
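As a rough sketch of what "every actor hits the same gates" could look like, here is a tiny gate function in Python. Everything in it is an assumption for illustration: the `Change` type, the actor labels, and the check names in `REQUIRED_CHECKS` are hypothetical and do not correspond to real GitHub settings. The point is only that the decision never reads the actor.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical check names, chosen for illustration only.
REQUIRED_CHECKS = ("code_scanning", "dependency_review", "secret_scanning")


@dataclass
class Change:
    actor: str                      # e.g. "human", "copilot", "claude" (illustrative labels)
    check_results: Dict[str, bool]  # check name -> whether it passed


def may_enter(change: Change) -> bool:
    """Allow a change only if every required check ran and passed.

    change.actor is deliberately never consulted: the gate is identical
    for humans, first-party agents, and third-party agents.
    """
    return all(change.check_results.get(name) is True for name in REQUIRED_CHECKS)


all_green = {name: True for name in REQUIRED_CHECKS}
print(may_enter(Change("human", all_green)))
print(may_enter(Change("claude", all_green)))
print(may_enter(Change("copilot", {"code_scanning": True})))  # missing checks count as failures
```

A missing check result is treated the same as a failed one, which is the "no unvalidated changes" posture rather than the "we trust this model" posture.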

automatic remediation is useful, but weird

The part that made me pause is not that GitHub runs security analysis.

It is that, when a problem is found, the agent attempts to resolve it before finalizing the pull request.

That is useful. It also changes the review loop.

Traditionally, a scanner reports an issue and a human fixes it. With agents, the loop can become: tool complains, agent modifies code, tool checks again, agent modifies code again, then the final diff appears for review.

That loop is probably the right one. It is also a reason to be more disciplined about logs and provenance.

If an agent introduced a dependency, the scanner flagged it, and the agent replaced it with another one, the final pull request may only show the end state. The interesting part might be the path it took to get there.

For small bugs, nobody will care.

For security-sensitive changes, regulated environments, and expensive incidents, people will care a lot.
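To make the "path, not just end state" idea concrete, here is a minimal sketch of a remediation loop that keeps every intermediate round. It is not how GitHub implements this; `scan` and `fix` are hypothetical callables standing in for the scanner and the agent, and the diff is modeled as a plain string.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Attempt:
    round_no: int
    diff: str            # the diff as it looked when this scan ran
    findings: List[str]  # what the scanner reported for that diff


def remediate(
    diff: str,
    scan: Callable[[str], List[str]],      # hypothetical scanner
    fix: Callable[[str, List[str]], str],  # hypothetical agent fix step
    max_rounds: int = 3,
) -> Tuple[str, List[Attempt]]:
    """Scan, let the agent fix, and rescan, recording every round.

    The last history entry always describes the returned diff, so a
    reviewer can see both the end state and how it was reached.
    """
    history: List[Attempt] = []
    for round_no in range(max_rounds + 1):
        findings = scan(diff)
        history.append(Attempt(round_no, diff, list(findings)))
        if not findings or round_no == max_rounds:
            break
        diff = fix(diff, findings)
    return diff, history


# Toy stand-ins: the "scanner" flags one made-up package name.
toy_scan = lambda d: ["risky dependency: bad-lib"] if "bad-lib" in d else []
toy_fix = lambda d, findings: d.replace("bad-lib", "other-lib")

final_diff, history = remediate("add bad-lib", toy_scan, toy_fix)
for attempt in history:
    print(attempt.round_no, attempt.diff, attempt.findings)
```

The pull request would only show `final_diff`. The `history` list is the part an incident review would ask for.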

This is where "agent wrote code" becomes too vague. We need to know which agent acted, under which permissions, which tools ran, and which checks were authoritative.

That sounds like paperwork until the first incident review.

Then it sounds like the minimum viable truth.
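One way to picture that minimum viable truth is a small structured record per agent run. This is only a sketch under my own assumptions: the `AgentActionRecord` type, its field names, and the example values are hypothetical, not a GitHub audit log schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class AgentActionRecord:
    agent: str                       # which agent acted
    permissions: List[str]           # under which permissions
    tools_run: List[str]             # which tools ran
    authoritative_checks: List[str]  # which checks were authoritative


def to_log_line(record: AgentActionRecord) -> str:
    """Serialize a record as one JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


# All values below are made up for illustration.
record = AgentActionRecord(
    agent="example-agent",
    permissions=["contents:write", "pull_requests:write"],
    tools_run=["code_scanning", "dependency_review"],
    authoritative_checks=["code_scanning"],
)
print(to_log_line(record))
```

Four fields is not a real audit system, but it is already more than "agent wrote code".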

this is a platform-team problem

One pattern keeps repeating across the agent tooling news: the useful features are becoming platform features.

Security validation. MCP server configuration. Repository-level review settings. Agent skills. Cost controls. Sandboxes. Audit trails. Hooks before and after tool use.

These are not features an individual developer should be hand-rolling for every repo.

If a company has hundreds of repositories, the platform team needs to answer boring questions. Which agents are allowed where? Which validations are mandatory? How are secrets isolated from agent execution? Can the company prove that the same rules apply across repos? Who owns the policy when it breaks a team at 5 PM on Friday?

None of this looks like the AI demo, but it decides whether agent adoption becomes productive infrastructure or a pile of clever one-off workflows.

the repository is becoming the enforcement layer

I like this GitHub change because it points to the right mental model.

The agent is not the trust boundary.

The repository is.

That does not mean the agent can be reckless. Agent identity, sandboxing, tool permissions, and prompt controls still matter. But the repository is where work becomes part of the system. It is where code review, CI, security scanning, branch protection, dependency policy, and audit history already meet.

So it makes sense for agent governance to land there too.

This is also a good reminder that "AI-native development" will not replace the old software delivery machinery as quickly as people think. It will mostly increase the load on that machinery: more pull requests, more generated dependencies, and more plausible mistakes arriving faster than reviewers can comfortably process them.

The answer is not to make every human reviewer faster by sheer force of will.

The answer is to move the obvious checks closer to the boundary and make them consistent.

what i would do now

If I were responsible for agent adoption in a real engineering organization, I would start with repository policies before buying more seats or enabling more tools.

Inventory which automation actors can open pull requests today. Include the boring ones: dependency bots, release bots, internal scripts, CI workflows, and any agent experiments running from developer machines.

Then make the entry rules explicit.

At minimum, I would want mandatory security scanning, dependency checks, secret scanning, branch protections, and a clear distinction between "agent may propose" and "agent may merge." I would also want logs that make agent actions reconstructable later.
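The "agent may propose" versus "agent may merge" distinction can be written down as a default-deny table. The sketch below is illustrative only: the actor kinds and the `POLICY` mapping are assumptions of mine, and in practice this would live in branch protection or platform configuration rather than application code.

```python
from enum import Enum


class Capability(Enum):
    PROPOSE = "propose"  # open a pull request
    MERGE = "merge"      # land it on a protected branch


# Hypothetical policy: agents may propose, only humans may merge.
POLICY = {
    "human": {Capability.PROPOSE, Capability.MERGE},
    "agent": {Capability.PROPOSE},
}


def is_allowed(actor_kind: str, capability: Capability) -> bool:
    """Default deny: unknown actor kinds get no capabilities."""
    return capability in POLICY.get(actor_kind, set())


print(is_allowed("agent", Capability.PROPOSE))
print(is_allowed("agent", Capability.MERGE))
print(is_allowed("mystery-script", Capability.PROPOSE))
```

The default-deny lookup matters more than the table itself: an automation actor nobody inventoried gets nothing until someone decides what it is.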

I would avoid making this a model-ranking exercise.

The question is not whether Claude, Codex, Copilot, or the next thing has the best benchmark score this month. The question is whether your repository can safely accept work from any of them under a policy you understand.

That is the difference between a demo and an engineering system.

the punchline

The interesting thing about third-party coding agents is not that they can write code.

We know they can write code.

The interesting thing is that they are becoming normal actors inside software delivery. Once that happens, the repository has to become stricter, not looser. The more autonomous the actor, the more boring the boundary needs to be.

GitHub extending security validation beyond its own agent has a large implication: agent governance cannot be a product-specific afterthought. It has to be a repository property.

That is the right direction.

Do not trust the diff because you like the model.

Trust the diff because it survived the same boundary every other change has to survive.
