Last Tuesday, a coding agent opened a PR that looked perfect.
Tests passed. Types checked. The diff was clean.
Then a teammate noticed it had “fixed” the same bug three times in three different files, each in a slightly different way. Two hours later, another agent reverted part of that work because it didn’t know the first change existed. By the end of the day, the codebase had more churn, more tokens burned, and less confidence than before.
If you’re using Claude Code, Cursor, Copilot, Devin, or homegrown agents, this probably sounds familiar.
AI coding agents don’t keep repeating mistakes because they’re “bad at coding.” They do it because most teams are giving them no durable identity, no shared memory, and no safe boundary for tools.
That combination breaks fast.
The real problem
Most agent workflows still look like this:
Human prompt -> Agent session -> Tools/files/APIs -> Code change
What’s missing?
- Identity: who is this agent, exactly?
- Context continuity: is this the same agent as yesterday, or a fresh one with no memory?
- Coordination: does it know another agent is editing the same file?
- Tool trust: should this MCP server or tool even be callable?
- Policy: what is allowed without approval?
Without those, agents keep falling into the same loop:
No identity
↓
No trust / no permissions model
↓
Over-broad tool access
↓
Repeated bad actions
↓
Humans clean up
↓
New session starts from scratch
↓
Same mistakes again
Why this happens in practice
1) Stateless sessions masquerade as teammates
A lot of “agent collaboration” is really just isolated sessions writing to the same repo.
That means the agent doesn’t actually know:
- what it changed last run
- what another agent is changing right now
- what was explicitly approved vs guessed
- which tools are safe to use
So it re-derives everything from the current prompt and local context. That’s why you see the same refactor, the same broken migration, or the same insecure config suggestion over and over.
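The fix for this starts with something embarrassingly simple: persist what the agent did between runs. Here's a minimal sketch, assuming a JSON file as the store (the file name, keys, and `record_change` helper are all hypothetical, not part of any agent framework):

```python
import json
import pathlib

STATE_FILE = pathlib.Path("agent_state.json")  # hypothetical location

def load_state():
    # Return what this agent did and what was approved in previous runs.
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text())
    return {"changes": [], "approved": []}

def record_change(state, file, summary):
    # Append a change record and flush, so the *next* session can see it.
    state["changes"].append({"file": file, "summary": summary})
    STATE_FILE.write_text(json.dumps(state, indent=2))

state = load_state()
record_change(state, "auth/login.py", "fixed null-check on session token")
```

A real setup would key this by agent identity and sync it somewhere shared, but even a per-agent file breaks the "re-derive everything from the prompt" loop.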
2) MCP makes tool use easier — and mistakes cheaper to repeat
MCP is great because it standardizes how agents discover and call tools.
It also means an agent can quickly repeat a bad action if:
- the MCP server exposes too much
- auth is weak or missing
- there’s no per-agent policy
- no one can audit who called what
If every agent looks like “some API key” in logs, debugging repeated failures becomes guesswork.
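The cheapest improvement is structured attribution on every tool call. A sketch of what one audit line could look like (field names and the `audit_entry` helper are illustrative, not a standard):

```python
import json
import time
import uuid

def audit_entry(agent_id, session_id, tool, args, decision):
    # One structured record per MCP call: who called what, and whether
    # it was allowed. "agent" is a stable identity, not a shared API key.
    return json.dumps({
        "ts": time.time(),
        "agent": agent_id,
        "session": session_id,
        "tool": tool,
        "args": args,
        "decision": decision,
    })

line = audit_entry(
    "refactor-bot", str(uuid.uuid4()),
    "fs.write", {"path": "src/app.ts"}, "allow",
)
```

With records like this, "the same failure happened five times" becomes a query instead of an archaeology project.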
3) Agents don’t naturally coordinate on shared codebases
Humans use social signals: “I’m touching auth,” “don’t rewrite that migration,” “hold this file for an hour.”
Agents need that explicitly.
If two agents can patch the same file at once, they will step on each other. If neither sees sprint/task ownership, both may solve the same issue differently. That’s not intelligence failure. That’s missing orchestration.
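The human "I'm touching auth" signal can be made explicit with an advisory lock. A minimal sketch using atomic lock-file creation (`O_CREAT | O_EXCL` fails if the file already exists, so two agents can't both win the race):

```python
import os

def try_lock(path, owner):
    # Advisory lock via atomic create: O_EXCL makes os.open fail
    # if another agent already holds the lock file.
    lock = path + ".lock"
    try:
        fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        with open(lock) as f:
            return f.read()  # lock held: report the current owner
    with os.fdopen(fd, "w") as f:
        f.write(owner)
    return None  # lock acquired

def unlock(path):
    os.remove(path + ".lock")
```

This is deliberately crude (no TTL, no stale-lock cleanup), but it's enough to turn "two agents silently overwrote each other" into "agent-b waited because agent-a held the lock."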
The fix is boring infrastructure
This is one of those annoying engineering truths: the solution is less “better prompting” and more identity + policy + locking + auditability.
You need agents to behave less like autocomplete and more like services in production:
- Strong identity for each agent/session
- Scoped permissions for tools and repos
- Approval gates for risky actions
- Coordination primitives like file locks or task ownership
- Auditable MCP calls so repeated failures are traceable
If you already use OPA for policy, it slots in naturally here. The important part is having some enforceable policy layer rather than hoping the prompt says "be careful."
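The core of that policy layer is a tiny decision function. A default-deny sketch (the roles, tool names, and in-memory table are placeholders; in production this would live in OPA or a similar engine):

```python
# Hypothetical policy table mapping (agent role, tool) to a decision.
POLICY = {
    ("reviewer", "fs.read"): "allow",
    ("builder", "fs.write"): "allow",
    ("builder", "deploy.prod"): "require_approval",
}

def check(role, tool):
    # Default-deny: any (role, tool) pair not explicitly listed is refused.
    return POLICY.get((role, tool), "deny")
```

The default-deny posture is the point: an agent can't repeat an unsafe action that was never granted in the first place.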
A simple pattern that actually helps
Here’s the minimum model I’d recommend for MCP-connected coding agents:
[Agent Identity]
|
v
[Policy Check] ---> allow / deny / require approval
|
v
[MCP Tool Call]
|
v
[Audit Log + Repo/File Coordination]
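Wired together, the pipeline above is only a few lines. A toy sketch (the allowlist, agent names, and `call_tool` wrapper are assumptions for illustration):

```python
audit_log = []  # in practice: an append-only store, not a Python list

def policy_check(agent, tool):
    # Toy allowlist; real decisions would come from the policy layer.
    allowed = {"test-bot": {"fs.read", "tests.run"}}
    return "allow" if tool in allowed.get(agent, set()) else "deny"

def call_tool(agent, tool, handler):
    # Every call is policy-checked and audited *before* it runs.
    decision = policy_check(agent, tool)
    audit_log.append({"agent": agent, "tool": tool, "decision": decision})
    if decision != "allow":
        raise PermissionError(f"{agent} may not call {tool}")
    return handler()

result = call_tool("test-bot", "tests.run", lambda: "42 passed")
```

Note that denied calls still get logged: that's what makes repeated failures traceable instead of invisible.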
That does two useful things:
- It stops the same unsafe action from being retried blindly.
- It gives you enough evidence to fix the workflow instead of blaming “the AI.”
One quick check you can run today
If you’re exposing or using MCP servers, start by checking what they actually expose.
A simple scan can catch issues like:
- missing auth
- overly broad capabilities
- spec compliance problems
- accidental public exposure
Runnable example
npm install -g @authora/agent-audit
agent-audit scan https://your-mcp-server.example.com
That’s the fastest way to answer: “Is this server safe enough for agents to call repeatedly?”
If you prefer no install, there’s also a browser-based scanner in the links below.
What “good” looks like
You do not need a giant platform rollout to improve this.
Even a lightweight setup helps a lot:
- Give each agent a verifiable identity
- Require auth on MCP endpoints
- Add policy checks before sensitive tools run
- Lock files/tasks when multiple agents share a repo
- Log tool calls with agent/session attribution
- Add approval for deploys, deletes, secrets, and billing actions
That changes the failure mode from:
“Why does the agent keep doing this?”
to:
“This agent role can’t do that anymore, and we know exactly what happened.”
That’s a much better place to be.
Try it yourself
If you want to tighten up agent workflows without a big migration:
- Want to check your MCP server? Try https://tools.authora.dev
- Run a codebase scan for agent security issues: npx @authora/agent-audit
- Add a verified badge to your agent: https://passport.authora.dev
- More resources and papers: https://github.com/authora-dev/awesome-agent-security
The part nobody likes hearing
A lot of repeated agent mistakes are really systems design mistakes.
We dropped autonomous tools into shared codebases and gave them inconsistent identity, fuzzy permissions, and weak coordination. Of course they keep making the same errors. We built an environment where repetition is cheap and accountability is blurry.
The good news: this is fixable with normal engineering discipline.
How are you handling agent identity, MCP permissions, or shared-repo coordination today? Drop your approach below.
-- Authora team
This post was created with AI assistance.