Authora Dev

Why AI coding agents keep making the same mistakes (and how to stop it)

Last Tuesday, a coding agent opened a PR that looked perfect.

Tests passed. Types checked. The diff was clean.

Then a teammate noticed it had “fixed” the same bug three times in three different files, each in a slightly different way. Two hours later, another agent reverted part of that work because it didn’t know the first change existed. By the end of the day, the codebase had more churn, more tokens burned, and less confidence than before.

If you’re using Claude Code, Cursor, Copilot, Devin, or homegrown agents, this probably sounds familiar.

AI coding agents don’t keep repeating mistakes because they’re “bad at coding.” They do it because most teams are giving them no durable identity, no shared memory, and no safe boundary for tools.

That combination breaks fast.

The real problem

Most agent workflows still look like this:

Human prompt -> Agent session -> Tools/files/APIs -> Code change

What’s missing?

  • Identity: who is this agent, exactly?
  • Context continuity: is this the same agent as yesterday, or a fresh one with no memory?
  • Coordination: does it know another agent is editing the same file?
  • Tool trust: should this MCP server or tool even be callable?
  • Policy: what is allowed without approval?
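
To make those pieces concrete, here's a minimal sketch (all names are illustrative, not a real SDK) of a durable agent identity plus a session record that links back to it:

```typescript
// Sketch: a durable agent identity and a session that references it.
// Field names and IDs are assumptions for illustration.

interface AgentIdentity {
  agentId: string;        // stable across sessions: "who is this agent?"
  role: string;           // drives what it's allowed to do
  allowedTools: string[]; // tool trust: what it may call at all
}

interface AgentSession {
  sessionId: string;        // this particular run
  agent: AgentIdentity;     // links the run back to a durable identity
  priorSessionId?: string;  // context continuity: what came before
}

const refactorBot: AgentIdentity = {
  agentId: "agent-refactor-01",
  role: "refactor-bot",
  allowedTools: ["fs.read", "fs.write", "git.commit"],
};

const session: AgentSession = {
  sessionId: "sess-0042",
  agent: refactorBot,
  priorSessionId: "sess-0041", // a fresh session still knows what ran before
};

// Tool trust is scoped by role, not by whatever the prompt asked for.
const canDeploy = session.agent.allowedTools.includes("deploy.run");
```

The point of the `priorSessionId` link is that "new session" stops meaning "blank slate."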

Without those, agents keep falling into the same loop:

No identity
   ↓
No trust / no permissions model
   ↓
Over-broad tool access
   ↓
Repeated bad actions
   ↓
Humans clean up
   ↓
New session starts from scratch
   ↓
Same mistakes again

Why this happens in practice

1) Stateless sessions masquerade as teammates

A lot of “agent collaboration” is really just isolated sessions writing to the same repo.

That means the agent doesn’t actually know:

  • what it changed last run
  • what another agent is changing right now
  • what was explicitly approved vs guessed
  • which tools are safe to use

So it re-derives everything from the current prompt and local context. That’s why you see the same refactor, the same broken migration, or the same insecure config suggestion over and over.
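
One cheap mitigation is a shared run journal that every session writes to and consults before acting. Here's an in-memory sketch, assuming you'd back it with a file or database in practice:

```typescript
// Sketch: a run journal agents check before touching a file, so a new
// session sees what previous sessions already changed. In-memory here;
// a real version would persist this between runs.

interface RunRecord {
  sessionId: string;
  filesChanged: string[];
  summary: string;
  approved: boolean; // explicitly approved vs guessed
}

const journal: RunRecord[] = [];

function recordRun(rec: RunRecord): void {
  journal.push(rec);
}

// Before editing a file, ask: did a previous run already change it?
function previousWork(file: string): RunRecord[] {
  return journal.filter((r) => r.filesChanged.includes(file));
}

recordRun({
  sessionId: "sess-1",
  filesChanged: ["src/auth.ts"],
  summary: "Fixed null check in token refresh",
  approved: true,
});

// A later session now knows src/auth.ts was already fixed -- and that
// the fix was approved -- so it shouldn't "fix" the same bug again.
const history = previousWork("src/auth.ts");
```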

2) MCP makes tool use easier — and mistakes cheaper to repeat

MCP is great because it standardizes how agents discover and call tools.

It also means an agent can quickly repeat a bad action if:

  • the MCP server exposes too much
  • auth is weak or missing
  • there’s no per-agent policy
  • no one can audit who called what

If every agent looks like “some API key” in logs, debugging repeated failures becomes guesswork.

3) Agents don’t naturally coordinate on shared codebases

Humans use social signals: “I’m touching auth,” “don’t rewrite that migration,” “hold this file for an hour.”

Agents need that explicitly.

If two agents can patch the same file at once, they will step on each other. If neither sees sprint/task ownership, both may solve the same issue differently. That’s not intelligence failure. That’s missing orchestration.
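
That orchestration can start as simply as advisory file locks. A minimal sketch, using in-memory state (a real version needs shared storage, and the TTL below is so crashed agents don't hold locks forever):

```typescript
// Sketch: advisory file locks so two agents can't patch the same file
// at once. The lock map is in-memory for illustration only.

const locks = new Map<string, { owner: string; expiresAt: number }>();

function acquireLock(file: string, owner: string, ttlMs: number): boolean {
  const now = Date.now();
  const existing = locks.get(file);
  if (existing && existing.expiresAt > now && existing.owner !== owner) {
    return false; // another agent holds it
  }
  locks.set(file, { owner, expiresAt: now + ttlMs });
  return true;
}

function releaseLock(file: string, owner: string): void {
  if (locks.get(file)?.owner === owner) locks.delete(file);
}

const a = acquireLock("src/migrations/001.sql", "agent-a", 60_000); // granted
const b = acquireLock("src/migrations/001.sql", "agent-b", 60_000); // refused: agent-a holds it
releaseLock("src/migrations/001.sql", "agent-a");
const c = acquireLock("src/migrations/001.sql", "agent-b", 60_000); // granted after release
```

The same shape works for task ownership: replace the file path with an issue or sprint-task ID.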

The fix is boring infrastructure

This is one of those annoying engineering truths: the solution is less “better prompting” and more identity + policy + locking + auditability.

You need agents to behave less like autocomplete and more like services in production:

  • Strong identity for each agent/session
  • Scoped permissions for tools and repos
  • Approval gates for risky actions
  • Coordination primitives like file locks or task ownership
  • Auditable MCP calls so repeated failures are traceable

If you already use OPA for policy, that's a good fit here too. The important part is having some enforceable policy layer rather than hoping the prompt says "be careful."

A simple pattern that actually helps

Here’s the minimum model I’d recommend for MCP-connected coding agents:

[Agent Identity]
      |
      v
[Policy Check] ---> allow / deny / require approval
      |
      v
[MCP Tool Call]
      |
      v
[Audit Log + Repo/File Coordination]
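
Here's a minimal sketch of that pipeline wired together. The policy rules and tool names are assumptions for illustration, not a real MCP client:

```typescript
// Sketch: identity -> policy check -> tool call -> audit log.
// The toy policy and tool names below are illustrative assumptions.

type Decision = "allow" | "deny" | "require_approval";

interface ToolCall {
  agentId: string;
  tool: string;
  args: Record<string, unknown>;
}

interface AuditEntry extends ToolCall {
  decision: Decision;
  at: number;
}

const auditLog: AuditEntry[] = [];

// Toy policy: writes are fine, deploys need a human, secrets are off-limits.
function policyCheck(call: ToolCall): Decision {
  if (call.tool.startsWith("secrets.")) return "deny";
  if (call.tool.startsWith("deploy.")) return "require_approval";
  return "allow";
}

function callTool(call: ToolCall): Decision {
  const decision = policyCheck(call);
  // Every attempt is logged with agent attribution -- even denied ones.
  auditLog.push({ ...call, decision, at: Date.now() });
  if (decision === "allow") {
    // ...forward to the actual MCP server here...
  }
  return decision;
}

const d1 = callTool({ agentId: "agent-a", tool: "fs.write", args: { path: "src/x.ts" } });
const d2 = callTool({ agentId: "agent-a", tool: "deploy.production", args: {} });
```

Because denied and gated attempts are logged too, a repeated bad action shows up as a pattern in the audit trail instead of a mystery in the diff.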

That does two useful things:

  1. It stops the same unsafe action from being retried blindly.
  2. It gives you enough evidence to fix the workflow instead of blaming “the AI.”

One quick check you can run today

If you’re exposing or using MCP servers, start by checking what they actually expose.

A simple scan can catch issues like:

  • missing auth
  • overly broad capabilities
  • spec compliance problems
  • accidental public exposure

Runnable example

npm install -g @authora/agent-audit
agent-audit scan https://your-mcp-server.example.com

That’s the fastest way to answer: “Is this server safe enough for agents to call repeatedly?”

If you prefer no install, there’s also a browser-based scanner in the links below.

What “good” looks like

You do not need a giant platform rollout to improve this.

Even a lightweight setup helps a lot:

  • Give each agent a verifiable identity
  • Require auth on MCP endpoints
  • Add policy checks before sensitive tools run
  • Lock files/tasks when multiple agents share a repo
  • Log tool calls with agent/session attribution
  • Add approval for deploys, deletes, secrets, and billing actions
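
The approval gate in that last bullet can start as a pending queue that parks risky actions until a human resolves them. A small sketch, with all names illustrative:

```typescript
// Sketch: an approval queue for risky actions (deploys, deletes,
// secrets, billing). Nothing executes while status is "pending".

interface PendingAction {
  id: number;
  agentId: string;
  action: string;
  status: "pending" | "approved" | "rejected";
}

const pending: PendingAction[] = [];
let nextId = 1;

function requestApproval(agentId: string, action: string): PendingAction {
  const item: PendingAction = { id: nextId++, agentId, action, status: "pending" };
  pending.push(item);
  return item;
}

function resolve(id: number, approve: boolean): void {
  const item = pending.find((p) => p.id === id);
  if (item && item.status === "pending") {
    item.status = approve ? "approved" : "rejected";
  }
}

const req = requestApproval("agent-a", "deploy.production");
resolve(req.id, false); // a human rejects the deploy; nothing ran
```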

That changes the failure mode from:

“Why does the agent keep doing this?”

to:

“This agent role can’t do that anymore, and we know exactly what happened.”

That’s a much better place to be.

Try it yourself

If you want to tighten up agent workflows without a big migration, start with the checklist above and the MCP scan from the previous section.

The part nobody likes hearing

A lot of repeated agent mistakes are really systems design mistakes.

We dropped autonomous tools into shared codebases and gave them inconsistent identity, fuzzy permissions, and weak coordination. Of course they keep making the same errors. We built an environment where repetition is cheap and accountability is blurry.

The good news: this is fixable with normal engineering discipline.

How are you handling agent identity, MCP permissions, or shared-repo coordination today? Drop your approach below.

-- Authora team

This post was created with AI assistance.
