Your AI coding agent gets worse as your codebase grows. Here's why.

Most people don't notice their AI coding agent gets worse as their codebase grows.

Not because the model degrades. Because the context does.

The pattern

50 files: Claude Code or Cursor sees enough of the codebase to follow conventions, reuse utilities, avoid duplication. The output is coherent.

500 files: it can't. So it reimplements helpers that already exist three folders away. Introduces naming conventions that contradict the rest of the codebase. Generates functions nobody will ever call. "Fixes" bugs by stacking workarounds on workarounds.

The model didn't get dumber. It just stopped being able to hold the whole project in its head.

The result: codebases that ship fast at first, then collapse under their own weight. Dead code. Hidden duplication. Best practices selectively applied. The exact opposite of what AI-assisted coding was supposed to give us.

Why nothing catches it

Linters catch syntax. Type checkers catch types. Test runners catch broken contracts. None of them catch the kind of rot AI-generated code produces: architectural rot.

"Is this function used anywhere?"
"Does this already exist somewhere else under a different name?"
"Is this the third time we've solved this problem three different ways?"
"Does this follow the conventions used in the rest of the codebase?"

These questions require understanding the whole project, not just one file. No traditional tool can answer them. And asking the same AI agent that wrote the code to also review it is asking the fox to guard the henhouse.
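
To make that concrete, here's a made-up pair of helpers of the kind an agent produces a few weeks apart (names and bodies invented for illustration):

```ts
// utils/date.ts, written in week one (hypothetical example)
export function formatUserDate(d: Date): string {
  const month = String(d.getMonth() + 1).padStart(2, "0");
  const day = String(d.getDate()).padStart(2, "0");
  return `${d.getFullYear()}-${month}-${day}`;
}

// features/billing/helpers.ts, week six, three folders away, same logic
export function toIsoDay(date: Date): string {
  const month = String(date.getMonth() + 1).padStart(2, "0");
  const day = String(date.getDate()).padStart(2, "0");
  return `${date.getFullYear()}-${month}-${day}`;
}
```

Grepping for formatUserDate will never surface toIsoDay. Only something that compares what the code does, not what it's called, can connect the two.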

What I tried first

I tried prompting more. Bigger context windows. Stricter system prompts. "Please check if this function already exists before creating a new one."

It worked sometimes. It also failed silently most of the time, because the agent had no reliable way to actually verify its claims. So it just confidently asserted things were unique when they weren't.

I needed an intermediary. A separate quality layer that does one job: audit what the coding agent produces, with proof.

What I built

I built Anatoly, an open-source audit agent that walks through every file in your codebase and produces an evidence-backed review.

The core rule: every finding has to be proven before it's reported. If the agent claims a function is dead, a second agent has to prove it through a deliberation mode, using read-only tools (Grep, Glob, Read) to investigate the whole project and confirm the finding. No claim survives without evidence. No hallucinated findings.
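
A rough sketch of what that rule means in practice (the Finding shape and verifier below are invented to illustrate the idea, not Anatoly's actual internals): a "dead code" claim only survives if a read-only search of the whole project fails to turn up a single reference.

```ts
// Rough sketch of the "prove it before you report it" rule. The Finding
// shape and function name are illustrative, not Anatoly's internal API.
import { execFileSync } from "node:child_process";

interface Finding {
  kind: "dead-code";
  symbol: string;     // e.g. "formatUserDate"
  declaredIn: string; // path to the declaring file, as grep would print it
  evidence: string[]; // proof gathered by the verifying agent
}

// Read-only check: search the whole project for references to the symbol,
// ignoring its own declaration file. Any hit kills the claim.
function verifyDeadCode(candidate: Finding, projectRoot: string): Finding | null {
  let matches: string[] = [];
  try {
    const out = execFileSync(
      "grep",
      ["-rn", "--include=*.ts", candidate.symbol, projectRoot],
      { encoding: "utf8" }
    );
    matches = out
      .split("\n")
      .filter((line) => line && !line.startsWith(candidate.declaredIn));
  } catch {
    // grep exits non-zero when there are no matches at all
  }

  if (matches.length > 0) return null; // a reference exists: finding rejected

  candidate.evidence.push(
    `grep found no references to ${candidate.symbol} outside ${candidate.declaredIn}`
  );
  return candidate;
}
```

The real deliberation mode goes further (Glob and Read as well as Grep), but the shape is the same: a claim carries its own proof or it doesn't ship.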

Under the hood it uses tree-sitter for AST parsing, gives a Claude agent read-only tools (Glob, Grep, Read) to investigate, and runs a local semantic RAG index (Xenova embeddings + LanceDB) to catch cross-file duplication that grep can't see. Output is schema-validated with Zod, with a self-correction loop if the agent's JSON doesn't pass.
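
Here's a simplified sketch of the semantic side. It's not the production code: the model name below stands in for the Xenova embeddings, and vectors are kept in memory instead of LanceDB, but it shows how duplicates that no text search would connect get surfaced.

```ts
// Simplified sketch: flag near-duplicate snippets by embedding similarity.
// Not the production pipeline; vectors are kept in memory instead of LanceDB.
import { pipeline } from "@xenova/transformers";

async function findNearDuplicates(
  snippets: { file: string; code: string }[],
  threshold = 0.9
): Promise<[string, string, number][]> {
  // Load a small local embedding model (runs locally via ONNX).
  const embed: any = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2");

  // Embed each snippet into a normalized vector.
  const vectors: Float32Array[] = [];
  for (const s of snippets) {
    const out = await embed(s.code, { pooling: "mean", normalize: true });
    vectors.push(out.data as Float32Array);
  }

  // With normalized vectors, cosine similarity is just a dot product.
  const dot = (a: Float32Array, b: Float32Array) =>
    a.reduce((sum, v, i) => sum + v * b[i], 0);

  const pairs: [string, string, number][] = [];
  for (let i = 0; i < snippets.length; i++) {
    for (let j = i + 1; j < snippets.length; j++) {
      const sim = dot(vectors[i], vectors[j]);
      if (sim >= threshold) pairs.push([snippets[i].file, snippets[j].file, sim]);
    }
  }
  return pairs; // candidate duplicates for the audit agent to confirm or reject
}
```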

One command:

```bash
npx anatoly run
```

Next step

The next step I'm working on: a remote audit workflow. Anatoly runs on a remote server while you sleep, posts a structured report directly on your GitHub repo (issues or PR comments), and gives you a clean list of findings to address one by one the next morning. No local cost, no waiting, no context switching. You wake up, you review, you fix.
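
The GitHub half of that could look roughly like this. It's a sketch of the planned workflow using Octokit; the Finding shape and report format are placeholders, not a committed design.

```ts
// Hypothetical sketch: publish audit findings as a GitHub issue.
// The Finding shape and report format are invented for illustration.
import { Octokit } from "@octokit/rest";

interface Finding {
  file: string;
  kind: string; // e.g. "dead-code", "duplication"
  summary: string;
  evidence: string[];
}

async function postReport(findings: Finding[], owner: string, repo: string) {
  const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

  const body = findings
    .map(
      (f) =>
        `### ${f.kind}: \`${f.file}\`\n${f.summary}\n\n` +
        f.evidence.map((e) => `- ${e}`).join("\n")
    )
    .join("\n\n");

  // One issue per audit run; PR comments would use issues.createComment instead.
  await octokit.rest.issues.create({
    owner,
    repo,
    title: `Nightly audit report (${new Date().toISOString().slice(0, 10)})`,
    body,
  });
}
```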

Looking for repos

Anatoly is open-source under AGPL-3.0. I'm currently looking for codebases to scan for free, to refine the model and surface edge cases. If you've got a project you'd like audited, no strings attached, drop a comment or open an issue on the repo.

Repo: github.com/r-via/anatoly

How are you handling AI-generated code rot on your team? Curious if others are seeing the same pattern.
