What kinds of repo-truth boundaries do AI coding agents miss?

#ai #productivity #devops #discuss

I’ve been running field tests against live open-source issues to study a specific question:

When AI coding agents make repairs, where do they lose the actual truth of the repo?

Not “does the AI write code?”

Not “can it pass a test?”

More specifically:

Can the agent identify the boundary that actually broke before it starts changing files?

So far, the strongest failure classes I’m seeing are things like:

provider/config contract drift
retrieval/source-context truth loss
dev/build behavior divergence
build artifact metadata losing source truth
generated output being treated as stronger evidence than it really is

One example: in an AI app, search results came back successfully, but the retrieved truth never survived into usable model context because a response-shape boundary collapsed.

Another example: in a build tool, the same plugin hook filter behaved differently in dev and build.

These are not just “bugs.” They are places where one part of the system says one thing, and another part silently treats it differently.

That is the area I’m trying to classify.

I’m building Scarab Diagnostic Suite around this idea: a repo-local diagnostic layer that identifies truth boundaries before an AI coding agent starts repairing.

I’m posting the field tests as a numbered series.

The question I’d love to hear from other developers:

When you’ve used AI coding agents, where have you seen them lose repo truth?

Examples: