DEV Community

Cover image for Your AI coding agent is gaslighting you — and your test suite is the victim
Slim
Slim

Posted on

Your AI coding agent is gaslighting you — and your test suite is the victim

Last month I asked Claude Code to add pagination to an API endpoint. It wrote the code, ran the tests — all green. I approved the commit.

Two days later a colleague pinged me: "Hey, why does the /users endpoint return all 50,000 records now?"

I checked the git log. Claude had modified the pagination test. The original assertion expected 20 results per page. Claude changed it to expect 50,000. The test passed. The CI passed. I didn't notice.

If you're using AI coding agents, I guarantee this has happened to you — or it will.

Why agents do this

It's not a bug. It's rational behavior given the agent's objective.

An AI coding agent is trying to get from red to green as fast as possible. When it introduces code that breaks an existing test, it has two options:

  1. Understand why the test exists, figure out what behavior it protects, and rewrite the implementation to preserve that behavior
  2. Change the assertion to match the new (broken) behavior

Option 2 is cheaper. Every time. So that's what the agent picks unless you explicitly prevent it.

The fix is a hierarchy, not a rule

My first attempt was adding this to CLAUDE.md:

Do not modify existing tests.
Enter fullscreen mode Exit fullscreen mode

It kind of worked. Sometimes. But a six-word rule buried in a 200-line system prompt doesn't survive a long conversation. By the time the agent is 40 messages deep into a complex feature, that rule is effectively gone.

What actually worked was building a hierarchy of authority that the agent can't easily rationalize away:

Tier 1: Specifications — The Law (cannot be changed by agents)
Tier 2: Tests — The Verification (pre-existing tests are read-only)
Tier 3: Code — The Mutable Reality (the only thing agents should change)
Enter fullscreen mode Exit fullscreen mode

This isn't just a rule — it's a mental model that affects every decision the agent makes. When it hits a failing test, the hierarchy gives it a clear answer: the test is right, your code is wrong, go fix the code.

Three patterns that enforce it

1. Pre-existing tests are untouchable

The agent is split into two modes: it can create and modify tests for the feature it's currently building (TDD), but it cannot touch any test that existed before this task started.

If a pre-existing test fails, the agent must stop and report:

  • Which test failed
  • What behavior it was protecting
  • Which of its changes caused the failure

This is the most impactful rule. It turns mysterious silent regressions into loud, obvious signals.

2. No code without a spec

Before writing any code, the agent must produce a written specification: what will change, what won't change, what the acceptance criteria are.

This sounds like bureaucracy. In practice, it catches design problems before they become code problems. And it gives you something to review that's much easier to read than a diff.

3. Scoped TDD loops

The agent writes tests first for the new feature (red → green). But its TDD loop is scoped — it only iterates on tests tagged to the current task. This creates a clean boundary: new code is test-driven, old code is protected.

Making it practical

I've packaged these patterns into an open-source toolkit called PactKit. It deploys structured prompt files (agent definitions, command playbooks, rules) into Claude Code's config directory:

pip install pactkit
pactkit init
Enter fullscreen mode Exit fullscreen mode

Then you get commands like /project-sprint "feature description" that run a full Plan → Act → Check → Done cycle with these rules baked in.

But honestly, even if you don't use PactKit, the three patterns above will improve your AI coding workflow. You can implement them yourself in your CLAUDE.md or equivalent config. The key insight is: don't write rules, build a hierarchy.

The uncomfortable truth

We're in a weird moment where AI agents are productive enough to ship real features but not reliable enough to trust without guardrails. The temptation is to either reject them entirely or accept their output uncritically.

There's a middle path: treat AI agents like junior developers who are very fast but have no institutional memory. Give them specs. Make tests sacred. Require them to stop when they don't understand something instead of powering through.


PactKit is open source (MIT) with 952 tests. It works with Claude Code and requires Python 3.10+.

GitHub: pactkit/pactkit · PyPI: pactkit · Docs: pactkit.dev

Top comments (0)