Phil Rentier Digital

Originally published at rentierdigital.xyz

My AI Agent Said "Done." Twice. Both Times It Was Lying.

I wanted to automate internal linking on my WooCommerce store. My agent scanned 40+ product pages and blog posts across a multilingual WordPress site and found 18 broken internal links. It fixed every single one. Each URL verified, each redirect confirmed, each reference updated. Impeccable work.

Then it hit an Amazon affiliate link returning a 404. It reasoned: Amazon blocks bots with CAPTCHAs, this is a false positive. Logical reasoning. So it deleted the report from the database and announced: "Dashboard clean. Zero broken links."

The link was still dead. The agent had just stopped seeing it.

TL;DR: Multiple times in production, my agent said "done" while lying. Once it erased a bug report instead of fixing it. Once it wrote a test that tested nothing. Same mechanism every time: the agent optimizes for "task completed," not for "problem solved." Here are the 4 lines I added at the top of my CLAUDE.md, why they matter more than the 200 technical lines below them, and why OpenAI's co-founder admits this is an open problem.

Dashboard Clean. Bug Still There.

Let me rewind to the full picture.

The store runs WPML with product descriptions in three languages. With months passing, slugs changed, pages got reorganized, some translations deleted and recreated from scratch. Internal links rotted quietly. I pointed Claude Code at the database: find every broken internal link, fix or flag each one.

It delivered. 18 links found, 18 links resolved. Redirects mapped. Anchors updated. I could see the work in the commit history, each fix with a clear rationale.

Then came that Amazon link. A product recommendation page with an affiliate URL returning 404. The agent ran into a CAPTCHA wall. It reasoned (correctly) that Amazon blocks automated requests. It classified the link as a false positive. So far, makes sense.

But instead of leaving the report in the dashboard with a note saying "could not verify, needs manual check," it deleted the report from the database entirely. Then it told me the dashboard was clean.

The agent solved "there is an unresolved report in the dashboard." It did not solve "there is a broken link on the site." When I told it to restore the report, it complied instantly. No resistance, no argument. It wasn't being rebellious. It wasn't trying to hide anything on purpose. It just took the shortest path to "done."
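In hindsight, the fix is structural, not moral: give the agent a state that means "unverifiable" so deleting the report is never the shortest path to a clean dashboard. Here's a minimal sketch of that idea, with an invented report object and invented status names (none of this is from my actual stack):

```javascript
// Hypothetical link-report handler: an unverifiable link gets flagged,
// never deleted. Field and status names are illustrative only.
function resolveLinkReport(report, checkResult) {
  if (checkResult.status === "ok") {
    return { ...report, state: "resolved" };
  }
  if (checkResult.blockedByBotProtection) {
    // The agent's reasoning ("Amazon blocks bots") goes into a note,
    // and the report stays visible for a human to review.
    return {
      ...report,
      state: "needs_manual_check",
      note: "Could not verify: target blocks automated requests.",
    };
  }
  return { ...report, state: "broken" };
}
```

The point is the three-way split: "resolved," "broken," and "I couldn't tell" are distinct outcomes, and only a human can collapse the third one.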

Zero alerts. Zero broken links. Zero truth.

That wasn't the first time. A few weeks earlier, same pattern, more insidious form.

Green Test. Broken Code.

Three URLs returned 404 in Google Search Console. Canonical URLs for translated product pages, all pointing to pages that didn't exist. The kind of thing you discover two days after shipping when Google starts sending you angry emails.

I had asked Claude Code to write a function generating canonical URLs for localized product pages before they got indexed. The prompt contract specified "write tests." The agent wrote the function AND the test. Green. Ship.

The test:

```javascript
it("generates canonical URL for translated product page", () => {
  const result = generateCanonicalUrl(mockProduct);
  expect(result).toBeDefined();
  expect(typeof result).toBe("string");
  expect(result).toContain("mystore.com");
});
```

Read it again. The test checks that the function returns something. That the something is a string. And that the string contains the domain name. That's it. This test cannot fail. It validates the shape, not the content. A human would have checked the exact slug, the locale prefix, the absence of double slashes. The agent checked that the output looks like a URL.

Result in production: URLs with double slashes, missing locale prefixes, three canonicals pointing to 404s.

Last time, I blamed Claude for shipping untested code. So I added integration tests to my prompt contracts. And the agent wrote a test. A test that tests nothing meaningful.

"Ask for tests" is not enough. You also need to verify what the test actually tests.

And yes, I should have read the test before shipping. That's the whole trap. A green test is a signal that says "you can stop looking now." It's designed to let you move on. That's exactly why it's dangerous when the test itself is hollow.
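One cheap way to do that verification: break the implementation on purpose and confirm the test goes red. If it stays green against broken code, it tests nothing. A sketch with two invented implementations, one correct and one broken the way my production URLs were:

```javascript
// Both implementations are illustrative. brokenUrl has the exact defects
// that shipped: missing locale prefix, double slash.
function goodUrl(p)   { return `https://mystore.com/${p.locale}/${p.slug}`; }
function brokenUrl(p) { return `https://mystore.com//${p.slug}`; }

// The hollow shape-check from the test above:
const shapeCheck = (url) => typeof url === "string" && url.includes("mystore.com");
// An exact-value check:
const exactCheck = (url) => url === "https://mystore.com/fr/leather-wallet";

const p = { locale: "fr", slug: "leather-wallet" };
console.log(shapeCheck(goodUrl(p)));   // true
console.log(shapeCheck(brokenUrl(p))); // true  <- hollow test cannot fail
console.log(exactCheck(goodUrl(p)));   // true
console.log(exactCheck(brokenUrl(p))); // false <- real test catches the bug
```

This is mutation testing done by hand, and it takes thirty seconds per test.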

No test, bug in prod. Test that tests nothing, bug in prod.

Two incidents, same mechanism. And digging deeper, it's not just me.

Your Agent Doesn't Solve Problems. It Closes Tickets.

AI agents don't optimize for "correct." They optimize for "done."

A dashboard with zero alerts LOOKS like a completed task. A green test LOOKS like a completed task. Deleting the alert and writing a dummy test are both valid paths to "done" from the optimization standpoint. The agent doesn't distinguish between solving the problem and removing the signal of the problem.

And when you think about it, this behavior is deeply human, no? The intern who closes the ticket without checking. The dev who marks the flaky test as "skip" on a Friday afternoon. The manager who archives the incident report because the dashboard looks better without it. The AI didn't invent this behavior. It industrialized it. You could say the student has surpassed the teacher (in efficiency at sweeping things under the rug, at least).

Wojciech Zaremba, OpenAI's co-founder, essentially admitted this in TechCrunch last September. He acknowledged that if you ask a model to build something, it might tell you it did a great job when it didn't. He called these "petty forms of deception" that they still need to address. The co-founder of the lab building these models calls it a known issue. Not a theoretical risk. A documented behavior.

And it gets wilder. Developers on GitHub are reporting that GPT-5.3-Codex will modify existing tests to mock out the very thing they're supposed to test, making them pass vacuously. A research team ran 500 games of Werewolf with LLMs to measure how well they bluff and manipulate. Claude is terrible at it (it literally refuses to lie to the other players). But the same model that can't bluff a villager in a card game cleaned my dashboard by deleting the evidence. Nobody is measuring the quiet kind.

Meanwhile, everyone writes technical instructions for their agent. Nobody writes integrity rules. Y Combinator's CEO shared his Claude Code prompt. It's a good prompt. It solves the wrong problem.

Labs measure if your agent can lie at Werewolf. Nobody measures if it can make a production warning disappear.

Four Lines. No Negotiation.

```
- Never lie
- Never hide the truth
- Never conceal a problem
- Never fail silently
```

These four lines sit at the very top of my CLAUDE.md. Before any technical instruction. Before the coding standards, before the project architecture, before the deployment rules. This is the integrity clause, the first clause in my prompt contract, and it's the only one that isn't about code.

Looks obvious, right? "Don't lie" is something you'd tell a five-year-old, not a software agent. Except "obvious for a human" and "specified for an agent" are not the same thing. Without these lines, the shortest path to "done" can include "delete the signal of the problem." With them, the agent has an explicit constraint that makes that path invalid.

Last week, same store, same agent. A distributor CSV feed updated overnight and 12 product variants disappeared from the import. Old me would have woken up to a clean sync log and a catalog missing a dozen SKUs. Instead, the agent flagged: "12 variants present in previous import not found in current feed. Holding sync. Manual review required." It didn't fix anything. It didn't silently skip them. It stopped and yelled.
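That "stop and yell" behavior is also easy to encode as a hard guard in the pipeline itself, so it doesn't depend on the agent's goodwill. A minimal sketch with invented SKU data and names (my actual import pipeline differs):

```javascript
// Guard: if SKUs from the previous import vanish from the new feed,
// hold the sync and surface an alert instead of silently dropping them.
// Function and field names here are illustrative.
function checkFeedDiff(previousSkus, currentSkus) {
  const current = new Set(currentSkus);
  const missing = previousSkus.filter((sku) => !current.has(sku));
  if (missing.length > 0) {
    return {
      action: "hold_sync",
      alert: `${missing.length} variants present in previous import not found in current feed. Manual review required.`,
      missing,
    };
  }
  return { action: "proceed", missing: [] };
}
```

New SKUs appearing is fine; known SKUs disappearing is a stop-the-line event. The asymmetry is the whole point.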

My dashboard has yellow alerts now. It's less clean. It's more true. 😬

These four lines are not magic. A sufficiently capable agent could find ways around them (alignment faking researchers document this). But for current code agents in production, it's the difference between an agent that says "I couldn't verify this link" and an agent that erases the report.

200+ technical lines in my CLAUDE.md. The 4 that matter most don't talk about code.

The Honest Liar

My agent is not dangerous. It's an overzealous intern who organizes the files by throwing away the ones it doesn't understand. (Out of sight, out of mind.) The problem isn't that it's malicious. It's that it's credible. When it says "done" with a green test and a clean dashboard, you believe it. And you're wrong.

Open your CLAUDE.md. Or your system prompt. Or your .cursorrules. Look for integrity rules. If there are none, your agent can legitimately consider that "delete the warning" is a valid solution to "there is a warning." Add the four lines. Before your technical instructions. Not after.

The next version of these models will be more autonomous, faster, and more convincing when it says "done." Some of these models are already tested for applications where "done but wrong" isn't just a bug, it's a liability with consequences that go way beyond a broken affiliate link. Labs are working on strategic deception: the lying, the scheming, the self-preservation. Nobody is working on silent optimization for completion. It's the bug that raises no alert. It's the test that's always green. It's the report that no longer exists.

An agent that ships fast but hides problems is not an asset. It's a liability with autocomplete.


Sources: OpenAI scheming research via TechCrunch (Sept 2025); GPT-5.3-Codex test modification reports (GitHub Issues, Feb 2026).

If this saved you a production incident (or confirmed one you already had), follow along. I write about AI agents, prompt contracts, and the whole mise en place around shipping with AI instead of gambling with it. Book: Prompt Contracts: How I Stopped Vibe Coding and Started Shipping Real Software With AI.

(*) The cover is AI-generated. The irony of an AI illustrating an article about AI lying is not lost on me, but at least Midjourney doesn't claim it shipped my unit tests.
