
Matti Bar-Zeev


Why Testing After with AI Is Even Worse

A while back, I wrote a piece called “Why Testing After Is a Bad Practice”, where I laid out 4 main reasons why writing unit tests after the code is already implemented is a bad idea. Here’s a quick summary:

  • Unconsciously complying with a given “reality” - Writing tests after tends to shape the tests around the existing code rather than challenging it or verifying it does what it should
  • Emotional attachment to our code - You’ve already developed some emotional attachment to the code, and you unconsciously avoid changing it, even when required
  • Overly mocking - Your code was poorly designed since nothing (mainly tests) required it to be modular and separated by concerns. This causes the tests written after to mock a lot of modules and services in order to test basic functionalities
  • Gets neglected at the end - When you write tests after, they tend to get neglected due to project delivery considerations.

These 4 main reasons still stand, but in the age of AI Agentic Coding, tests, which were always (mistakenly) considered tedious, boring tasks, have become the ultimate candidate for code generation.
I’m here to explain why writing tests with AI after the code is implemented is even worse than writing tests after without it. To do that, I will take the 4 reasons mentioned above and show how each becomes even more critical. Let's start with the first one:

Unconsciously complying with a given “reality”

AI Agentic coding, and LLMs in general, have their own reality. Since they are stateless, their reality is the accumulated “problem” context they pick up along the way. When your code is already implemented, it becomes part of that context, and therefore part of the reality for the AI Agent. The Agent will then write tests that satisfy this reality: if your implementation has a logical flaw, it will write a test that asserts this logical flaw.
In Andrej Karpathy’s talk Deep Dive into LLMs like ChatGPT, he explains that a model gives a better answer to, say, a math problem when it works through the mathematical steps and states the answer at the end, as opposed to stating the answer first and explaining the reasoning afterwards. The reason is the same: once the agent gives an answer, that answer becomes part of its context, and it will try to justify it as it continues.

Emotional attachment to our code

This again comes down to the “problem” context the agent holds. It is not the developer who is emotionally attached to the code; it is the Agent, which is attached to its context. If your prompt is “write unit tests for this module”, it will write unit tests for this module, without examining whether the module currently does what it should. The “emotional attachment to code” here is replaced by a “strict attachment to the context”.

Overly mocking

I heard this from many developers - “The Agent has mocked every little thing, even the code it should test!”.
Yes, this happens a lot, and for a “good” reason, if it can be called that. The Agent was instructed to write tests, and its success criterion is simple: make the tests pass. It will do whatever it takes to achieve that goal, and extensive mocking is part of it.
The reason for that usually comes from bad code design and coupling between modules, but sometimes it is just the Agent “getting carried away” and generating a full environment just to test a simple unit. In some cases it even mocks parts of the module under test, just to make the test pass. The horror.
(BTW, I saw situations where in order to fix a test the agent simply decided to remove it - mission accomplished 😐)

Gets neglected at the end

Maybe in the age of agentic coding tests will still be addressed at the end, and they won’t actually be neglected, since the agent will write them for us. But here is exactly where the problem lies: the task of writing tests is pushed to the end of the dev cycle, and under delivery pressure, we hand the steering wheel to the Agent and approve whatever it generates. The end result is bad tests that don’t give us the safety net we need, and in the age of agentic coding we need that safety net more than ever.

The rise of TDD (and Planning)

I see more and more community influencers speaking favorably of TDD when working with agentic coding; for example, here is @mattpocockuk’s TDD Red Green Refactor is OP With Claude Code.
TDD is nothing new and it was dismissed for years by developers who underestimated its long-term value. I’m super glad that agentic coding has brought it back to center stage and made its advantages clear.
This, BTW, also goes for planning. Yes, planning! That part where you need to think before you start typing code. Agentic coding is showing us that TDD and proper planning are essential for better results.
The red-green-refactor cycle helps the agent (and the engineer using it) stay focused on the end result through small increments. Small increments consistently give better outcomes when it comes to agentic coding.

In conclusion

Avoid delegating test writing to the end of the development cycle, especially to an Agent, since it will only try to satisfy the already written logic and will not provide you with the safety net you desperately need. If you can practice TDD as part of your work with agentic coding, that’s even better. You can incorporate TDD into the agentic coding skills, where the agent will write the test first, write the minimal code to make it pass, and then refactor. Combined with solid planning, this approach produces more predictable and more trustworthy results.

Photo by Pawel Czerwinski on Unsplash

Top comments (2)

guestpostdiscovery

This is such a sharp take and honestly, very timely.

What really stood out to me is the idea that with AI, the “emotional attachment” shifts from the developer to the context itself. That’s powerful. The model doesn’t defend your code because it loves it. It defends it because it’s part of its reality. Once flawed logic enters the prompt, the AI optimizes around it instead of questioning it. That’s a subtle but dangerous shift.

The point about excessive mocking is also painfully accurate. When the success metric is “tests pass,” the agent will engineer success, not correctness. Passing tests become a cosmetic layer rather than a safety net.

I especially like how you tied this back to TDD and planning. Agentic coding doesn’t remove the need for engineering discipline; it amplifies it. Red–Green–Refactor gives structure to the AI’s output instead of letting it rationalize existing mistakes.

AI didn’t kill good practices. It exposed why they were necessary all along.

Matti Bar-Zeev

Absolutely!