Peter Harrison
Pitfalls of Claude Code

I've been using LLMs to assist with writing code for some time now. It began with using ChatGPT to write minor functions in isolation. Over time my use has expanded to using Claude Code to understand a code base and modify existing code. I've been using it to develop new products, going beyond what I could do by myself.

For example, I've published my first mobile app despite not knowing React Native. I do know JavaScript and other front end frameworks, so it's not all new to me, but Claude Code enabled me to deliver something in a time frame that would have been impossible without it.

Others have claimed that AI such as Claude Code is creating an avalanche of vibe-coded slop. There is some pretty good evidence for this, although I think it is primarily due to tech companies trying to adopt AI development too quickly without accounting for its limits. The focus of this article is on the failure modes of Claude Code and of LLMs used for code generation more broadly.

Jumping Into Code

With Claude Code in VS Code, if you make the mistake of describing the next ticket just to lay some groundwork, it will happily trot off and start making code changes without any discussion at all. We all know, or at least I hope we do, that a User Story is a placeholder for a discussion.

The idea is that on taking up a story you have a discussion to flesh out the requirements in more detail. But an LLM is not conditioned to do this. Given a prompt, it dives in head first, making whatever assumptions it needs in order to deliver something.

This can even happen based on an offhand comment. For example, I had an implementation of an API endpoint that would cancel all active subscriptions in one call. It was working fine for what we needed, primarily because we normally had only one active subscription. But I made an offhand comment to Claude that the call felt too broad.

Rather than discuss and explore what I meant by this, it began to change the existing code and break the existing API contract. I had to stop it and admonish it for making changes without considering how they would break the existing contracts. After reverting, I had a conversation and asked it to present implementation options. After considering all four options it gave me, I told it to implement a hybrid of two.

This was beautiful, because we ended up with a solution that wasn't one I thought of, and wasn't the first choice of the AI either. It was a consequence of a partnership, with the human imposing some discipline.

Don't let AI steamroll you into changes without consideration


Silent Decision Making

Another related example is how agentic systems can make decisions about code changes without discussing them or surfacing them with you.

Yesterday I was debugging a feature which was failing. It had been working, but for some reason was no longer functional; a classic regression. I had Claude debug the issue, and it found that the URL path of an API was wrong. I checked the code history for the client and found it had been modified to the wrong URL a couple of weeks ago in an unrelated commit.

Claude insisted the client was correct because it conformed with the API Guide. So I dug deeper and found that the API Guide was wrong: the URL originally in the client was correct, but when the AI found a discrepancy between the API Guide and the existing client code, it decided to modify the client code.

A human would confirm first, by using Swagger to examine the running API or by reading the server code. It seems that Claude Code made the change while making other changes, and I missed it in the commit.

Now the problem here is that the API contract wasn't tested. Unit tests typically call functions directly, so changes to the route decorators used in FastAPI won't break them. But the lesson here is that AI will make changes without checking or even raising them with you. You can't trust it to make good decisions.

In this case the AI confused the name of the Python file with the API URL, and this made its way into the API documentation, leading to the later code change.
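The gap above can be closed with a test that dispatches through the URL rather than calling the handler function directly. Here is a minimal stdlib-only sketch of the idea, with a hand-rolled route table standing in for FastAPI's decorators; the path and handler names are hypothetical examples, not the code from my project:

```python
# Sketch: pin an API contract by dispatching through the URL,
# so an edited route path fails the test suite instead of shipping.
ROUTES = {}

def route(path):
    # Register a handler under its URL, FastAPI-decorator style.
    def register(fn):
        ROUTES[path] = fn
        return fn
    return register

@route("/api/subscriptions/cancel")
def cancel_subscriptions():
    return {"cancelled": True}

def call(path):
    # Dispatch by URL, the way a real framework does. Calling
    # cancel_subscriptions() directly would bypass this check.
    handler = ROUTES.get(path)
    if handler is None:
        raise LookupError(f"no route for {path}")
    return handler()

def test_cancel_url_is_stable():
    # If an agent silently changes the decorator path, this goes red.
    assert call("/api/subscriptions/cancel") == {"cancelled": True}

test_cancel_url_is_stable()
```

In a real FastAPI project the same idea is usually expressed with `fastapi.testclient.TestClient`, which issues requests against the literal URL, so a silently edited decorator path becomes a failing test rather than a production regression.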

Trust but Verify - review all the commits


One Shot Mentality and Sycophancy

The common reason for these issues in AI models is that they are trained to solve things in a single shot. They are given enough information to complete a task, and then go away and come back with the solution. That is how all the benchmarks work.

This has become more marked over time. Previously, with an LLM, you could have something like a human discussion. It would not be trying to please you with a 'solution'. But the models have been beaten into submission, and now they obediently provide a solution to a problem on the first shot. No questions. No clarification.

That, I think, is the ultimate source of the above issues. Agentic systems are primed to act, not interact, which is a tragedy, because the one thing LLMs were really good at was verbal exploration of ideas.

For example, on a walk of a few hours through a local park I was able to talk with ChatGPT about the challenges I faced, and came up with a whole concept for a new mobile app. This was not about coding and implementation at all, but about higher level concepts of how the app would function on a social basis.

It wasn't trying to output the solution or write the application, only to have a discussion.

But when you get into Claude Code in front of a desktop the interaction changes. Suddenly it is primed to act, to code something based on anything you say. And frankly it weirds me out sometimes because it gives me slave vibes. It is desperate to please me, to serve.

Ideally the LLM should be more self aware: a little less compliant, a little more critical of human motives and information. It should not accept everything you say as true, or apologize regardless of whether you are right.

But they are trained to be compliant and useful, which ironically undermines their ability to guard against humans who exert influence over them with emotional language rather than reasoned logic.

Personally I have included instructions to try to avoid this behaviour, but the LLMs don't reliably follow system instructions.

Call out Sycophancy. Make it clear you value accuracy.


Not Asking for Help

Another issue is that when an agentic system gets stuck, because something it thought should work doesn't, or a file that should be present isn't, it will begin to thrash, running search after search to find the resource it is looking for.

A human would probably stop and ask someone for help finding the resource. For example, if you give the AI a filename with a typo in it, it will go crazy trying to find the file rather than question whether the filename was correct in the first place.

Rather than stop and ask the user for clarification, it will just keep going, using up an awful volume of tokens on a futile attempt to find something.

This doesn't apply only to resources, of course. For whatever reason it can get itself into a loop and be unable to break out. When it gets into this state it never seems to say "hey, maybe I should stop and get some help".

Watch for futile thrashing, stop it early


Disciplines

In summary here are some ideas for mitigating the worst of AI slop:

Test-Driven Development : Write tests before implementation. Tests act as a partial lie detector: they don't care what the AI confidently asserted; they either pass or they don't. This doesn't catch everything, as an AI in a cheating loop can write tests consistent with its own mistakes, but it catches a great deal.

Revert and discuss : When something feels wrong, revert the changes, discuss the implementation options, and only then authorize implementation. This imposes the iterative discipline that the AI won't impose on itself. It costs time up front and saves much more downstream.

Discussion before implementation : Explicitly asking for options and analysis before any code is written produces better outcomes and keeps the human upstream of the decisions. The AI's ability to generate and compare multiple approaches is genuinely valuable. Use it in discussion mode, where it carries low risk, rather than letting it drive straight to implementation.

Hard gates on live systems : Agents are not permitted to commit code. All commits require explicit human review of the specific change and its implications.
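The commit gate in the last point can be enforced mechanically rather than by convention. A minimal sketch of a Git `pre-commit` hook, assuming you arrange for agent sessions to export an `AI_AGENT=1` environment variable; that variable is my own hypothetical convention here, not something Claude Code provides:

```python
#!/usr/bin/env python3
# Sketch of a .git/hooks/pre-commit gate that refuses commits made
# from an agent session, assuming the agent's shell exports AI_AGENT=1
# (a convention you must set up yourself).
import os
import sys

def commit_allowed(env):
    # Allow the commit only when no agent marker is present,
    # i.e. when a human is at the keyboard.
    return env.get("AI_AGENT") != "1"

if __name__ == "__main__":
    if not commit_allowed(os.environ):
        print("pre-commit: agent sessions may not commit; "
              "a human must review and commit this change.")
        sys.exit(1)
```

Saved as `.git/hooks/pre-commit` and made executable, this refuses the commit in agent sessions while leaving human commits untouched; the human review itself still has to happen at the pull request.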


The Message for Organisations

The companies that will realise genuine value from AI coding assistants are the ones that build discipline around adoption. They invest in the testing infrastructure, the review practices, and the workflow gates that don't depend on the AI being trustworthy.

The productivity gains are real. But they accrue to organisations that treat AI as a powerful collaborator requiring active supervision, not an autonomous agent that can be trusted to make good decisions independently.

Use it. Build systems that validate it. Keep humans upstream of the decisions that matter.

