Someone left an upvote, so there seems to be one listener. Dear listener, let me continue...
When someone is having a hard time getting great results, they are going to hear the "No true Scotsman" fallacy, the appeal to purity.
The pundits will say that if I just had better tests, better Product Requirements Documents (PRDs), the better brand of Agentic Tool, the other lab's models, or had learnt to prompt-engineer better, I would have got better results.
Okay, well, on my last project, I used all of these US-based tools on a single client project:
- Claude Code CLI
- OpenAI Codex CLI
- GitHub Copilot CLI
- Cursor Agent CLI
- OpenCode CLI
- Copilot IDE
- OpenCode Desktop
- Codex Desktop
- Aider Chat
I like working with multiple models and some features of each tool. I have patched Codex to add in both Claude and Gemini. I then added back the 'Ask' read-only mode, as I find it a good emergency feature when a model has gone a bit rogue.
You might think, "Gee, if the guy had just stuck to one tool to learn how to use it properly, maybe he could have got it to work!" At one point, I would burn through a $200 Max sub in the first week of the month. The new 5-hour token limits mean that to work a full day, I need two Max subs. That is why I needed all the tools to have enough subsidised Max subs to get through the month.
I now avoid the least reliable tool, Claude Code, until I have hit the weekly rate limits of the other tools. Yes, you read that correctly. I would rather use any of the other tools before Claude Code. Once again, not because I am unfamiliar with it, but because I have used it the most. I know from personal experience that I get better results when mixing Claude with other models across different tools. If you are not using Claude with GPT and others in something like Cursor Agent CLI or OpenCode, then you are missing out.
Surely you cannot prefer OpenAI Codex, I can hear you cry. Well, as I said, I have patched the OpenAI Codex CLI, which is Apache-2.0 open-source, to run Claude Opus 4.6 (not 4.7!) and Gemini 3 Pro next to GPT-5.4 (not 5.5!). Why? Because you can spawn agents like tmux sessions and go talk to them! I can have the Main agent pass work to named agents. Yes, Claude Code has subagents, as does OpenCode, yet neither lets you switch to them or talk to them. In my build you can, and you can still go back to asking the Main agent to delegate work to them! Remember, I patched Claude, Gemini, and GPT into my build, so I switch between them during an agent session to get them to code-review or pair-program.
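To make the workflow above concrete, here is a minimal sketch of the pattern: a Main agent that can delegate to named subagents, where you can also address any named agent directly, tmux-style. Everything here (the `Agent` and `Session` classes and their methods) is hypothetical illustration, not the API of Codex or any real tool.

```python
# Hypothetical sketch of the "named subagents" pattern: a Main agent
# delegates work to named agents, and the user can also talk to any
# named agent directly. No real tool's API is used here.

class Agent:
    def __init__(self, name, model):
        self.name = name
        self.model = model
        self.inbox = []  # work items this agent has received

    def handle(self, message):
        # Stand-in for a real model call; just records and acknowledges.
        self.inbox.append(message)
        return f"[{self.name}/{self.model}] ack: {message}"

class Session:
    def __init__(self, main):
        self.main = main
        self.agents = {main.name: main}

    def spawn(self, agent):
        # Spawn a named agent, like opening a new tmux pane.
        self.agents[agent.name] = agent

    def tell(self, name, message):
        # Address any named agent directly, bypassing the Main agent.
        return self.agents[name].handle(message)

    def delegate(self, name, task):
        # Ask the Main agent to hand a task to a named subagent.
        self.main.handle(f"delegate {task!r} to {name}")
        return self.agents[name].handle(task)

session = Session(Agent("main", "gpt"))
session.spawn(Agent("reviewer", "claude"))
print(session.delegate("reviewer", "code-review src/auth.py"))
print(session.tell("reviewer", "focus on the token refresh path"))
```

The design point is the `tell` method: subagent systems that only expose `delegate` force every instruction through the Main agent, which is exactly the limitation complained about above.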
My current preferred setup is the Main model with two or three subagents. The less reliable latest models can be the managing Main model; management isn't coding, is it? Lolz. I don't let the latest Opus or GPT write code, as they are far too erratic. In my setup, they have been promoted out of the way to be product managers.
No one can say that I am simply not experienced with this tech. On the contrary, I have not heard of anyone who has used as many tools as aggressively as I have to try to get things to peak performance.
I may have done a speed run to the end of the game. I have played the game again and again and again. I have played the game on extra hard, and I have played it for up to 36 hours straight. No, I am not joking. Yes, I am okay. I am a little ... unwilling to let the Agents fob me off, and sleep isn't really a thing when I am locked in.
The new models are worse. The rate limits are worse. The costs are going up. And it turns out the models are just not good enough. They didn't pass probation. I have a soft spot for one model. Why? It is the one that gaslights less and follows instructions. Yet most of them are, at best, overconfident and inexperienced. The polite way to say it is that they can be goldfish or caffeinated squirrels. The unvarnished truth is that they are occasionally useful, but downright dangerous idiots when not closely supervised.
That is enough for now. I need some sleep. I am due a catch-up.
End.