DEV Community

Discussion on: I've organised the Claude Code commands, including some hidden ones.

灯里/iku

@nedcodes
Hi Ned C! Thanks so much for the comment!!

I've been really overwhelmed by the sheer volume of Anthropic's official docs and all the recent info. There are still so many commands I didn't know about. It's a good reminder to go back to basics, read the primary sources carefully, and use them more often... yeah, I feel that lol.

I'm glad the comparison table was helpful! That makes me really happy!

Claude Code is excellent, but the costs are definitely something to watch out for, right...!

Using Opus as the commander and Sonnet for workers is a great tip, but I'm still a bit worried about relying only on Claude Code. So I've been experimenting with using OpenAI's Codex for the implementation parts. It might work well as a hybrid approach. (The downside is my AI subscriptions keep piling up... haha)

Ned C

the hybrid approach with Codex for implementation is interesting. do you treat each tool as its own isolated step, or is there some context handoff between them?

and yeah, the subscription creep is real. i started tracking monthly AI spend separately just to stay honest with myself about it.

灯里/iku

@nedcodes
Great question! It's not fully automated yet, but it's more than just isolated steps.

Here's what I've been experimenting with:

  1. Claude plans, Codex implements
    I use Claude (Opus) for high-level design and planning, then hand that off to Codex for implementation. Practically, I run both side by side using tmux — split panes in VS Code's terminal, Claude Code on the left, Codex CLI on the right, same project directory. So context lives in the shared codebase and plan files, no manual copy-pasting needed.

  2. Parallel runs for cross-review
    Same tmux setup — Claude Code as the main driver (e.g., large refactoring) and Codex as a second opinion running in parallel. They catch different kinds of bugs, which is the whole point.

  3. Orchestration via an agent SDK (exploring)
    The dream is using an agent SDK to orchestrate Claude as the planner and Codex as the coder automatically, with context passed through the SDK. Still early stage for me, but the potential is exciting.
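In case it helps anyone replicate the side-by-side setup from 1 and 2, here's a minimal sketch. The session name and project path are placeholders, and it assumes the `claude` and `codex` CLIs are already installed on your PATH:

```shell
# Minimal sketch of the split-pane setup described above.
# Session name ("hybrid") and project path are placeholders.
PROJECT="$HOME/projects/myapp"

tmux new-session -d -s hybrid -c "$PROJECT"    # left pane: Claude Code
tmux split-window -h -t hybrid -c "$PROJECT"   # right pane, same directory
tmux send-keys -t hybrid.0 'claude' C-m        # launch Claude Code on the left
tmux send-keys -t hybrid.1 'codex' C-m         # launch Codex CLI on the right
tmux attach -t hybrid
```

Both panes start in the same directory, so the shared codebase and plan files do the context handoff for you.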

Honestly, part of it is also a philosophical thing — I don't want to depend too heavily on a single company's model. Same as in the real world, right? One person can't solve everything. Getting a "second perspective" from a different model catches blind spots. It's like applying "the right person for the right job," but for AI models lol.

Personality-wise too, Claude is more chatty and has great vibes for conversation, while Codex is more of a serious worker type. Both have their strengths!

And yeah, the subscription creep is painful... tracking it separately is smart, I should do that too haha.

Here's a Medium article that covers the Claude Code vs Codex comparison well if you're interested:
blog.ivan.digital/claude-code-vs-o...

Ned C

the tmux split-pane setup is practical. i've been thinking about similar workflows where you keep both agents in the same project directory and let the shared codebase be the communication layer instead of trying to pipe context between them programmatically. the Agents SDK orchestration angle is worth exploring, especially if you can define clear boundaries for what each model handles. curious whether you've hit cases where Claude and Codex disagree on approach and how you resolve that. also good call on not depending on a single provider, it's something i think about more now.

灯里/iku

@nedcodes
thanks for the thoughtful comment! the "shared codebase as communication layer" framing is exactly how i think about it too. no fancy piping, just let the filesystem be the interface.

on the Claude vs Codex disagreement question, great timing, i've been meaning to write this up lol

honestly, it's less about "who's right" and more about "whose perspective fills the gap." the models reflect their makers' philosophies more than you'd expect.

here's what i've noticed so far (still experimental, grain of salt etc):

  1. top-down vs bottom-up
    Codex tends to think architecturally first. flags structural issues early and pushes for refactoring before you write more code. Claude jumps in and starts building fast, which feels productive until you hit a wall of edge cases you didn't plan for. for bigger features, Codex's "slow down and think" approach usually wins out.

  2. over-engineering vs shortcuts
    Claude's failure mode is over-abstraction. too many layers, too much modularity for what you actually need. Codex goes the opposite way, cuts corners, skips edge cases. so i literally cross-review: feed Claude's output to Codex and vice versa. they catch each other's blind spots surprisingly well.

  3. greenfield vs precision work
    for creative/new features, Claude moves fast and generates ideas. but Codex sometimes ships a more "complete" result out of the box (tried making a 2D platformer with both. Codex auto-generated sprite cleanup, Claude didn't even build the floor lol). for infra or anything requiring precision, both struggle, but Codex grinds through test-fix cycles longer before giving up.

  4. planning style
    Claude gives you clean markdown with actionable snippets. Codex generates strict XML-style architecture docs, thorough but harder to read. personally i prefer Claude's style for day-to-day work, it just feels more... human to work with.

  5. so how do i resolve disagreements?
    you don't pick a winner. you treat it like a code review between two engineers with different backgrounds. not depending on a single provider isn't just a resilience thing, it genuinely produces better output when you let them challenge each other.

  6. hybrid workflow
    that's exactly why a hybrid approach is working well for me right now. something like: Claude for planning → Codex for review & implementation → Claude for final check. each model plays to its strengths in sequence.

  7. main brain vs sub brain?
    comes down to your preference and what you're trying to build. no universal right answer. i switch the lead role depending on the task, and that flexibility is part of the fun.
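the hybrid sequence in 6 can be sketched as a tiny shell function. to be clear, this is a hypothetical sketch: `claude -p` (one-shot print mode) and `codex exec` (non-interactive run) are my assumptions about each CLI's flags, so check `--help` on your installed versions before relying on them:

```shell
# Hypothetical sketch of the plan -> implement -> check handoff.
# PLANNER / CODER default to the real CLIs but can be overridden,
# e.g. to flip the lead role per task.
hybrid_pipeline() {
  planner="${PLANNER:-claude -p}"   # assumed: one-shot print mode
  coder="${CODER:-codex exec}"      # assumed: non-interactive run
  task="$1"

  plan=$($planner "Plan only, no code: $task")
  impl=$($coder "Implement this plan: $plan")
  $planner "Review this implementation against the plan.
Plan: $plan
Implementation: $impl"
}
```

swapping PLANNER and CODER is also how you'd switch the main brain / sub brain per task, like in 7.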

that said, this is my personal answer as someone who can freely pick tools. in a business context? often you don't get to choose. company policy, compliance, contracts, etc. can lock you into one provider. so the practical answer is... it depends lol

  8. work in progress
    i've been intentionally throwing the same tasks at both models to compare, and there's still a lot of testing to do. every model update from each company can shift the balance. what's true today might not hold in a few months.

still exploring, still learning. this turned out longer than expected lol. might be worth its own article at some point 😄

Ned C

this is a really solid breakdown. the cross-review pattern where you feed one model's output to the other is something i want to try more deliberately. the personality difference you mention (Claude chatty, Codex serious worker) maps to what i've seen too. curious if you've hit cases where their architectural disagreements were both wrong, or does one usually end up closer to the right call?

灯里/iku

@nedcodes
That's a rather intriguing question!

Short answer: one usually ends up closer to the right call. But there are some fun failure patterns.

  1. Both wrong in the same direction (too optimistic)

I design AI-integrated workflows for organizations. Every company has a different daily stack: Slack for chat, SharePoint for docs, Salesforce for CRM, Google Drive for storage... often a beautiful mess with no clean integration. Very common in Japan, probably everywhere though lol.

When I ask both Claude and Codex to plan architectures for these environments, they both propose elegant solutions that assume humans will actually follow the new workflow. They seriously underestimate how lazy people are. Both end up too idealistic about adoption.

  2. Both wrong in complementary ways (this one's sneaky)

Their different mistakes don't cancel out.

  • Distributed systems: Codex piles on unnecessary config and endpoints (bloat), while Claude proposes fast implementations that ignore race conditions. Both "look like they work" but are fundamentally broken.
  • Large-scale refactoring: Claude suggests beautifully modular code that doesn't scale, Codex produces conservative rewrites of outdated patterns. Both miss the real bottleneck (e.g., a denormalized DB schema that crashes in production).

It's not "same mistake twice." It's "different mistakes that don't offset each other." Root cause? Both hallucinate with confidence.

  3. What I actually do about it
  • Feed both models detailed context about the client's existing tools and daily habits. Anchors them in reality instead of theory.
  • Split PDCA: Plan and Act stay with the human, Do and Check get delegated to AI. Human stays the architect and final judge.
  • Same task to both models, let them "debate," then you make the call. This alone cuts failure rates dramatically. This is why I side-eye the "automate everything!" hype a bit. The real value of cross-review isn't picking the winner. It's that disagreement itself is a signal: when they diverge, think harder, don't just pick one.

It all boils down to the basics: context is key.

Ned C

the "both wrong in complementary ways" failure mode is the one i hadn't thought through. i was assuming cross-review works because disagreement is a signal, but if they're both confidently wrong in different directions you just end up with two plausible-looking bad answers instead of one. that's way harder to catch than one obviously wrong output. your PDCA split where the human stays as architect makes more sense for that scenario than trying to automate the tiebreaker

灯里/iku

@nedcodes
haha yeah exactly!
when both outputs look plausible, you actually let your guard down. that's the sneaky part.

and to be clear, i'm not against automation at all! i just think LLMs genuinely can't tell when they're wrong, so someone's gotta cover that blind spot. it's less about control and more about... caring for the process, i guess?

we're still super early in figuring out how humans and AI work together. i'd love to keep exploring what that looks like with people who actually think about it, like you do ;)

Ned C

the tricky part is that the failure modes are different from what we're used to, so we don't even have good instincts for when to double check yet. i think that's what makes the "both wrong in complementary ways" thing so dangerous, you can't just pattern match your way out of it
