How I Split Work Between Claude Code and Codex in Real Projects

Sam Lai — Sun, 22 Mar 2026 05:33:49 +0000

I usually have two terminals open: Claude Code on the left, Codex on the right.

Not for benchmarking. Just for work.

I'm a Java backend developer working on a supply chain system with 20+ Spring Boot microservices, a lot of business logic, and the usual amount of legacy debt. After using both tools side by side for a few weeks, I stopped thinking of them as competitors.

They do different jobs.

My Split

The short version:

Claude Code handles understanding. Codex handles execution.

When I'm debugging something messy or reviewing code that touches real business logic, I usually start with Claude Code. It's better at following context through multiple layers and explaining why something is happening.

When the task is more mechanical, parallelizable, or just high-volume, I hand it to Codex. Tests, docs, repetitive edits, cleanup work — that's where it fits well.

The exact model breakdown I use:

Scenario	Tool	Model
Everyday coding, new features	Claude Code	Sonnet 4.6
Complex bugs, hard problems	Claude Code	Opus 4.6
Routine coding tasks	Codex	gpt-5.3-codex
Problem investigation, deep reasoning	Codex	gpt-5.4 (default)

One thing worth noting: a bigger or newer model isn't automatically better for daily work. Opus and gpt-5.4 take noticeably longer to reason before they respond, and on routine tasks that wait breaks the flow. I only reach for the heavier models when I'm genuinely stuck.

I didn't arrive at this from theory. A production issue forced the pattern on me.

The Bug That Made It Obvious

We hit a production anomaly caused by logic that had probably been wrong for years but only surfaced once a new code path triggered it.

Classic legacy-code problem: a boundary condition buried inside old business logic, the kind people quietly work around for years instead of fixing properly.

I gave both tools the same inputs: the exception, the relevant service code, and the logs.

Claude Code traced the call chain and explained why the implementation failed in that specific scenario.

Codex also suggested a direction, but it was one layer off.

So I used Claude Code to pin down the root cause and fixed it. While I was wrapping up the fix, Codex was already running tests across multiple modules.

That was the moment the split became obvious: one tool was helping me understand the problem, the other was helping me move faster once the direction was clear.

The Mistake People Make When Using Both

The most common mistake is treating both agents like they should share the same instructions file.

In my setup, they don't.

CLAUDE.md is for Claude Code
AGENTS.md is for Codex
CHANGES.log is the handoff layer between them

project/
├── CLAUDE.md      ← Claude Code only
├── AGENTS.md      ← Codex only
└── CHANGES.log    ← shared task state

The project context is mostly the same. The behavior rules are not. That separation matters more than I expected.

💡 Setup tip: I use VS Code with the Quick AI plugin and two terminal panels side by side, one for Claude Code and one for Codex, so there's no window switching.

`CHANGES.log` Is the Useful Part

This ended up being the piece that actually made the workflow usable.

I often start with Claude Code. But during peak hours it sometimes slows down or drops mid-task. Switching to Codex is the obvious move — but without a handoff record, you lose time re-explaining everything.

So I use CHANGES.log as a task-state log, not a code-change log:

[2026-03-18 14:23] [CLAUDE_CODE] BUG-02
Status: PARTIAL
Done: Located boundary condition bug in OrderService.java — oversell issue
      caused by missing distributed lock in concurrent deduction path
Files modified: order-service/src/main/java/com/xxx/OrderService.java
Remaining: Idempotency check in InventoryClient.deduct() not yet added
Next agent: Pick up InventoryClient.java directly — full context available here

[2026-03-18 15:01] [CODEX] BUG-02 (pickup)
Status: COMPLETE
Files modified: inventory-service/src/main/java/com/xxx/InventoryClient.java
Tests: PASS — OrderServiceTest 5 passed, InventoryClientTest 3 passed

When I switch tools, I'm not starting over.
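If you'd rather not hand-format entries, a small helper can append them and pull the latest one back out when the other agent takes over. This is a minimal sketch under my own assumptions, not part of the setup described above: `HandoffLog`, `Entry`, `append`, and `latestFor` are hypothetical names, and it assumes entries are separated by a blank line and that each header follows the `[timestamp] [AGENT] TASK-ID` shape shown in the log. It needs Java 16+ for records.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper for a CHANGES.log-style task-state file.
public class HandoffLog {

    private static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm");

    /**
     * One task-state entry. Pass a LinkedHashMap so fields print in insertion order
     * (e.g. Status, Done, Files modified, Remaining, Next agent, or Tests).
     */
    record Entry(LocalDateTime time, String agent, String taskId, Map<String, String> fields) {
        String format() {
            StringBuilder sb = new StringBuilder();
            sb.append('[').append(time.format(TS)).append("] [")
              .append(agent).append("] ").append(taskId).append('\n');
            fields.forEach((key, value) -> sb.append(key).append(": ").append(value).append('\n'));
            return sb.toString();
        }
    }

    /** Appends an entry plus a blank separator line, creating the file if needed. */
    static void append(Path log, Entry entry) throws IOException {
        Files.writeString(log, entry.format() + "\n", StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /** Returns the most recent entry whose header names taskId, scanning from the end. */
    static Optional<String> latestFor(Path log, String taskId) throws IOException {
        if (Files.notExists(log)) {
            return Optional.empty();
        }
        // Entries are separated by one or more blank lines.
        String[] blocks = Files.readString(log, StandardCharsets.UTF_8).split("\n\\s*\n");
        for (int i = blocks.length - 1; i >= 0; i--) {
            String block = blocks[i].strip();
            String header = block.lines().findFirst().orElse("");
            // Header shape: "[timestamp] [AGENT] TASK-ID", optionally followed by " (pickup)".
            if (header.endsWith("] " + taskId) || header.contains("] " + taskId + " ")) {
                return Optional.of(block);
            }
        }
        return Optional.empty();
    }
}
```

Against the two sample entries above, `latestFor(Path.of("CHANGES.log"), "BUG-02")` would return the Codex pickup entry, because it scans from the end of the file. Keeping the fields in a map rather than fixed record components lets a Claude Code entry carry `Done`/`Remaining`/`Next agent` while a Codex entry carries `Tests`, without forcing one schema on both.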

How I Split Tasks in Practice

For a typical refactor, the pattern usually looks like this:

Claude Code: trace the existing business logic, define the change boundary, handle the reasoning-heavy parts
Codex: generate tests, update docs, handle repetitive edits, or pick up follow-up work in parallel

If both tools give different answers, I don't blindly trust either. I compare the reasoning and make the call myself. That comparison is often where the real understanding happens.

What Legacy Code Changes

On a legacy codebase, context discipline matters more than people think.

If I throw in random files and vague instructions, both tools' answers get generic fast.

So I keep the input tight: the interface that matters, the key logs, the exception, and just enough surrounding code to make the problem legible.

Less context, but better context.
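The exception is often the noisiest of those inputs. One way to keep it tight is to drop framework frames before pasting a stack trace into either tool. This is a hypothetical helper, not something the workflow above prescribes: `ContextTrimmer` and `trimStackTrace` are illustrative names, and `projectPackage` stands in for your own base package (the log above redacts it as `com.xxx`).

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical helper: shrink a stack trace to the frames that touch your own code.
public class ContextTrimmer {

    /**
     * Renders a throwable and its cause chain, keeping only frames whose class
     * starts with projectPackage (at most maxFrames per throwable) and
     * summarizing everything else as a single "omitted" line.
     */
    static String trimStackTrace(Throwable t, String projectPackage, int maxFrames) {
        StringBuilder out = new StringBuilder();
        // Identity set guards against cyclic cause chains.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
            if (cur != t) {
                out.append("Caused by: ");
            }
            out.append(cur).append('\n'); // Throwable.toString(): class name + message
            int kept = 0;
            int omitted = 0;
            for (StackTraceElement frame : cur.getStackTrace()) {
                if (kept < maxFrames && frame.getClassName().startsWith(projectPackage)) {
                    out.append("    at ").append(frame).append('\n');
                    kept++;
                } else {
                    omitted++;
                }
            }
            if (omitted > 0) {
                out.append("    ... omitted ").append(omitted).append(" frame(s)\n");
            }
        }
        return out.toString();
    }
}
```

Keeping the exception message and the `Caused by:` chain while cutting Spring and proxy frames usually preserves the part of the trace that points at business logic, which is the part both tools need.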

What I Do Now

The useful question was never "Which one is better?"

The better question: what kind of work should each one own?

If the task is reasoning-heavy, start with Claude Code
If the task is parallelizable or repetitive, give it to Codex
If you might switch midway, write the state down first

That's where the productivity gain came from — not from picking one tool, but from giving each tool a clear job.

DEV Community: Sam Lai