DEV Community

Ken Imoto

Posted on • Originally published at kenimoto.dev

Claude Code vs ChatGPT Codex: Two Official Agents, One Choice You Don't Have to Make

For a long time my AI coding workflow was a junk drawer. Aider, Continue, OpenClaw, three different VS Code plugins, an .envrc with API keys for providers I'd forgotten I had. Then on April 4, 2026, Anthropic banned third-party agents from Claude Pro and Max subscriptions, and half my drawer stopped opening.

That was the part where I should have panicked. Instead I did the boring thing: I deleted everything except Claude Code and ChatGPT Codex. The two official agents. Both backed by the labs that train the models. Both at the same $100/month sweet spot now that OpenAI explicitly priced ChatGPT Pro against Claude Max.

I expected to pick a winner inside a week. A month later I'm still running both, and that turns out to be the actual answer.

Two tools, two completely different ideas

It's tempting to call Claude Code and Codex "competitors." They are, in the same sense that a chef's knife and a slow cooker compete: they both produce dinner, but if you compare them on a spec sheet you'll miss the point.

Claude Code lives in your terminal. You run `claude` in a project directory and it has full read/write access to your filesystem. It edits files in real time. You watch it work. When it suggests a change, you can stop it mid-sentence, push back, redirect, and the conversation just keeps going. The mental model is "pair programmer who reads fast."

ChatGPT Codex lives in the cloud. You give it a task, it spins up a sandbox, clones your repo from GitHub, and returns a pull request whenever it's done. You can queue five tasks before lunch and review the PRs after. The mental model is "intern who works from home and only files complete reports."

Same goal. Almost no overlap in how they get there.

The boring spec sheet (which I promised not to lean on, and will now lean on briefly)

Here's the cheat sheet I keep in a Markdown file because I forget it twice a week:

| Dimension | Claude Code | ChatGPT Codex |
| --- | --- | --- |
| Where it runs | Your machine | OpenAI's cloud |
| Interaction | Synchronous, conversational | Asynchronous, queued |
| File access | Direct local filesystem | Sandboxed clone of your GitHub repo |
| Pipe mode | `claude -p` reads stdin | No |
| Subscription | Pro $20, Max 5x $100, Max 20x $200 | Plus $20, Pro $100 (new), Pro $200 |
| Default model | Claude Sonnet 4.6 / Opus 4.6 | GPT-5.3 Codex (codex-1 fine-tune of o3) |
| Best benchmark right now | 67% win rate on independent code-quality reviews | 77.3% on Terminal-Bench 2.0 |
| Parallel agents | Sub-agents via skills | Multi-agent orchestration in Codex app |
| Voice / mobile flow | Limited | Voice input, mobile review of PRs |

Two things jump out. First, the price ladders are now nearly aligned: OpenAI's new $100 ChatGPT Pro tier (introduced April 2026) was explicitly positioned against Claude Max 5x. Both give you "the heavy agent tier" for the same monthly outlay. The $200 tier is what you pay if you want a parallel-agent runway and don't want to think about quotas. Second, benchmark winners depend entirely on what you're measuring. Code quality on long, context-heavy tasks: Claude. Pure terminal-grind throughput: Codex. Anyone who tells you "X is better at coding" is selling something.

What each one actually feels like

Specs are easy to publish and easy to argue about. The texture of each tool is what you only learn by using it, so let me describe the texture.

Claude Code feels like editing a Google Doc with a fast colleague over your shoulder. You type a request. Files start changing. If it goes the wrong direction in line 4, you say "no, drop the Redis cache, just use SQLite" and it actually backs out. The blast radius of a bad instruction is bounded by how fast you can type Ctrl-C. The cost is that you have to be there. Claude Code is bad at the kind of task you'd hand off and walk away from, because it expects you to keep the conversation moving.

Codex feels like email with someone in another time zone. You write a clear ticket. You hit send. You go do something else. The PR shows up later, and if it's wrong, you write another ticket. The blast radius is bounded by what's in the sandbox: it can't accidentally clobber your local Postgres because it's not running on your machine. The cost is that ambiguous instructions don't get clarified; they get interpreted, and you find out three hours later.

I tried for two weeks to use only one of them. Both attempts ended badly. With only Claude Code, I lost half a Sunday because I sat through a refactor that I should have queued and walked away from. With only Codex, I burned an afternoon waiting on PRs for changes I could have made conversationally in fifteen minutes.

The mistake I was making was treating "AI coding agent" as one job. It's at least two jobs. The synchronous one and the asynchronous one. They share a name and almost nothing else.

The workflow that finally worked

After I stopped trying to crown a winner, the actual workflow fell out almost on its own.

The split runs along one axis: how much of your attention does this task deserve right now?

If the answer is "all of it" (debugging a weird production trace, writing a tricky migration, trying to understand a piece of code I didn't write), Claude Code wins. The conversational loop is the whole point. I want to interrupt, ask "wait, why did you choose that?", and have it explain.

If the answer is "none of it, please just do it while I'm in this meeting" (bump dependencies, add tests for the three uncovered functions, port this script from Python 3.10 to 3.13, write a draft PR for that GitHub issue I triaged yesterday), Codex wins. I write a one-paragraph spec, queue it, and review the PR an hour later on my phone.
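To make "a one-paragraph spec" concrete, here's the shape of a ticket I'd queue. This is a hypothetical example: the script name, file paths, and PR title are invented, not from a real repo.

```
Task: Port scripts/export_report.py from Python 3.10 to 3.13.
Replace deprecated typing constructs with built-in generics, switch
datetime.utcnow() to datetime.now(timezone.utc), and make sure the
existing tests in tests/test_export_report.py still pass. Open a PR
against main titled "Port export_report to 3.13".
```

The pattern that matters: name the target, list the concrete changes, state the acceptance check, and say what "done" looks like. Ambiguity you'd resolve conversationally with Claude Code has to be resolved up front here.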

There's a third pattern I didn't expect: using them on the same task, in sequence. I'll have Claude Code architect a feature in conversation: let it show me the trade-offs, talk me through three approaches, generate the first draft. Then I'll hand the resulting PR scope to Codex, with a precise spec, to grind out the boring sibling implementations. Claude does the thinking-heavy 20%; Codex parallelizes the typing-heavy 80%. The combined cost on Max 5x + ChatGPT Pro is ~$200/mo at the new tier alignment, which sounds like a lot until you remember it's roughly the cost of having a contractor for one day a month.

A few sharp edges nobody warns you about

A month in, here are the things I wish someone had told me.

Claude Code's pipe mode (`claude -p`) is criminally underdocumented. You can pipe stdin straight into it, which means it composes with every Unix tool you already know:

```shell
git diff HEAD~1 | claude -p "review this diff for SQL injection risks"
```

That's a one-line code review. It is not, it turns out, available in Codex. Codex's superpower is GitHub PRs; Claude Code's superpower is being a Unix citizen. Use the right one.
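A couple more compositions in the same spirit. These are illustrative sketches, not documented recipes: the only contract assumed is that `claude -p` reads stdin plus a prompt string, and the paths, test command, and prompts are placeholders for your own project.

```shell
# Triage a failing test run without copy-pasting logs anywhere
npm test 2>&1 | tail -n 50 | claude -p "group these failures by likely root cause"

# Turn scattered TODOs into a prioritized review list
grep -rn "TODO" src/ | claude -p "summarize these TODOs and rank them by risk"
```

Anything that already emits text on stdout is one pipe away from a review, a summary, or a triage pass.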

Codex doesn't see your local environment, and that's a feature. I burned an embarrassing amount of time early on trying to debug "why doesn't Codex have access to my .env file?" The answer is: because it's running on someone else's computer. Once that clicks, you stop fighting it. Anything that requires a real local service (Docker compose, a running database, a quirky internal CLI) belongs to Claude Code. Anything self-contained in the repo belongs to Codex.

Anthropic's third-party crackdown is real but narrower than the panic suggested. The April 2026 changes mostly affected tools that piggybacked on Claude Max subscriptions to resell Claude — OpenClaw and similar shims. The official claude CLI, the Agent SDK, and MCP integrations are all fine. If you'd been routing Claude through a third-party agent, that's the thing that broke. Switching to the official CLI is a one-line change in most workflows. If you actively want to keep the third-party tools, Anthropic now charges per-session "extra usage" — useful escape hatch, but no longer free with your Max sub.

Voice input on Codex actually moves the needle. I rolled my eyes when I heard about it. Then I tried walking the dog while dictating "rewrite the migrations folder to use the new naming convention, open a PR" and the PR was waiting when I got home. That's not a productivity hack. That is a category change.

Parallel agents are useful at different scales. Claude Code's sub-agents (Explore, Plan, Skills running in context: fork) are great for "research while I keep working." Codex's multi-agent orchestration introduced in the April 2026 update is great for "fork five copies of this and try variations." Different problems, not interchangeable.

So which one should you pay for?

If you can only pay for one and want a single, defensible choice: start with Claude Code on the Pro plan ($20). It's the lower-friction entry. The conversational loop teaches you what these tools are good at and what they're not, and you can graduate to Max later if your usage pushes the limits.

If you have a workflow that's genuinely "queue tasks, review PRs," Codex is the better fit, and you should skip straight to Plus or Pro depending on volume.

If your honest answer is "I want both," budget for both. The duplication of cost is real, but the duplication of capability is mostly an illusion: they cover different jobs that happen to share a job title.

What I'd warn against is the failure mode I almost talked myself into: picking one, declaring it the winner, and forcing the other half of your work to fit the wrong tool. That's the real cost. Both companies are pricing these subscriptions to be in the noise compared to a developer's salary. The point of optimizing here is not to save $100/mo. It's to not waste an afternoon waiting on a PR you should have written in chat, or sitting through a refactor you should have queued and walked away from.

The choice you don't have to make is the binary one. Pick the workflow per task. The agents will sort themselves out.


Posted from a workflow where Claude Code drafted this article in conversation, and Codex was queued to fix the typos in a PR while I went to make coffee. The coffee was, predictably, a mistake. The PR was fine.


If you've read this far and you're already pattern-matching the workflows I described onto your own week, the long-form version of this is in the book — Claude Code patterns, the CLAUDE.md taxonomy, Plan Mode, sub-agents, when to bring Codex into the loop, and team operations.

👉 Practical Claude Code — Patterns for Building Reliable Agent Harnesses
