DEV Community

naoki_JPN

I used Claude Code + Codex for 2 months and hit 3.77 billion tokens in a single month

Note: Information in this article is current as of May 2026.

I've been using both Claude Code and OpenAI Codex for personal development for two months. I wanted to get a clearer picture of my actual usage, so I tracked token consumption properly. The total from March through early May (up to 5/7) exceeded 4 billion tokens, with April alone hitting 3.77 billion tokens.

How I measured

Claude Code
Claude Code stores conversation logs as JSONL files under ~/.claude/projects/. Each record includes model name, timestamp, and token counts for input/output/cache_creation/cache_read. I parsed these and aggregated by month and model.
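As a sketch of what that aggregation looks like (the field names `message.usage.*` and `timestamp` are assumptions based on my own log files and may differ across Claude Code versions):

```python
import json
from collections import defaultdict
from pathlib import Path

def aggregate_claude_logs(root: str) -> dict:
    """Sum token counts per (month, model) across Claude Code JSONL logs.

    Assumes each record carries a timestamp plus message.usage fields;
    adjust the key names to whatever your log files actually contain.
    """
    totals: dict = defaultdict(lambda: defaultdict(int))
    for path in Path(root).glob("**/*.jsonl"):
        for line in path.read_text(encoding="utf-8").splitlines():
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip malformed lines
            msg = rec.get("message") or {}
            usage = msg.get("usage") if isinstance(msg, dict) else None
            if not usage:
                continue
            month = rec.get("timestamp", "")[:7]   # e.g. "2026-04"
            model = msg.get("model", "unknown")
            for key in ("input_tokens", "output_tokens",
                        "cache_creation_input_tokens",
                        "cache_read_input_tokens"):
                totals[(month, model)][key] += usage.get(key, 0)
    return totals
```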

Codex
Codex stores per-session JSONL logs under ~/.codex/sessions/. The log contains multiple token_count events; I used the last total_token_usage per session as the usage for that session.
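A sketch of that "last total per session" extraction. The exact event shape is an assumption from my own `~/.codex/sessions` files, so rather than hardcoding a path into the record, the helper just searches each decoded line for a `total_token_usage` object:

```python
import json
from pathlib import Path

def find_total_usage(obj):
    """Recursively search a decoded JSON value for 'total_token_usage'."""
    if isinstance(obj, dict):
        if "total_token_usage" in obj:
            return obj["total_token_usage"]
        for value in obj.values():
            found = find_total_usage(value)
            if found is not None:
                return found
    elif isinstance(obj, list):
        for value in obj:
            found = find_total_usage(value)
            if found is not None:
                return found
    return None

def session_usage(path: str):
    """Return the last total_token_usage event found in one session log."""
    last = None
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue
        usage = find_total_usage(rec)
        if usage is not None:
            last = usage  # keep overwriting: the final event wins
    return last
```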


Monthly Trends

Total

| Month | Total Tokens | Total Cost |
| --- | --- | --- |
| March 2026 | 265M | ~$115 |
| April 2026 | 3,770M | ~$2,100 |
| May 2026 ※ | 217M | ~$155 |

Breakdown

| Month | Claude Tokens | Claude Cost | Codex Tokens | Codex Cost |
| --- | --- | --- | --- | --- |
| March 2026 | 263M | $115 | 2M | ~$0.3 |
| April 2026 | 3,570M | $1,950 | 196M | ~$147 |
| May 2026 ※ | 157M | $112 | 60M | ~$43 |

※ Both Claude and Codex data through 2026-05-07.

Claude costs are API-rate equivalents. Codex costs are estimated from published model rates (gpt-5.3-codex: $1.75/$14 per M, gpt-5.4: $2.50/$15 per M, gpt-5.5: $5.00/$30 per M, cached input at 10% of input rate).
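As arithmetic, the Codex estimate is just a rate table applied per model (rates copied from above; the 10% cached-input discount is the assumption stated here, not a published figure):

```python
# (input $/M, output $/M) rates as listed above
RATES = {
    "gpt-5.3-codex": (1.75, 14.00),
    "gpt-5.4": (2.50, 15.00),
    "gpt-5.5": (5.00, 30.00),
}

def estimate_cost(model: str, input_tokens: int,
                  cached_input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost; cached input billed at 10% of the input rate."""
    in_rate, out_rate = RATES[model]
    uncached = input_tokens - cached_input_tokens
    return (uncached * in_rate
            + cached_input_tokens * in_rate * 0.10
            + output_tokens * out_rate) / 1_000_000
```

At gpt-5.5's 96% cached-input ratio, 1M input tokens come out around $0.68 instead of $5.00, which is why the higher headline price didn't show up proportionally in the bill.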

March was when I started using Claude Code — I kept hitting rate limits and upgraded my plan repeatedly. In April I pushed harder, and around mid-April I introduced Codex to handle overflow when Claude hit its limits. April combined reached 3.77 billion tokens, about 14x March.

What happened in April

Breaking it down by day, April 13 alone consumed 220 million tokens.

| Date | Claude Code Tokens |
| --- | --- |
| Apr 9 | 11M |
| Apr 10 | 14.8M |
| Apr 13 | 219.8M |
| Apr 14 | 4.09M |
| Apr 16 | 6.29M |
| Apr 20 | 23.37M |
| Apr 23 | 11.9M |

That single week (the week of Apr 13) accounted for 68% of the month. Apr 13 stands out — it matches a day when I was running Agent/Subagents in heavy parallel.

Model breakdown

Claude Code

| Model | Period | Tokens | Est. Cost |
| --- | --- | --- | --- |
| claude-haiku-4-5 | Mar–May | 410M | $81 |
| claude-sonnet-4-6 | Mar–May | 2,520M | $1,220 |
| claude-opus-4-6 | Mar–Apr | 270M | $284 |
| claude-opus-4-7 | Apr–May | 780M | $593 |

Sonnet was the main workhorse. Opus-4-7 launched in April and I adopted it immediately.

Codex

| Model | Period | Tokens | Est. Cost |
| --- | --- | --- | --- |
| gpt-5.3-codex | March | 2M | ~$0.3 |
| gpt-5.4 | April | 69M | ~$43 |
| gpt-5.5 | Late Apr–May | 189M | ~$148 |

Model evolution was rapid — three generations in three months.


Cache

Both tools are heavily cache-dependent, but in different ways.

Claude Code

| Type | Mar–Apr Total |
| --- | --- |
| raw input | 342K |
| output | 17.5M |
| cache creation | 144M |
| cache read | 3,770M |
| total | 3,830M |

96% of all tokens are cache reads. Claude Code re-injects large contexts (CLAUDE.md, codebase, conversation history) via cache on every session, keeping raw input tiny.
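The 96% figure falls straight out of the table's components (using the Mar–Apr totals above, in millions of tokens):

```python
def cache_read_share(totals: dict) -> float:
    """Fraction of all tokens accounted for by cache reads."""
    grand_total = sum(totals.values())
    return totals["cache_read_input_tokens"] / grand_total if grand_total else 0.0

# Mar–Apr component totals from the table above, in millions of tokens
claude_totals = {
    "input_tokens": 0.342,        # 342K raw input
    "output_tokens": 17.5,
    "cache_creation_input_tokens": 144,
    "cache_read_input_tokens": 3770,
}
print(f"{cache_read_share(claude_totals):.0%}")  # → 96%
```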

Codex

| Model | Cached input ratio |
| --- | --- |
| gpt-5.3-codex | 80% |
| gpt-5.4 | 91% |
| gpt-5.5 | 96% |

Overall, 94% of Codex input is cached, and the ratio rises with each newer model. gpt-5.5 has a higher per-token price, but in my logs the high cached-input ratio kept actual costs in check. Looking at raw input/output alone misrepresents what's really happening in both tools.


Division of labor

These two tools aren't competing — they cover different roles. Claude Code handles Biz tasks (docs, research, design, organizing), while Codex is dedicated to Dev (implementation).

That said, the division shifted when gpt-5.5 arrived. Up to gpt-5.4, Codex felt like talking to a senior engineer who only cared about technical correctness — no real dialogue. So I only used it as a backend called from Claude. With gpt-5.5, it finally felt like a real conversation partner, and I started giving it implementation instructions directly. Now the flow is: Claude creates tickets, Codex handles implementation.

The token volume difference reflects the nature of the tasks. Biz work generates large contexts and long conversations. On top of that, Codex only ran at full capacity from mid-April, so the periods aren't even comparable. A raw quantity comparison doesn't mean much.


Takeaways

  • The April explosion coincides with when I seriously started testing Multi-agent parallel execution. Running 10 Subagents simultaneously sends token counts through the roof fast
  • High cache dependency means both tools are designed for large, persistent contexts. The more you invest in CLAUDE.md and documentation, the more the cache works for you
  • Codex model iteration is fast — three generations in three months. The high cached-input ratio on gpt-5.5 is cost-efficient in practice
  • Token composition matters. Claude Code is dominated by output and cache_read; Codex is dominated by cached input. Raw input/output alone gives a misleading picture

Measuring this made my "just using it" habits much clearer. The cache ratio and the impact of parallel agent execution are things you simply don't see unless you look.

Feel free to leave a comment if you'd like to discuss how to approach AI adoption in your organization.
