DEV Community

naoki_JPN

I used Claude Code + Codex for 2 months and hit 3.77 billion tokens in a single month

Note: Information in this article is current as of May 2026.

I've been using both Claude Code and OpenAI Codex for personal development for two months. I wanted to get a clearer picture of my actual usage, so I tracked token consumption properly. The total from March through early May (up to 5/7) exceeded 4 billion tokens, with April alone hitting 3.77 billion tokens.

How I measured

Claude Code
Claude Code stores conversation logs as JSONL files under ~/.claude/projects/. Each record includes model name, timestamp, and token counts for input/output/cache_creation/cache_read. I parsed these and aggregated by month and model.
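As a sketch of what that aggregation looks like (the field names `message.usage.*` and `timestamp` are assumptions based on my own log files and may differ across Claude Code versions):

```python
import json
from collections import defaultdict
from pathlib import Path

def aggregate_claude_logs(root: str) -> dict:
    """Sum token counts per (month, model) across Claude Code JSONL logs.

    Assumes each record carries a timestamp plus message.usage fields;
    adjust the key names to whatever your log files actually contain.
    """
    totals: dict = defaultdict(lambda: defaultdict(int))
    for path in Path(root).glob("**/*.jsonl"):
        for line in path.read_text(encoding="utf-8").splitlines():
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip malformed lines
            msg = rec.get("message") or {}
            usage = msg.get("usage") if isinstance(msg, dict) else None
            if not usage:
                continue
            month = rec.get("timestamp", "")[:7]   # e.g. "2026-04"
            model = msg.get("model", "unknown")
            for key in ("input_tokens", "output_tokens",
                        "cache_creation_input_tokens",
                        "cache_read_input_tokens"):
                totals[(month, model)][key] += usage.get(key, 0)
    return totals
```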

Codex
Codex stores per-session JSONL logs under ~/.codex/sessions/. The log contains multiple token_count events; I used the last total_token_usage per session as the usage for that session.
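A sketch of that "last total per session" extraction. The exact event shape is an assumption from my own `~/.codex/sessions` files, so rather than hardcoding a path into the record, the helper just searches each decoded line for a `total_token_usage` object:

```python
import json
from pathlib import Path

def find_total_usage(obj):
    """Recursively search a decoded JSON value for 'total_token_usage'."""
    if isinstance(obj, dict):
        if "total_token_usage" in obj:
            return obj["total_token_usage"]
        for value in obj.values():
            found = find_total_usage(value)
            if found is not None:
                return found
    elif isinstance(obj, list):
        for value in obj:
            found = find_total_usage(value)
            if found is not None:
                return found
    return None

def session_usage(path: str):
    """Return the last total_token_usage event found in one session log."""
    last = None
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue
        usage = find_total_usage(rec)
        if usage is not None:
            last = usage  # keep overwriting: the final event wins
    return last
```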


Monthly Trends

Total

| Month | Total Tokens | Total Cost |
| --- | --- | --- |
| March 2026 | 265M | ~$115 |
| April 2026 | 3,770M | ~$2,100 |
| May 2026 ※ | 217M | ~$155 |

Breakdown

| Month | Claude Tokens | Claude Cost | Codex Tokens | Codex Cost |
| --- | --- | --- | --- | --- |
| March 2026 | 263M | $115 | 2M | ~$0.3 |
| April 2026 | 3,570M | $1,950 | 196M | ~$147 |
| May 2026 ※ | 157M | $112 | 60M | ~$43 |

※ Both Claude and Codex data through 2026-05-07.

Claude costs are API-rate equivalents. Codex costs are estimated from published model rates (gpt-5.3-codex: $1.75/$14 per M, gpt-5.4: $2.50/$15 per M, gpt-5.5: $5.00/$30 per M, cached input at 10% of input rate).
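As arithmetic, the Codex estimate is just a rate table applied per model (rates copied from above; the 10% cached-input discount is the assumption stated here, not a published figure):

```python
# (input $/M, output $/M) rates as listed above
RATES = {
    "gpt-5.3-codex": (1.75, 14.00),
    "gpt-5.4": (2.50, 15.00),
    "gpt-5.5": (5.00, 30.00),
}

def estimate_cost(model: str, input_tokens: int,
                  cached_input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost; cached input billed at 10% of the input rate."""
    in_rate, out_rate = RATES[model]
    uncached = input_tokens - cached_input_tokens
    return (uncached * in_rate
            + cached_input_tokens * in_rate * 0.10
            + output_tokens * out_rate) / 1_000_000
```

At gpt-5.5's 96% cached-input ratio, 1M input tokens come out around $0.68 instead of $5.00, which is why the higher headline price didn't show up proportionally in the bill.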

March was when I started using Claude Code — I kept hitting rate limits and upgraded my plan repeatedly. In April I pushed harder, and around mid-April I introduced Codex to handle overflow when Claude hit its limits. April combined reached 3.77 billion tokens, about 14x March.

What happened in April

Breaking it down by day, April 13 alone consumed 220 million tokens.

| Date | Claude Code Tokens |
| --- | --- |
| Apr 9 | 11M |
| Apr 10 | 14.8M |
| Apr 13 | 219.8M |
| Apr 14 | 4.09M |
| Apr 16 | 6.29M |
| Apr 20 | 23.37M |
| Apr 23 | 11.9M |

That single week (the week of Apr 13) accounted for 68% of the month. Apr 13 stands out — it matches a day when I was running Agent/Subagents in heavy parallel.

Model breakdown

Claude Code

| Model | Period | Tokens | Est. Cost |
| --- | --- | --- | --- |
| claude-haiku-4-5 | Mar–May | 410M | $81 |
| claude-sonnet-4-6 | Mar–May | 2,520M | $1,220 |
| claude-opus-4-6 | Mar–Apr | 270M | $284 |
| claude-opus-4-7 | Apr–May | 780M | $593 |

Sonnet was the main workhorse. Opus-4-7 launched in April and I adopted it immediately.

Codex

| Model | Period | Tokens | Est. Cost |
| --- | --- | --- | --- |
| gpt-5.3-codex | March | 2M | ~$0.3 |
| gpt-5.4 | April | 69M | ~$43 |
| gpt-5.5 | Late Apr–May | 189M | ~$148 |

Model evolution was rapid — three generations in three months.


Cache

Both tools are heavily cache-dependent, but in different ways.

Claude Code

| Type | Mar–Apr Total |
| --- | --- |
| raw input | 342K |
| output | 17.5M |
| cache creation | 144M |
| cache read | 3,770M |
| total | 3,830M |

96% of all tokens are cache reads. Claude Code re-injects large contexts (CLAUDE.md, codebase, conversation history) via cache on every session, keeping raw input tiny.
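The 96% figure falls straight out of the table's components (using the Mar–Apr totals above, in millions of tokens):

```python
def cache_read_share(totals: dict) -> float:
    """Fraction of all tokens accounted for by cache reads."""
    grand_total = sum(totals.values())
    return totals["cache_read_input_tokens"] / grand_total if grand_total else 0.0

# Mar–Apr component totals from the table above, in millions of tokens
claude_totals = {
    "input_tokens": 0.342,        # 342K raw input
    "output_tokens": 17.5,
    "cache_creation_input_tokens": 144,
    "cache_read_input_tokens": 3770,
}
print(f"{cache_read_share(claude_totals):.0%}")  # → 96%
```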

Codex

| Model | Cached input ratio |
| --- | --- |
| gpt-5.3-codex | 80% |
| gpt-5.4 | 91% |
| gpt-5.5 | 96% |

Overall, 94% of Codex input is cached, and the ratio rises with each newer model. gpt-5.5 has a higher per-token price, but in my logs the high cached-input ratio kept actual costs in check. Looking at raw input/output alone misrepresents what's really happening in both tools.


Division of labor

These two tools aren't competing — they cover different roles. Claude Code handles Biz tasks (docs, research, design, organizing), while Codex is dedicated to Dev (implementation).

That said, the division shifted when gpt-5.5 arrived. Up to gpt-5.4, Codex felt like talking to a senior engineer who only cared about technical correctness — no real dialogue. So I only used it as a backend called from Claude. With gpt-5.5, it finally felt like a real conversation partner, and I started giving it implementation instructions directly. Now the flow is: Claude creates tickets, Codex handles implementation.

The token volume difference reflects the nature of the tasks. Biz work generates large contexts and long conversations. On top of that, Codex only ran at full capacity from mid-April, so the periods aren't even comparable. A raw quantity comparison doesn't mean much.


Takeaways

  • The April explosion coincides with when I seriously started testing Multi-agent parallel execution. Running 10 Subagents simultaneously sends token counts through the roof fast
  • High cache dependency means both tools are designed for large, persistent contexts. The more you invest in CLAUDE.md and documentation, the more the cache works for you
  • Codex model iteration is fast — three generations in three months. The high cached-input ratio on gpt-5.5 is cost-efficient in practice
  • Token composition matters. Claude Code is dominated by output and cache_read; Codex is dominated by cached input. Raw input/output alone gives a misleading picture

Measuring this made my "just using it" habits much clearer. The cache ratio and the impact of parallel agent execution are things you simply don't see unless you look.

Feel free to leave a comment if you'd like to discuss how to approach AI adoption in your organization.
