Originally published at recca0120.github.io
}}">The first post scanned 95 days of logs and found sub-agent cache TTL silently dropped to 5m. }}">The second tracked it to 17 consecutive days of 100% 5m — conclusion: it's the new default.
This time I broke down the model dimension. Scanning March through May 7, I set out to confirm whether the cache TTL had reverted (it hasn't). Instead I found something bigger: the server doesn't just control cache TTL. It silently switched the main agent model three times.
Data Source
Same as before: ~/.claude/projects/{project-path}/{session-uuid}.jsonl. This time I also checked the message.model field in API responses:
```json
{
  "message": {
    "model": "claude-opus-4-6",
    "usage": {
      "input_tokens": 142,
      "cache_read_input_tokens": 892041,
      "output_tokens": 3847,
      "cache_creation": {
        "ephemeral_5m_input_tokens": 0,
        "ephemeral_1h_input_tokens": 8234
      }
    }
  }
}
```
The model field comes from the server, not the client. Whatever the API says it used, that's what it used.
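For a quick spot check, a few lines of Python will list every distinct model ID the server has returned in your logs (a minimal sketch assuming the default ~/.claude/projects layout; the full script at the end of this post does this properly):

```python
import json
from pathlib import Path

# Collect every distinct server-reported model ID across all session logs.
models = set()
for jsonl in (Path.home() / ".claude/projects").rglob("*.jsonl"):
    for line in jsonl.open():
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue
        msg = obj.get("message") if isinstance(obj, dict) else None
        if isinstance(msg, dict) and msg.get("model"):
            models.add(msg["model"])

print(sorted(models))
```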
Only Four Models
Scanning all JSONL files, only four models appeared (all prices in $/MTok):

| Model ID | Short | Input | Cache Read | Cache Write 5m | Cache Write 1h | Output |
|---|---|---|---|---|---|---|
| opus-4-6 | O4.6 | $15 | $1.50 | $18.75 | $30 | $75 |
| opus-4-7 | O4.7 | $15 | $1.50 | $18.75 | $30 | $75 |
| sonnet-4-6 | S4.6 | $3 | $0.30 | $3.75 | $6 | $15 |
| haiku-4-5 | H4.5 | $0.80 | $0.08 | $1.00 | $1.60 | $4 |
Cache read costs differ 5x between Opus and Sonnet ($1.50 vs $0.30), output 5x ($75 vs $15). Since cache reads dominate Claude Code API calls, model choice directly determines cost magnitude.
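To put the 5x in per-call terms, here is the cache-read cost of a single call shaped like the example response above (~892K cached tokens), priced from the table. A back-of-the-envelope sketch, not a measured cost:

```python
# Cache-read cost of one call reading ~892K cached tokens
# (the example response above), at the table's $/MTok rates.
CACHE_READ_PRICE = {"opus": 1.50, "sonnet": 0.30, "haiku": 0.08}

tokens = 892_041
for model, price in CACHE_READ_PRICE.items():
    print(f"{model:7} ${tokens / 1e6 * price:.3f} per call")
# opus    $1.338 per call
# sonnet  $0.268 per call
# haiku   $0.071 per call
```

At thousands of calls per day, that per-call gap is the whole cost story.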
Main Agent Silently Switched Three Times
Using cc-office (my primary project) as an example, the main agent model timeline:
```
Date        O4.6   O4.7   S4.6  Total  Dominant
───────────────────────────────────────────────
2026-04-07  3,707      0      0  3,707  O4.6 100%
2026-04-13  2,821      0      0  2,821  O4.6 100%
2026-04-14  3,385      0    315  3,704  O4.6  91%  ← S4.6 appears
2026-04-15      0      0  3,445  3,449  S4.6 100%  ← First switch
2026-04-16      0      0  5,949  5,949  S4.6 100%
2026-04-17      0  1,855  3,621  5,476  S4.6  66%
2026-04-18      0  1,973      0  1,973  O4.7 100%  ← Second switch
2026-04-25    211  5,386      0  5,597  O4.7  96%
2026-04-26  2,308      0      0  2,308  O4.6 100%  ← Back to O4.6
2026-04-29  2,149      0      0  2,149  O4.6 100%
2026-04-30    514      0  1,213  1,727  S4.6  70%  ← Third switch
2026-05-01      0      0  3,492  3,492  S4.6 100%
2026-05-05    350      0  3,187  3,537  S4.6  90%
2026-05-06  2,347      0      0  2,347  O4.6 100%  ← Back again
2026-05-07  4,197     44      0  4,241  O4.6  99%
```
I had opus-4-6 1m context selected the entire time. But the server returned a different model three times:
- 4/15-4/17: Downgraded to sonnet-4-6 (3 days)
- 4/18-4/25: Switched to opus-4-7 (8 days)
- 4/30-5/5: Downgraded to sonnet-4-6 again (6 days)
Each switch was binary — 100% one model the day before, 100% another the next day. Same pattern as the cache TTL regression: sharp switch, no announcement, client unaware.
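These flips are easy to detect mechanically once you have per-day call counts (the script at the end of this post builds exactly that). A sketch, assuming a main_daily dict mapping day to per-model call counts:

```python
def dominant(counts: dict) -> str:
    """The model with the most main-agent calls that day."""
    return max(counts, key=counts.get)

def find_switches(daily: dict) -> list:
    """daily maps '2026-04-15' -> {'sonnet-4-6': 3445, ...}.
    Returns (day, model_before, model_after) for each flip."""
    days = sorted(daily)
    return [(day, dominant(daily[prev]), dominant(daily[day]))
            for prev, day in zip(days, days[1:])
            if dominant(daily[prev]) != dominant(daily[day])]

# find_switches(main_daily)
# -> [('2026-04-15', 'opus-4-6', 'sonnet-4-6'), ...]
```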
Sub-Agent Models Are Not Your Choice
Sub-agent models are decided by Claude Code autonomously, not by user settings. The distribution varies dramatically:
| Period | Main | Sub O4.6 | Sub O4.7 | Sub S4.6 | Sub H4.5 |
|---|---|---|---|---|---|
| 3/26-4/14 | O4.6 | 76% | — | 4% | 19% |
| 4/15-4/17 | S4.6 | 0% | — | 7% | 92% |
| 4/18-4/25 | O4.7 | — | 27% | 1% | 73% |
| 4/26-4/30 | O4.6 | 34% | — | 28% | 37% |
| 5/01-5/05 | S4.6 | — | — | 64% | 36% |
| 5/06-5/07 | O4.6 | 47% | — | 30% | 23% |
When main uses Opus, sub-agents tend to also use Opus (76%). When main is downgraded to Sonnet, sub-agents switch to mostly Haiku (92%). This correlation isn't coincidental — the server adjusts sub-agent model allocation alongside main model changes.
How to Measure Efficiency
The previous posts focused on cache TTL. This time the question is: how much did you spend for how much main agent output?
Why main output:
- Main agent output is what you're paying for — code, edits, answers
- Sub-agents are overhead — their job is to search and gather for the main agent
- Sub-agent output feeds into main agent input, not into your deliverables
Core metric:
Total cost (main + sub) per million main output tokens
Lower = more efficient.
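In code, using the 5/01-5/05 period (detailed below) as a worked example:

```python
def cost_per_m_main_output(total_cost_usd: float, main_output_tokens: int) -> float:
    """Total cost (main + sub) per million main-agent output tokens."""
    return total_cost_usd / (main_output_tokens / 1e6)

# 5/01-5/05 below: $1,596 total, 9,559,468 main output tokens
print(cost_per_m_main_output(1_596, 9_559_468))  # ≈ 166.96 -> the $167 figure
# The inverse, main output tokens per dollar:
print(9_559_468 / 1_596)  # ≈ 5,990; matches the 5,991 figure up to rounding of the total
```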
Efficiency Rankings Across Seven Periods
Segmented by dominant main agent model:
| Rank | Period | Main | S/M ratio | $/M main output | $/day |
|---|---|---|---|---|---|
| 1 ⚡ | 5/01-5/05 | S4.6 | 0.91 | $167 | $319 |
| 2 | 4/15-4/17 | S4.6 | 0.30 | $218 | $583 |
| 3 | 3/09-3/21 | S4.6 | 2.04 | $875 | $144 |
| 4 | 4/26-4/30 | O4.6 | 0.47 | $896 | $1,450 |
| 5 | 4/18-4/25 | O4.7 | 0.23 | $1,134 | $3,148 |
| 6 | 5/06-5/07 | O4.6 | 0.35 | $1,554 | $2,836 |
| 7 🐌 | 3/26-4/14 | O4.6 | 0.55 | $1,925 | $2,137 |
11.5x gap between the most and least efficient periods.
Most Efficient: Main S4.6 + Sub S4.6/H4.5 (5/01-5/05)
```
Main:  21,082 calls (4,216/day)   Model: S4.6 98%
Sub:   19,266 calls (3,853/day)   Model: S4.6 64%, H4.5 36%
Total: $1,596 ($319/day)
Main output: 9,559,468 (1,911,894/day)
$/M main output: $167
```
S/M ratio of 0.91 looks high — nearly one sub call per main call. But subs only use Sonnet and Haiku, so overhead is just $56/M main output. Cheap sub calls don't hurt even when frequent.
Best bang for buck: 5,991 main output tokens per dollar.
Least Efficient: Main O4.6 + Sub 76% O4.6 (3/26-4/14)
```
Main:  41,086 calls (2,162/day)   Model: O4.6 99%
Sub:   22,460 calls (1,182/day)   Model: O4.6 76%, S4.6 4%, H4.5 19%
Total: $40,594 ($2,137/day)
Main output: 21,092,340 (1,110,123/day)
$/M main output: $1,925
```
S/M ratio is only 0.55 — looks disciplined. But sub-agents used 76% Opus, meaning every sub call pays Opus-rate cache reads. Sub overhead hits $477/M main output.
Main output per day was only 1.11M — the lowest across all periods. Most money spent, least produced.
Most Expensive but Not Most Efficient: Opus 4.7 (4/18-4/25)
```
Main:  31,204 calls (3,900/day)   Model: O4.7 99%
Sub:    7,233 calls (904/day)     Model: O4.7 27%, H4.5 73%
Total: $25,187 ($3,148/day)
Main output: 22,219,870 (2,777,484/day)
$/M main output: $1,134
```
Highest daily output (2.77M tokens), lowest S/M ratio (0.23), sub overhead only $15/M. Looks lean, but $3,148/day is steep — the Sonnet period (4/15-4/17) produced 2.67M/day for just $583.
Sub-Agent Overhead Rankings
Sub cost divided by main output, a pure overhead measurement (ranks 4 and 5 omitted):
| Rank | Period | S/M ratio | Sub Composition | Sub $/M main output |
|---|---|---|---|---|
| 1 ✅ | 4/15-4/17 | 0.30 | H4.5 92% | $6 |
| 2 | 4/18-4/25 | 0.23 | H4.5 73%, O4.7 27% | $15 |
| 3 | 5/01-5/05 | 0.91 | S4.6 64%, H4.5 36% | $56 |
| 6 | 3/26-4/14 | 0.55 | O4.6 76% | $477 |
| 7 ❌ | 3/09-3/21 | 2.04 | O4.6 20%, S4.6 71% | $695 |
Two patterns:
- Sub-agents should use Haiku. Period 4/15-4/17 with 92% Haiku had $6/M overhead — 1/80th of using Opus
- High S/M ratio isn't inherently bad. Period 5/01-5/05 had 0.91 ratio but cheap models, so overhead was only $56. Period 3/26-4/14 had 0.55 ratio but 76% Opus, pushing overhead to $477
S/M ratio isn't the problem. What model the sub uses is the problem.
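A quick sketch of why composition dominates frequency. Assuming a hypothetical 200K cached tokens read per sub call (an illustrative figure, not taken from the logs), many cheap calls still cost less than fewer Opus ones:

```python
# Cost of N sub calls at a given cache-read rate, assuming a
# hypothetical 200K cached tokens read per call (illustrative).
def sub_cost(calls: int, cache_read_price: float,
             tokens_per_call: int = 200_000) -> float:
    return calls * tokens_per_call / 1e6 * cache_read_price

print(sub_cost(1_000, 0.08))  # 1,000 Haiku sub calls -> $16.0
print(sub_cost(500, 1.50))    # 500 Opus sub calls    -> $150.0
```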
Side-by-Side Comparison
| Metric | Best S4.6 (5/01-5/05) | O4.6 (3/26-4/14) | O4.7 (4/18-4/25) |
|---|---|---|---|
| Daily cost | $319 | $2,137 | $3,148 |
| Daily main output | 1,911,894 | 1,110,123 | 2,777,484 |
| $/M main output | $167 | $1,925 | $1,134 |
| Tokens per dollar | 5,991 | 520 | 882 |
| Sub overhead/M | $56 | $477 | $15 |
| Sub primary model | S4.6+H4.5 | O4.6 76% | H4.5 73% |
What You Can't Control
All analysis in this post comes with a caveat: model choice isn't fully in your hands.
What you control:
- Selecting a model in Claude Code settings (I selected opus-4-6 1m)
What you don't control:
- Server may silently swap your main agent model
- Sub-agent models are assigned by Claude Code autonomously
- Cache TTL is server-decided (sub-agent stuck at 100% 5m for 29 consecutive days)
The "Using Opus 4.6" label in Claude Code may not reflect reality. Scanning JSONL for the API response model field is the only reliable way to verify.
Cache TTL Status: Sub-Agents Still 100% 5m
Updating the cache TTL situation. Scanning 4/30-5/7:
| Metric | Main Agent | Sub Agent |
|---|---|---|
| Total API calls | 37,366 | 26,160 |
| 1h cache write | 100% | 0% |
| 5m cache write | 0% | 100% |
Since the }}">first post's 4/9 mark, sub-agents have been at 100% 5m for 29 consecutive days, zero 1h writes. No sign of reverting.
How to Scan Your Own Data
Building on the Python from previous posts, now with model breakdown:
```python
#!/usr/bin/env python3
import json
from pathlib import Path
from collections import defaultdict

ROOT = Path.home() / ".claude/projects"

# day -> agent -> model -> usage counters
data = defaultdict(lambda: defaultdict(lambda: defaultdict(
    lambda: {"calls": 0, "input": 0, "cache_read": 0, "output": 0}
)))

for jsonl in ROOT.rglob("*.jsonl"):
    agent = "sub" if "subagent" in str(jsonl) else "main"
    try:
        for line in jsonl.open():
            try:
                obj = json.loads(line)
            except json.JSONDecodeError:
                continue
            if not isinstance(obj, dict):
                continue
            msg = obj.get("message", {})
            if not isinstance(msg, dict):
                continue
            u = msg.get("usage") or {}
            inp = u.get("input_tokens", 0)
            cr = u.get("cache_read_input_tokens", 0)
            out = u.get("output_tokens", 0)
            if not (inp or cr or out):
                continue
            day = (obj.get("timestamp") or "")[:10]
            model = (msg.get("model") or "unknown").replace("claude-", "")
            r = data[day][agent][model]
            r["calls"] += 1
            r["input"] += inp
            r["cache_read"] += cr
            r["output"] += out
    except OSError:
        pass  # unreadable file; skip it

for day in sorted(data):
    if day < "2026-03-01":
        continue
    for agent in ["main", "sub"]:
        models = data[day][agent]
        if not models:
            continue
        parts = [f"{m}={v['calls']}" for m, v in
                 sorted(models.items(), key=lambda x: -x[1]["calls"])]
        print(f"{day} {agent:4} {', '.join(parts)}")
```
Run it to see what model your main agent actually used — and whether it matches what you selected.
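For reference, my 2026-05-07 main-agent line comes out like this (counts match the timeline above; your model IDs and counts will differ):

```
2026-05-07 main opus-4-6=4197, opus-4-7=44
```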
Conclusions
- Model choice is the biggest cost factor. Cache TTL affects cost ~2x; model affects 5-11x. The cache read price gap between Opus and Sonnet (5x) translates to thousands of dollars per day
- The server silently switches models. I selected opus-4-6, but across 36 days, 17 were switched to sonnet-4-6 or opus-4-7. Same pattern as the cache TTL regression — no announcement
- Sub-agents using Opus is the biggest waste. Sub-agent work is search and exploration; Haiku is sufficient. Sub overhead with 76% Opus is 80x higher than with 92% Haiku
- High S/M ratio isn't inherently bad. What model the sub uses matters more than how many times it runs. Many cheap sub calls beat one expensive Opus sub call
- The most efficient combination ($167/M main output) and the least efficient ($1,925/M) differ by 11.5x — same user, same project, same type of work
What model you select in Claude Code doesn't matter — what matters is what the server actually gives you. Scanning your own JSONL is the only reliable method.