DEV Community: Recca Tsai

Code Quest: A Claude Code Web UI That Runs in Interactive Mode — Just in Time for the June 15 Billing Change

Recca Tsai — Sat, 16 May 2026 09:47:01 +0000

Originally published at recca0120.github.io

Starting June 15, 2026, claude -p and Agent SDK usage no longer count toward subscription plan limits. Instead, eligible plans get a separate monthly Agent SDK credit — and once that's exhausted, usage moves to standard API billing. If you run automated scripts or headless pipelines today, that cost picture changes after June 15.

code-quest takes a different approach: it spawns Claude Code CLI directly, parses the full NDJSON interactive protocol, and delivers session management, file browsing, and Git integration in the browser. Interactive mode isn't covered by the new billing change — and that architectural choice turns out to be well-timed.

What Changes on June 15?

From the official documentation:

Starting June 15, 2026, Agent SDK and claude -p usage on subscription plans will draw from a new monthly Agent SDK credit, separate from your interactive usage limits.

In short: SDK and claude -p usage moves out of your subscription's shared pool into a separate credit bucket — which then bills at API rates once exhausted.

Affected:

claude -p (pipe / headless mode)
Agent SDK (Python / TypeScript)
Claude Code GitHub Actions
Third-party apps calling via Agent SDK

Not affected: interactive Claude Code sessions.

Once the monthly credit runs out, additional Agent SDK usage bills at standard API rates. If your workflow involves claude -p pipelines or SDK-wrapped automation, the cost math after June 15 is worth revisiting.

How code-quest Works

Three-tier architecture:

Browser (React 19 + Tailwind v4)
    ↓ WebSocket /ws
Server (Express + Drizzle ORM)
    ↓ WebSocket /summoner
Summoner (local binary, compiled with Bun)
    ↓ child_process spawn
Claude Code CLI

Summoner is the core piece. Compiled into a standalone binary with Bun, it runs on your local machine and handles:

Spawning Claude Code CLI with --output-format stream-json --input-format stream-json
Parsing each line of NDJSON output (system, assistant, user, result, stream_event, control_request)
Forwarding permission / elicitation prompts to the browser via WebSocket, then writing the response back to CLI stdin
All local filesystem, Git, and OpenSpec operations

Server runs in the cloud, handling only routing and persistence (SQLite or MySQL) — it never touches local resources. The browser connects to the server via WebSocket; the server connects to the local Summoner via a separate WebSocket.

This Split deployment design means the server can run anywhere without being co-located with the Claude Code CLI.

How It Differs from Other Web UIs

Existing Claude Code web interfaces follow roughly two patterns.

The first is the UI layer approach: the interface manages config files and session history, but actual Claude Code execution still happens in a terminal the user runs separately. These tools are useful for browsing and reviewing, but they can't intercept permission prompts from the UI, and session fork/resume requires manual steps.

The second is the bridge approach: the CLI connects back to the server via an SDK WebSocket path, and the UI receives the streamed output. This creates a real connection between UI and CLI — but the underlying transport is SDK mode. After June 15, that usage counts against the Agent SDK credit.

code-quest takes a third path: spawn the CLI directly, in interactive mode.

Not affected by the June 15 change: interactive mode usage stays within subscription limits, outside Agent SDK credit
Full protocol control: every control_request (permission prompts, elicitation) is handled in the browser in real time — not just observed as output
Split deployment: server runs in the cloud, Summoner runs locally, no co-location required
First-class session operations: fork, resume, rename are built-in features with DB persistence, not manual workarounds

Features

Session Management

Each Claude Code session maps to a channel:

Spawn: create a new session
Resume: restore from DB; CLI restarts with --resume <session-id>
Fork: branch from an existing session state to explore a different approach
Rename: label sessions for easy retrieval

Full NDJSON event history is stored in DB. content_block_delta events — streaming deltas that account for roughly 80% of all traffic — are split into a separate table and excluded from session history reads by default, keeping queries lightweight.

Git Worktree Support

code-quest has full worktree lifecycle management: create, list, delete, archive, rename. Each session can bind to its own worktree, letting multiple tasks run on separate branches without stepping on each other.

This is especially useful when running multiple Claude Code sessions in parallel — each works in an isolated worktree, so changes don't conflict.

File Explorer

Browse directories, read files, view diffs
Git status integration (modified, added, deleted files)
Fuzzy search via Fuse.js
RootGuard prevents directory traversal; EXPLORER_ROOTS configures which paths are accessible

Real-Time Push

File, Git, and OpenSpec state changes don't require polling. packages/broadcaster uses a DataSource / pub-sub pattern backed by chokidar file watching — changes flow automatically from Summoner → Server → browser as they happen.

WebSocket Reconnection

The custom ResumableSocket tracks sequence numbers. On reconnect, it replays missed events from a 500-event circular buffer. If the gap is too large to recover, it signals the client to refresh the full state instead.

OpenSpec Integration

code-quest has built-in support for the OpenSpec format. You can create, archive, and toggle spec tasks directly from the interface, with real-time spec state synced to the browser automatically.

Getting Started

git clone https://github.com/recca0120/code-quest.git
cd code-quest
pnpm install
pnpm dev

Open http://localhost:5173.

Copy apps/server/.env.example and adjust as needed. Key environment variables:

Variable	Default	Description
`APP_PORT`	`3000`	Server port
`DATABASE_SQLITE_URL`	—	SQLite path, e.g. `file:./data/code-quest.db`
`SUMMONER_MODE`	`local`	`local` (same machine) or `remote` (split deployment)
`SUMMONER_TOKEN`	—	Bearer token for remote Summoner auth
`CLI_AUTO_MODE`	`true`	Pass `--auto-mode` to Claude Code
`EXPLORER_ROOTS`	home dir	Comma-separated allowed root paths

For remote mode (server in the cloud, Summoner on your machine): set SUMMONER_MODE=remote on the server, configure SUMMONER_TOKEN on the Summoner, and point it at the server's WebSocket endpoint.

Building an API on Top

The architecture is already structured for this. The Server is the central hub for all requests — adding HTTP API endpoints there, then routing them through the existing WebSocket channel to the Summoner, which writes to CLI stdin, closes the loop. The plumbing is already in place; it just needs an external-facing interface.

That API runs over interactive mode underneath, so it doesn't touch Agent SDK credit. If you need to drive Claude Code programmatically without SDK billing, the architecture has a clear path for it.

Wrapping Up

Interactive mode, full protocol implementation, Split deployment — these three architectural choices give code-quest a distinct position after the June 15 billing change. Subscription usage stays in interactive limits, permission prompts are handled in the browser, and the server runs independently from the local Summoner. If you're looking for a way to operate Claude Code fully from the browser, this project is worth trying.

References

Three Ways to Git Clone with a Different SSH Key

Recca Tsai — Sat, 16 May 2026 02:33:38 +0000

Originally published at recca0120.github.io

You have access to a GitHub repo, but the SSH key isn't the default ~/.ssh/id_rsa or ~/.ssh/id_ed25519. Running git clone directly gives you Repository not found. This post covers three ways to specify which SSH key to use, including the common pitfall with ~/.ssh/config Host aliases.

The Problem

I needed to clone a private repo:

git clone git@github.com:client-org/project.git

The SSH key required for this repo is ~/.ssh/client_key, not the system default. Cloning without specifying it returns:

ERROR: Repository not found.
fatal: Could not read from remote repository.

The error message is misleading — the repo exists. GitHub returned "not found" because the SSH key it received doesn't have access. GitHub intentionally hides whether a repo exists at all when the key has no permission.

I tried setting a Host alias in ~/.ssh/config first, but clone still failed with the same error. The two methods below are what actually worked.

Method 1: GIT_SSH_COMMAND (One-Off)

The fastest approach — prefix the command with an environment variable:

GIT_SSH_COMMAND='ssh -i ~/.ssh/client_key' git clone git@github.com:client-org/project.git

GIT_SSH_COMMAND tells Git to use the SSH command you specify. -i ~/.ssh/client_key points to the private key. This only applies to that one command — subsequent push/pull won't use it automatically.

Method 2: git config core.sshCommand (Single Repo)

After cloning, set it in the repo:

cd project
git config core.sshCommand 'ssh -i ~/.ssh/client_key'

This writes to .git/config and applies to all future git operations (pull, push, fetch) inside that repo.

Do both in one shot:

GIT_SSH_COMMAND='ssh -i ~/.ssh/client_key' git clone git@github.com:client-org/project.git
cd project
git config core.sshCommand 'ssh -i ~/.ssh/client_key'

This is the simplest approach for a single repo.

Method 3: ~/.ssh/config Host Alias (With a Catch)

The textbook approach is to set a Host alias in ~/.ssh/config:

Host github-client
    HostName github.com
    User git
    IdentityFile ~/.ssh/client_key
    IdentitiesOnly yes

Then clone with the alias:

git clone git@github-client:client-org/project.git

The Catch: Existing Host github.com Overrides the Alias

If your ~/.ssh/config already has a Host github.com block (e.g., for your personal account), SSH may ignore the alias and use the wrong key — clone fails with the same error.

Verify the alias is working before cloning:

ssh -T git@github-client

A successful response is Hi <username>! with the correct account. If the wrong account shows up, the alias isn't routing correctly.

Add url.insteadOf to Route an Entire Org

If you have multiple repos under the same org that all need the same key, use a URL rewrite rule:

git config --global url."git@github-client:client-org/".insteadOf "git@github.com:client-org/"

After this, any git operation targeting client-org repos automatically rewrites the URL to use the github-client alias — no need to set core.sshCommand on each repo individually.

For a single repo, core.sshCommand from Method 2 is more direct and doesn't touch global config.

Comparison

Method	Best For	Scope
`GIT_SSH_COMMAND`	One-time clone	Single command
`git config core.sshCommand`	Single repo, ongoing use	Single repo
`~/.ssh/config` + `url.insteadOf`	Entire org, multiple repos	Global

References

Free Claude Code: Route Claude Code API Calls to Free Alternatives

Recca Tsai — Fri, 15 May 2026 14:09:35 +0000

Originally published at recca0120.github.io

Claude Code's developer experience is excellent, but the API costs add up fast. free-claude-code is an open-source proxy that lets you keep using Claude Code's CLI, VS Code extension, and JetBrains integration while routing the underlying API calls to free-tier cloud APIs or self-hosted local models.

How It Works

Every Claude Code operation goes through the Anthropic API. This proxy sits in between:

Claude Code CLI / VS Code / JetBrains
           ↓
    free-claude-code proxy
           ↓
  NVIDIA NIM / OpenRouter / Ollama / ...

The proxy exposes Anthropic-compatible endpoints (/v1/messages, /v1/models, etc.), translates incoming requests to each provider's format, then translates the responses back to Anthropic's format. From the Claude Code client's perspective, it's just a regular Anthropic API.

Supported Providers

Ten backends are currently supported:

Provider	Notes
NVIDIA NIM	Free tier at build.nvidia.com; includes Kimi K2.5, GLM 4.7
OpenRouter	Aggregates many models; some with free tiers
DeepSeek	deepseek-chat, much cheaper than Opus
Kimi	Moonshot's platform.moonshot.ai
Wafer	wafer.ai; DeepSeek-V4-Pro, GLM-5.1
Z.ai	GLM-5.1, GLM-5-turbo
OpenCode Zen	opencode.ai; includes deepseek-v4-flash-free
LM Studio	Local server, default localhost:1234
llama.cpp	Local server, default localhost:8080
Ollama	Containerized local models, default localhost:11434

Per-Tier Model Routing

Claude Code splits requests into three tiers: Opus (main agent), Sonnet, and Haiku (sub-agents). The proxy lets you route each tier to a different model:

MODEL_OPUS=openrouter/qwen/qwen3-235b-a22b:free
MODEL_SONNET=deepseek/deepseek-chat
MODEL_HAIKU=ollama/llama3.1

Opus requests (typically the most expensive) can be routed to a free model; Haiku requests can run locally.

Installation and Setup

Prerequisites: Claude Code CLI and Python uv.

# Install the proxy
uv tool install --force git+https://github.com/Alishahryar1/free-claude-code.git

# Start the proxy server
fcc-server

After starting, open the displayed localhost address in your browser to access the Admin UI and configure provider API keys.

Then use fcc-claude instead of the regular claude command — the launcher automatically injects the required environment variables.

Client Integration

VS Code

Add to settings.json:

{
  "claude.env": {
    "ANTHROPIC_BASE_URL": "http://localhost:8082",
    "ANTHROPIC_AUTH_TOKEN": "freecc",
    "CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY": "1"
  }
}

JetBrains

Edit the ACP configuration file (path varies by platform) with the same three environment variables.

Once configured, the IDE's model picker also works — the proxy's /v1/models endpoint exposes all available models for visual selection.

Optional Features

Discord / Telegram bots: Wrap Claude Code sessions in a bot for remote task management, streaming progress, and conversation branches. Requires bot tokens and channel IDs.

Voice transcription: Connect Whisper or NVIDIA NIM for voice-to-text input via the messaging platforms.

Actual Limitations

A few real constraints to keep in mind:

Model capability gap: Many of Claude Code's strengths — long context, accurate tool calls, complex reasoning — are specific to Claude models. Switching to alternatives may degrade agentic reliability, especially tool call accuracy, which drives most of Claude Code's workflow.

Free tier rate limits: NVIDIA NIM and OpenRouter free models typically have RPM/TPD caps. Heavy usage will hit rate limits.

Local model resource requirements: Running llama.cpp or Ollama needs sufficient VRAM/RAM. Performance is noticeably slower than cloud APIs.

When It Makes Sense

Trying out Claude Code without committing to Anthropic API costs
Mostly doing simple tasks (file edits, formatting, small features) that don't need top-tier models
You have a GPU and prefer paying with electricity instead of API fees
Comparing how different models perform under the Claude Code interface

If your work depends on Claude's long-context handling or complex agentic tasks, swapping models will likely cause tool call failures or reasoning errors. In that case, analyzing your usage structure and optimizing cache usage may be more practical than switching models — see the }}">earlier posts in this series.

References

36 Days of Claude Code Logs: Silent Model Switching, 11.5x Efficiency Gap

Recca Tsai — Sat, 09 May 2026 08:42:56 +0000

Originally published at recca0120.github.io

}}">The first post scanned 95 days of logs and found sub-agent cache TTL silently dropped to 5m. }}">The second tracked it to 17 consecutive days of 100% 5m — conclusion: it's the new default.

This time I broke down the model dimension. Scanning March through May 7, I originally wanted to confirm whether the cache TTL had reverted (it hasn't). Instead I found something bigger: the server doesn't just control cache TTL — it silently switched the main agent model three times.

Data Source

Same as before: ~/.claude/projects/{project-path}/{session-uuid}.jsonl. This time I also checked the message.model field in API responses:

{
  "message": {
    "model": "claude-opus-4-6",
    "usage": {
      "input_tokens": 142,
      "cache_read_input_tokens": 892041,
      "output_tokens": 3847,
      "cache_creation": {
        "ephemeral_5m_input_tokens": 0,
        "ephemeral_1h_input_tokens": 8234
      }
    }
  }
}

The model field comes from the server, not the client. Whatever the API says it used, that's what it used.

Only Four Models

Scanning all JSONL files, only four models appeared:

Model ID	Short	Input	Cache Read	Cache Write 5m	Cache Write 1h	Output
opus-4-6	O4.6	$15/MTok	$1.50	$18.75	$30	$75
opus-4-7	O4.7	$15/MTok	$1.50	$18.75	$30	$75
sonnet-4-6	S4.6	$3/MTok	$0.30	$3.75	$6	$15
haiku-4-5	H4.5	$0.80/MTok	$0.08	$1.00	$1.60	$4

Cache read costs differ 5x between Opus and Sonnet ($1.50 vs $0.30), output 5x ($75 vs $15). Since cache reads dominate Claude Code API calls, model choice directly determines cost magnitude.

Main Agent Silently Switched Three Times

Using cc-office (my primary project) as an example, the main agent model timeline:

Date         O4.6     O4.7     S4.6     Total  Dominant
───────────────────────────────────────────────────────
2026-04-07   3,707        0        0    3,707  O4.6 100%
2026-04-13   2,821        0        0    2,821  O4.6 100%
2026-04-14   3,385        0      315    3,704  O4.6 91%   ← S4.6 appears
2026-04-15       0        0    3,445    3,449  S4.6 100%  ← First switch
2026-04-16       0        0    5,949    5,949  S4.6 100%
2026-04-17       0    1,855    3,621    5,476  S4.6 66%
2026-04-18       0    1,973        0    1,973  O4.7 100%  ← Second switch
2026-04-25     211    5,386        0    5,597  O4.7 96%
2026-04-26   2,308        0        0    2,308  O4.6 100%  ← Back to O4.6
2026-04-29   2,149        0        0    2,149  O4.6 100%
2026-04-30     514        0    1,213    1,727  S4.6 70%   ← Third switch
2026-05-01       0        0    3,492    3,492  S4.6 100%
2026-05-05     350        0    3,187    3,537  S4.6 90%
2026-05-06   2,347        0        0    2,347  O4.6 100%  ← Back again
2026-05-07   4,197       44        0    4,241  O4.6 99%

I had opus-4-6 1m context selected the entire time. But the server returned a different model three times:

4/15-4/17: Downgraded to sonnet-4-6 (3 days)
4/18-4/25: Switched to opus-4-7 (8 days)
4/30-5/5: Downgraded to sonnet-4-6 again (6 days)

Each switch was binary — 100% one model the day before, 100% another the next day. Same pattern as the cache TTL regression: sharp switch, no announcement, client unaware.

Sub-Agent Models Are Not Your Choice

Sub-agent models are decided by Claude Code autonomously, not by user settings. The distribution varies dramatically:

Period	Main	Sub O4.6	Sub O4.7	Sub S4.6	Sub H4.5
3/26-4/14	O4.6	76%	—	4%	19%
4/15-4/17	S4.6	0%	—	7%	92%
4/18-4/25	O4.7	—	27%	1%	73%
4/26-4/30	O4.6	34%	—	28%	37%
5/01-5/05	S4.6	—	—	64%	36%
5/06-5/07	O4.6	47%	—	30%	23%

When main uses Opus, sub-agents tend to also use Opus (76%). When main is downgraded to Sonnet, sub-agents switch to mostly Haiku (92%). This correlation isn't coincidental — the server adjusts sub-agent model allocation alongside main model changes.

How to Measure Efficiency

The previous posts focused on cache TTL. This time: how much did you spend for how much main agent output.

Why main output:

Main agent output is what you're paying for — code, edits, answers
Sub-agents are overhead — their job is to search and gather for the main agent
Sub-agent output feeds into main agent input, not into your deliverables

Core metric:

Total cost (main + sub) per million main output tokens

Lower = more efficient.

Efficiency Rankings Across Seven Periods

Segmented by dominant main agent model:

Rank	Period	Main	S/M ratio	$/M main output	$/day
1 ⚡	5/01-5/05	S4.6	0.91	$167	$319
2	4/15-4/17	S4.6	0.30	$218	$583
3	3/09-3/21	S4.6	2.04	$875	$144
4	4/26-4/30	O4.6	0.47	$896	$1,450
5	4/18-4/25	O4.7	0.23	$1,134	$3,148
6	5/06-5/07	O4.6	0.35	$1,554	$2,836
7 🐌	3/26-4/14	O4.6	0.55	$1,925	$2,137

11.5x gap between the most and least efficient periods.

Most Efficient: Main S4.6 + Sub S4.6/H4.5 (5/01-5/05)

Main:  21,082 calls (4,216/day)  Model: S4.6 98%
Sub:   19,266 calls (3,853/day)  Model: S4.6 64%, H4.5 36%
Total: $1,596 ($319/day)
Main output: 9,559,468 (1,911,894/day)
$/M main output: $167

S/M ratio of 0.91 looks high — nearly one sub call per main call. But subs only use Sonnet and Haiku, so overhead is just $56/M main output. Cheap sub calls don't hurt even when frequent.

Best bang for buck: 5,991 main output tokens per dollar.

Least Efficient: Main O4.6 + Sub 76% O4.6 (3/26-4/14)

Main:  41,086 calls (2,162/day)  Model: O4.6 99%
Sub:   22,460 calls (1,182/day)  Model: O4.6 76%, S4.6 4%, H4.5 19%
Total: $40,594 ($2,137/day)
Main output: 21,092,340 (1,110,123/day)
$/M main output: $1,925

S/M ratio is only 0.55 — looks disciplined. But sub-agents used 76% Opus, meaning every sub call pays Opus-rate cache reads. Sub overhead hits $477/M main output.

Main output per day was only 1.11M — the lowest across all periods. Most money spent, least produced.

Most Expensive but Not Most Efficient: Opus 4.7 (4/18-4/25)

Main:  31,204 calls (3,900/day)  Model: O4.7 99%
Sub:    7,233 calls (904/day)    Model: O4.7 27%, H4.5 73%
Total: $25,187 ($3,148/day)
Main output: 22,219,870 (2,777,484/day)
$/M main output: $1,134

Highest daily output (2.77M tokens), lowest S/M ratio (0.23), sub overhead only $15/M. Looks lean, but $3,148/day is steep — the Sonnet period (4/15-4/17) produced 2.67M/day for just $583.

Sub-Agent Overhead Rankings

Sub cost divided by main output — pure overhead measurement:

Rank	Period	S/M ratio	Sub Composition	Sub $/M main output
1 ✅	4/15-4/17	0.30	H4.5 92%	$6
2	4/18-4/25	0.23	H4.5 73%, O4.7 27%	$15
3	5/01-5/05	0.91	S4.6 64%, H4.5 36%	$56
6	3/26-4/14	0.55	O4.6 76%	$477
7 ❌	3/09-3/21	2.04	O4.6 20%, S4.6 71%	$695

Two patterns:

Sub-agents should use Haiku. Period 4/15-4/17 with 92% Haiku had $6/M overhead — 1/80th of using Opus
High S/M ratio isn't inherently bad. Period 5/01-5/05 had 0.91 ratio but cheap models, so overhead was only $56. Period 3/26-4/14 had 0.55 ratio but 76% Opus, pushing overhead to $477

S/M ratio isn't the problem. What model the sub uses is the problem.

Side-by-Side Comparison

Metric	Best S4.6 (5/01-5/05)	O4.6 (3/26-4/14)	O4.7 (4/18-4/25)
Daily cost	$319	$2,137	$3,148
Daily main output	1,911,894	1,110,123	2,777,484
$/M main output	$167	$1,925	$1,134
Tokens per dollar	5,991	520	882
Sub overhead/M	$56	$477	$15
Sub primary model	S4.6+H4.5	O4.6 76%	H4.5 73%

What You Can't Control

All analysis in this post comes with a caveat: model choice isn't fully in your hands.

What you control:

Selecting a model in Claude Code settings (I selected opus-4-6 1m)

What you don't control:

Server may silently swap your main agent model
Sub-agent models are assigned by Claude Code autonomously
Cache TTL is server-decided (sub-agent stuck at 100% 5m for 29 consecutive days)

The "Using Opus 4.6" label in Claude Code may not reflect reality. Scanning JSONL for the API response model field is the only reliable way to verify.

Cache TTL Status: Still 100% 5m

Updating the cache TTL situation. Scanning 4/30-5/7:

Metric	Main Agent	Sub Agent
Total API calls	37,366	26,160
1h cache write	100%	0
5m cache write	0	100%

Since the }}">first post's 4/9 mark, sub-agents have been at 100% 5m for 29 consecutive days, zero 1h writes. No sign of reverting.

How to Scan Your Own Data

Building on the Python from previous posts, now with model breakdown:

#!/usr/bin/env python3
import json
from pathlib import Path
from collections import defaultdict

ROOT = Path.home() / ".claude/projects"
data = defaultdict(lambda: defaultdict(lambda: defaultdict(
    lambda: {"calls":0, "input":0, "cache_read":0, "output":0}
)))

for jsonl in ROOT.rglob("*.jsonl"):
    agent = "sub" if "subagent" in str(jsonl) else "main"
    try:
        for line in jsonl.open():
            try: obj = json.loads(line)
            except: continue
            msg = obj.get("message", {})
            if not isinstance(msg, dict): continue
            u = msg.get("usage") or {}
            inp = u.get("input_tokens", 0)
            cr = u.get("cache_read_input_tokens", 0)
            out = u.get("output_tokens", 0)
            if not (inp or cr or out): continue
            day = (obj.get("timestamp") or "")[:10]
            model = (msg.get("model") or "unknown").replace("claude-", "")
            r = data[day][agent][model]
            r["calls"] += 1
            r["cache_read"] += cr
            r["output"] += out
    except: pass

for day in sorted(data):
    if day < "2026-03-01": continue
    for agent in ["main", "sub"]:
        models = data[day][agent]
        if not models: continue
        parts = [f"{m}={v['calls']}" for m, v in
                 sorted(models.items(), key=lambda x: -x[1]["calls"])]
        print(f"{day}  {agent:4}  {', '.join(parts)}")

Run it to see what model your main agent actually used — and whether it matches what you selected.

Conclusions

Model choice is the biggest cost factor. Cache TTL affects cost ~2x; model affects 5-11x. The cache read price gap between Opus and Sonnet (5x) translates to thousands of dollars per day
The server silently switches models. I selected opus-4-6, but across 36 days, 17 were switched to sonnet-4-6 or opus-4-7. Same pattern as the cache TTL regression — no announcement
Sub-agents using Opus is the biggest waste. Sub-agent work is search and exploration; Haiku is sufficient. Sub overhead with 76% Opus is 80x higher than with 92% Haiku
High S/M ratio isn't inherently bad. What model the sub uses matters more than how many times it runs. Many cheap sub calls beat one expensive Opus sub call
The most efficient combination ($167/M main output) and the least efficient ($1,925/M) differ by 11.5x — same user, same project, same type of work

What model you select in Claude Code doesn't matter — what matters is what the server actually gives you. Scanning your own JSONL is the only reliable method.

References

JetBrains Air: An Agentic IDE That Runs Multiple AI Agents in Parallel

Recca Tsai — Wed, 06 May 2026 03:23:13 +0000

Originally published at recca0120.github.io

You have a bug to fix, tests to write, and a module to refactor. The old way is to do them one at a time, or juggle multiple terminals yourself. JetBrains Air lets you delegate each task to a different AI agent and run them all simultaneously without interference.

Air isn't an AI panel bolted onto an existing IDE. It's a new development environment designed from the ground up for delegating tasks to agents.

Supported Agents

Air ships with four agents:

Agent	Provider	Strength
Claude Agent	Anthropic	Long-context understanding, code generation
OpenAI Codex	OpenAI	Code-specialized model
Gemini CLI	Google	Configurable thinking mode for reasoning depth
Junie	JetBrains	JetBrains' own agent, included with AI subscription

Once a task starts, you can switch models but not providers. A Claude task can move from Sonnet to Opus, but can't switch to Codex mid-task.

Air also supports the Agent Client Protocol (ACP), so more third-party agents can plug in over time.

Three Execution Environments

Each task runs in one of three environments:

Local Workspace

Runs directly in your working directory. Fastest startup, zero configuration, but agent changes hit your files directly.

Best for: quick iterations where isolation isn't needed.

Git Worktree

Creates a separate working copy of the same repo, isolating changes to another branch.

Configuration lives in .air/worktree.json:

{
  "environment": [
    {"type": "env", "key": "NODE_ENV", "value": "test"},
    {"type": "envFile", "path": "~/.env.test"}
  ],
  "setup": {
    "macos": ["npm install"]
  }
}

Best for: running multiple tasks that touch the same files without conflicts.

Docker

Runs inside a container with complete isolation. Requires Docker Desktop.

Configuration lives in .air/docker.json:

{
  "image": "ubuntu:24.04",
  "environment": [
    {"type": "env", "key": "TEST", "value": "12345"}
  ],
  "setup": ["apt-get update", "apt-get install -y jq"],
  "user": {"user": "node", "group": "node"}
}

Custom images must include Git and /bin/sh.

Best for: full isolation where nothing should touch the host system.

Comparison

	Local	Worktree	Docker
Startup speed	Fastest	Fast	Slower
Isolation level	None	File/branch	Complete
Configuration needed	No	Optional	Optional
Depends on host env	Yes	Yes	No

Task Lifecycle

A task moves through these stages:

Define: press ⌘+\ to create a new task, describe what you want in chat
Add context: attach files, folders, Git commits, symbols, images, MCP servers
Choose permission mode:
- Ask Permission: prompts before every file edit or command
- Auto-Edit: automatically accepts file changes
- Plan: analyzes without making changes
- Full Access: no prompts at all
Execute: the agent starts working
Input required: the agent pauses and notifies you when it needs a decision
Done: review changes, commit, push

Use ⌘+1 to see all task states and switch between them.

Parallel Multitasking

This is Air's core value proposition. You can run multiple tasks simultaneously:

One agent fixing a bug
Another writing tests
A third refactoring a module

Each task runs independently with its own agent session. When a task needs your attention, you get a notification. Handle it and switch back.

If multiple tasks touch the same files, use Git Worktree or Docker to prevent conflicts.

MCP Server Integration

Air supports Model Context Protocol for extending agent capabilities:

Open settings with ⌘+,
Navigate to AI | MCP Servers
Click Add Global MCP Server
Paste the server configuration in JSON format

This lets agents access databases, call APIs, or interact with external systems.

Web Preview

For web projects, Air includes a built-in preview. The dev server launches automatically and shows a preview window. You can switch between preview and source modes, with responsive sizing built in.

Pricing and Availability

Platform: macOS only for now; Windows and Linux planned for 2026
Cost: Air itself is free to download
AI billing:
- JetBrains AI Pro subscribers get all agents included
- You can bring your own API keys (Anthropic, OpenAI, Google)
- If both are configured, your own keys are used first

Real-World Experience (From a JetBrains Developer)

Valerii Tepliakov, a developer on the Air team, shared his adoption journey:

Started skeptical — cared too much about code quality to trust AI output
Began with small, reversible tasks alongside IntelliJ IDEA
Gradually expanded scope, delegating multiple concurrent tasks
Eventually Air became his primary tool; IntelliJ reserved for debugging only

His key insight: agents struggle with architecture and design decisions. Make the foundational choices yourself, then delegate implementation details.

How Air Differs from Claude Code / Cursor

Air occupies a different niche:

Claude Code: single agent in a terminal, one task at a time
Cursor: AI embedded in an IDE, augmenting your editing experience
Air: multi-agent task manager where you're the manager and agents are workers

Air doesn't replace your IDE. It handles agent workflow; you keep using IntelliJ / VS Code for fine-grained editing and debugging.

When Air Makes Sense

Your project is large enough to have multiple independent tasks running in parallel
You'd rather review code than write it yourself
You want to compare agents (e.g., run the same task with Claude and Codex to see which produces better results)
Your team already decomposes work into well-defined tasks

If your workflow is "focus on one thing at a time, think while you code," Air's multi-task architecture may not be what you need.

References

Find Dead Code with Knip: The Blind Spots ESLint and depcheck Miss

Recca Tsai — Sat, 02 May 2026 08:38:34 +0000

Originally published at recca0120.github.io

The project is two years old. There are 80 entries in package.json and you can't say with confidence which ones are still being used. A utils.ts file hasn't been touched in three months — you're not sure if anyone imports it. shared/helpers.ts exports a dozen functions, some of which were replaced by newer approaches months ago, but nobody deleted the old ones.

Dead code doesn't accumulate overnight. A little left over from each refactor, one forgotten dependency from each package swap. Over time the project gets heavier, but no tool ever tells you exactly what's dead.

Knip is built for exactly this. One command finds everything you thought was being used but isn't.

The Blind Spots in ESLint and depcheck

Most teams reach for ESLint and depcheck to handle this kind of thing. Both have clear limits.

ESLint only sees a single file. It can tell you "there's a const x in this function that's never used," but if an entire utils.ts is never imported anywhere, ESLint won't notice. Its view stops at the file boundary — it can't trace cross-file reference chains.

depcheck only looks at package.json. It scans for require and import statements to see which packages are actually referenced, then flags anything that's installed but unused. But it doesn't understand TypeScript export/import semantics, and it has no concept of which files are never referenced at all.

Together they still leave a gap: cross-file export usage — nothing tracks it.

Knip works differently. Starting from configured entry points, it builds a complete module graph tracing every import, export, and dependency. Anything not connected to the graph is dead code.

What It Finds

A single knip run finds:

Unused npm dependencies: installed but never imported anywhere
Unlisted dependencies: imported in code but missing from package.json (implicit reliance on a transitive dep)
Unused exports: exported but never imported anywhere
Unused files: entire files never referenced from anywhere
Unresolved imports: imports pointing to paths or modules that don't exist

These problems used to require several tools stitched together, with gaps remaining. Knip covers all of them.

Vercel used Knip to delete nearly 300,000 lines of code. That's their own number, not marketing copy.

Installation and Usage

No installation required — just run it:

npx knip

Or add it to devDependencies:

npm install -D knip
npx knip

Knip ships with around 150 plugins covering Vite, Vitest, Next.js, Astro, ESLint, GitHub Actions, and more. In most projects, zero configuration is needed — it auto-detects the tooling in use.

Reading the Output

After a run, the output looks something like this:

Unused files (2)
src/legacy/old-helper.ts
src/utils/deprecated.ts

Unused dependencies (3)
lodash
moment
@types/node  (devDependencies)

Unused exports (5)
src/shared/helpers.ts: formatDate, parseQuery
src/utils/string.ts: capitalize, truncate, slugify

Each category is clear: which files are entirely unreferenced, which packages can be removed, which exports have no importers.

The first run usually surfaces a lot. Don't try to fix everything at once. Start with unused dependencies (easy wins), then work through unused exports, and finally tackle entire unused files.

Configuration

If you need to customize, add a knip.json at the project root (or a knip key in package.json):

{
  "entry": ["src/index.ts", "src/pages/**/*.tsx"],
  "project": ["src/**/*.{ts,tsx}"],
  "ignore": ["src/legacy/**", "**/*.stories.ts"],
  "ignoreDependencies": ["some-cli-tool"]
}

entry: where to start tracing references from
project: which files belong to this project
ignore: paths to skip
ignoreDependencies: dependencies to keep even if they appear unused (e.g. CLI tools)

Auto-Fix

Some issues can be fixed automatically with --fix:

npx knip --fix

Currently --fix handles removing unused entries from package.json and some unused exports. Not everything is auto-fixable, but it saves a lot of manual work on the dependency side.

VSCode Extension and MCP

Knip has a VSCode extension that shows unused exports directly in the editor — no need to run the CLI to find out.

There's also @knip/mcp, which lets AI assistants call Knip when analyzing a project, helping them understand which code is actually in use.

Dead Code Is Technical Debt

Removing unused code isn't just about making the project smaller. Every unused export is a cognitive burden — a new developer doesn't know if that function matters and has to spend time tracing it. Every unused dependency is a potential security risk and an update to manage.

Knip turns "find dead code" from a manual chore into an automated step. Run it once to clean house, then wire it into CI so dead code can't quietly accumulate again.

References

vitest-fail-on-console: Stop Ignoring console.error in Your Tests

Recca Tsai — Sat, 02 May 2026 00:51:28 +0000

Originally published at recca0120.github.io

All tests pass, but the terminal is full of red console.error output. This is common and easy to ignore — the tests passed, after all. But those errors don't appear out of nowhere. Something went wrong; nobody just noticed.

vitest-fail-on-console does one thing: if console.error or console.warn appears during a test, that test fails. It forces you to acknowledge these messages instead of letting them drown in noise.

Why console.error in Tests Is a Code Smell

Vitest doesn't care about console output by default. You can console.error all day and tests still pass.

The problem is that console.error usually means something. It might be:

A React prop type warning
An async error that was caught but not properly handled
A third-party package telling you you're using it wrong
An error handler in your own code getting triggered

When these appear in tests, the test is running in a slightly broken state — it just didn't throw. Over time the test output becomes pure noise. Nobody reads it anymore, and the real signals get buried.

vitest-fail-on-console flips this: make console output a test failure, so you're forced to address it.

Installation

npm install -D vitest-fail-on-console

Setup

Import and call it in your setup file:

// tests/setup.ts
import failOnConsole from 'vitest-fail-on-console'

failOnConsole()

Then wire up the setup file in vitest.config.ts:

import { defineConfig } from 'vitest/config'

export default defineConfig({
  test: {
    setupFiles: ['tests/setup.ts'],
  },
})

That's it. Any test that triggers console.error or console.warn will now fail.

Options

failOnConsole() accepts an options object to control which console methods trigger failures:

failOnConsole({
  shouldFailOnError: true,   // default true
  shouldFailOnWarn: true,    // default true
  shouldFailOnLog: false,    // default false
  shouldFailOnInfo: false,   // default false
  shouldFailOnDebug: false,  // default false
  shouldFailOnAssert: false, // default false
})

error and warn are usually enough. Whether to include log / info / debug depends on your project's conventions.

allowMessage

Allow specific messages through without failing — useful for known third-party issues you can't fix right now:

failOnConsole({
  allowMessage: (message) => {
    return /ResizeObserver loop limit exceeded/.test(message)
  },
})

silenceMessage

Like allowMessage, but also suppresses the console output entirely:

failOnConsole({
  silenceMessage: (message) => {
    return /Not implemented: navigation/.test(message)
  },
})

skipTest

Skip specific test files or test names entirely:

failOnConsole({
  skipTest: ({ testPath, testName }) => {
    return testPath.includes('/legacy/')
  },
})

afterEachDelay

Sometimes async operations call console methods after a test ends. This option adds a delay before checking:

failOnConsole({
  afterEachDelay: 100, // wait 100ms, default is 0
})

Handling Expected console.error Calls

After installing vitest-fail-on-console, if a test is specifically verifying that console.error gets called, letting it fire naturally will cause the test to fail.

The correct approach is to mock it with vi.spyOn:

it('logs an error when request fails', () => {
  // mock it so the message doesn't actually reach the console
  vi.spyOn(console, 'error').mockImplementation(() => {})

  triggerSomethingThatLogsError()

  // assert it was called with the expected message
  expect(console.error).toHaveBeenCalledWith('Request failed')
})

This does two things: the test explicitly declares "I know an error will be logged here," and it asserts the exact message. Much stricter than silently letting console.error through.

Pair It with a Clean Test Environment

vitest-fail-on-console handles the console output side. If your tests also have I/O boundaries to replace — filesystem, file watchers — you can pair it with memfs using the same philosophy: every aspect of the test environment should be under your control.

See }}">Testing a Filesystem Service with memfs + FakeWatchService for that approach.

References

Testing a Filesystem Service with memfs + FakeWatchService: No Disk Required

Recca Tsai — Tue, 28 Apr 2026 05:58:11 +0000

Originally published at recca0120.github.io

How do you test a Node.js service that operates on the filesystem? The obvious approach is to create real files under /tmp, run the tests, then clean up. But that comes with problems: slow I/O, inconsistent cross-platform behavior, CI permission differences, and file watcher events whose timing you can't control.

This post dissects a real FileService test suite to show how memfs and a hand-written FakeWatchService solve all of these.

What the Service Looks Like

FileService implements a IFileService interface. It handles directory browsing, file listing (with fuzzy search), file reading/writing, and CRUD operations. It takes two optional external dependencies via constructor injection:

export class FileService implements IFileService {
  constructor(
    private readonly roots: readonly string[],
    private readonly watch?: WatchService,
    private readonly fsImpl?: typeof import('node:fs'),
  ) {}
}

roots: allowed root directories
watch: optional WatchService for cache invalidation on file changes
fsImpl: optional fs module implementation, passed to glob

All three are injected through the constructor — real implementations in production, fakes in tests. This is the foundation of the entire testing strategy.

Replacing the Real Filesystem with memfs

The first step is swapping out node:fs and node:fs/promises entirely:

import { vol, fs as memfs } from 'memfs';
import { FileService } from './file-service';

vi.mock('node:fs', async () => (await import('memfs')).fs);
vi.mock('node:fs/promises', async () => (await import('memfs')).fs.promises);

memfs is a fully in-memory fs implementation. Its API is identical to Node.js's native fs, but everything happens in memory — no disk touched.

The dynamic import inside vi.mock factory functions is a Vitest requirement, but actual usage of vol and memfs is through top-level imports.

Before each test, vol.fromJSON() declaratively creates the needed file structure:

const ROOT = '/test-root';
let service: FileService;

beforeEach(() => {
  vol.fromJSON({
    [join(ROOT, 'alpha/.keep')]: '',
    [join(ROOT, 'beta/nested/.keep')]: '',
    [join(ROOT, '.hidden/.keep')]: '',
    [join(ROOT, 'node_modules/.keep')]: '',
    [join(ROOT, '.git/.keep')]: '',
    [join(ROOT, 'src/index.ts')]: 'export {}',
    [join(ROOT, 'src/utils.ts')]: 'export const x = 1',
    [join(ROOT, 'package.json')]: '{}',
  });
  service = new FileService([ROOT], undefined, memfs);
});

afterEach(() => vol.reset());

Benefits:

Speed: in-memory operations, no disk I/O
Isolation: vol.reset() gives you a fresh filesystem — zero cross-test interference
Readability: the JSON shows the entire file structure at a glance
Cross-platform: no worrying about Windows path separators or /tmp permissions

Replacing chokidar with FakeWatchService

FileService has an internal caching layer: the first listFiles() call runs glob to scan the directory tree, caching the result. Subsequent calls return the cache until a WatchService event invalidates it.

In production, chokidar watches for file changes, but chokidar events are asynchronous and non-deterministic. There's no way to precisely control "trigger an event now" in a test.

The solution is a Fake:

export class FakeWatchService implements WatchService {
  private subs = new Map<string, Set<WatchCallback>>();

  subscribe(cwd: string, cb: WatchCallback): Unsubscribe {
    let set = this.subs.get(cwd);
    if (!set) {
      set = new Set();
      this.subs.set(cwd, set);
    }
    set.add(cb);
    let active = true;
    return () => {
      if (!active) return;
      active = false;
      const s = this.subs.get(cwd);
      if (!s) return;
      s.delete(cb);
      if (s.size === 0) this.subs.delete(cwd);
    };
  }

  simulate(cwd: string, event: WatchEvent): void {
    const set = this.subs.get(cwd);
    if (!set) return;
    for (const cb of set) {
      try { cb(event); } catch (err) {
        console.error('[FakeWatchService] subscriber threw:', err);
      }
    }
  }
}

This isn't a mock — it's a Fake with real behavior. It genuinely manages subscribers, genuinely executes unsubscriptions, and genuinely dispatches events to all callbacks. The only difference is the event source: instead of OS-level inotify/FSEvents, events come from simulate() calls in test code.

For more on the difference between Fakes and Mocks, see }}">DI + Fake + in-memory: Writing Maintainable Frontend Tests.

Three Cache Invalidation Scenarios

With FakeWatchService, cache behavior becomes precisely verifiable.

Scenario 1: No event → cache hit

it('second call without watcher event reuses cached file list', async () => {
  const watch = new FakeWatchService();
  const cached = new FileService([ROOT], watch, memfs);
  const a = await cached.listFiles(ROOT, '');
  vol.writeFileSync(join(ROOT, 'after-cache.ts'), '');
  const b = await cached.listFiles(ROOT, '');
  expect(b.some((f) => f.name === 'after-cache.ts')).toBe(false);
  expect(b.length).toBe(a.length);
});

A file was added via vol.writeFileSync before the second call, but no watch event was fired, so the cache stays valid. The new file doesn't appear.

Scenario 2: Event fired → cache invalidated

it('watcher event invalidates cache so next call rebuilds', async () => {
  const watch = new FakeWatchService();
  const cached = new FileService([ROOT], watch, memfs);
  await cached.listFiles(ROOT, '');
  vol.writeFileSync(join(ROOT, 'after-invalidate.ts'), '');
  watch.simulate(ROOT, { type: 'add', path: 'after-invalidate.ts' });
  const b = await cached.listFiles(ROOT, '');
  expect(b.some((f) => f.name === 'after-invalidate.ts')).toBe(true);
});

After watch.simulate(), the cache is cleared and the next listFiles() rescans, picking up the new file.

Scenario 3: Concurrent first calls subscribe only once

it('concurrent first calls do not subscribe duplicate watchers', async () => {
  const watch = new FakeWatchService();
  let subscribeCount = 0;
  const realSubscribe = watch.subscribe.bind(watch);
  watch.subscribe = (cwd, cb) => {
    subscribeCount++;
    return realSubscribe(cwd, cb);
  };
  const cached = new FileService([ROOT], watch, memfs);
  await Promise.all([cached.listFiles(ROOT, ''), cached.listFiles(ROOT, '')]);
  expect(subscribeCount).toBe(1);
});

Two concurrent listFiles() calls should only subscribe once. This verifies the inflight promise deduplication mechanism works correctly.

All three scenarios would be nearly impossible to write reliably with real chokidar — event timing and frequency aren't controllable.

Security Tests Are First-Class Citizens

Security assertions are distributed across every describe block:

// browseDirectories filters hidden directories
it('filters hidden directories', async () => {
  const names = (await service.browseDirectories(ROOT))
    .map((d: DirEntry) => d.name);
  expect(names).not.toContain('.hidden');
  expect(names).not.toContain('.git');
});

// readFile blocks path traversal
it('rejects path traversal', async () => {
  expect(await service.readFile(ROOT, '../../etc/passwd')).toEqual({
    error: 'Path traversal not allowed',
  });
});

// All mutations reject paths outside allowed roots
it('all mutations reject paths outside allowed roots', async () => {
  expect(await svc.create('/etc/passwd-clone', 'file'))
    .toMatchObject({ error: expect.any(String) });
  expect(await svc.delete('/etc/passwd'))
    .toMatchObject({ error: expect.any(String) });
  // rename, copy, move likewise...
});

These aren't in a separate "security test suite" — they live alongside feature tests. Every entry point has its own security verification.

The isInsideRoot boundary tests are worth noting:

it('returns false for prefix-similar but not actually inside', () => {
  const sibling = `${ROOT}-sibling`;  // /test-root-sibling
  expect(service.isInsideRoot(sibling)).toBe(false);
});

/test-root-sibling has /test-root as a string prefix, but it's not inside the root. The implementation uses path.relative() to handle this correctly, and the test ensures that behavior.

Tests Are Grouped by Behavior

The test file isn't organized as "one describe per method." It's grouped by behavior:

browseDirectories: full browsing behavior including filtering, sorting, security checks
listFiles: three pattern modes (empty, trailing slash, fuzzy) plus a dedicated describe for cache invalidation
readFile: normal reads + path traversal
mutations: isolated MROOT environment, full CRUD + out-of-bounds rejection
isInsideRoot: pure logic boundary testing

Cache invalidation gets its own describe('cache invalidation via WatchService') because it's an independent behavioral concern with its own setup (requires FakeWatchService injection).

How to Reuse This Pattern

The core of this strategy is three things:

memfs replaces fs: applicable to any service using node:fs. Two lines of vi.mock, declarative setup with vol.fromJSON()
Hand-written Fakes replace non-deterministic dependencies: file watchers, WebSockets, event emitters — anything async and event-driven benefits from the Fake + simulate() pattern
Constructor injection makes replacement possible: instead of new Chokidar() inside the service, inject a WatchService interface. Tests are just a beneficiary of this design

If your project has similar I/O boundaries — filesystem, database, external APIs, message queues — the same approach applies: define an interface, inject the dependency, swap in an in-memory Fake for tests.

This approach is even more effective in a monorepo. When Fakes are extracted into shared packages, both frontend and backend can use the same test doubles with guaranteed behavioral consistency. See }}">Monorepo Shared Fakes: One Test Double from Frontend to Backend.

References

How to Add Old Models to Claude Code /model Picker: 3 Methods Tested

Recca Tsai — Sun, 26 Apr 2026 18:17:29 +0000

Originally published at recca0120.github.io

The day Opus 4.7 launched, Claude Code's opus alias silently pointed to the new version. No notification, no changelog reminder. Open the /model picker — Opus 4.6 was gone.

I'd previously written a }}">3-month billing analysis showing 4.7's quota burn is 2.4× that of 4.6. Switching back seemed obvious, but the picker had no option for it. I spent an afternoon testing every configuration method and combing through GitHub issues.

Three Configuration Mechanisms

Claude Code currently offers three ways to modify the /model picker.

1. `availableModels`: Replaces, Doesn't Extend

Add this to ~/.claude/settings.json:

{
  "availableModels": [
    "claude-opus-4-6",
    "claude-sonnet-4-6",
    "claude-haiku-4-5"
  ]
}

The /model picker now shows only these three. The default opus / sonnet / haiku aliases all disappear, replaced entirely by your list.

This is the biggest gotcha: many people assume availableModels means "add these on top of the defaults." It doesn't — it's a complete replacement.

2. `modelOverrides`: For Bedrock / Vertex

{
  "modelOverrides": {
    "claude-opus-4-7": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-opus-4-7-v1:0"
  }
}

This maps model IDs to provider-specific endpoints. If you're using the Anthropic API directly, this setting does nothing for you.

3. `ANTHROPIC_CUSTOM_MODEL_OPTION`: One Extra, That's It

Supported since v2.1.78, this environment variable adds a single custom entry to the bottom of the /model picker:

export ANTHROPIC_CUSTOM_MODEL_OPTION="claude-opus-4-6[1m]"
export ANTHROPIC_CUSTOM_MODEL_OPTION_NAME="Opus 4.6 (1M)"
export ANTHROPIC_CUSTOM_MODEL_OPTION_DESCRIPTION="Opus 4.6 with 1M context window"

It doesn't touch the default picker, but you can only add one. Want both Opus 4.6 and Sonnet 4.6 1M? No luck — there's no ANTHROPIC_CUSTOM_MODEL_OPTION_2.

Pitfalls

Aliases Don't Work in availableModels

You might try:

{
  "availableModels": ["opus", "sonnet", "haiku", "claude-opus-4-6[1m]"]
}

The picker shows 4 items, but opus, sonnet, haiku behave inconsistently when mixed with full model IDs. Aliases aren't valid model IDs in this context — stick to full IDs.

Same-Family Deduplication

If you list both claude-opus-4-6 and claude-opus-4-7 in availableModels, same-family deduplication may collapse them into one entry. This behavior is undocumented.

Lock Version with `model`, Not `availableModels`

availableModels controls what's in the picker. The model field controls what's actually used at startup:

{
  "model": "claude-opus-4-6[1m]",
  "availableModels": ["claude-opus-4-6[1m]", "claude-sonnet-4-6", "claude-haiku-4-5"]
}

Set both. If you only set availableModels without model, startup still uses whatever the default alias points to.

What the GitHub Community Says

This isn't a niche issue. There are plenty of related issues on GitHub:

#14443 — Request for Configurable Model Picker

User joerivwijn asked for the /model picker to be configurable via settings.json. Especially relevant for Bedrock users whose model IDs need us. prefixes and :0 suffixes that don't match the default picker.

Result: Marked as duplicate of #12969 by bot and auto-closed.

#12738 — Opus 4.5 Disappeared from Picker

User grigb reported Opus 4.5 missing from the CLI picker on the Max plan, despite being available in the web app. Multiple users confirmed:

cleanspin found /model opus pointed to Opus 4.1 instead of 4.5 — the alias mapping was stale. Workaround: use the full ID /model claude-opus-4-5-20251101
todddrinkwater reported the VS Code extension also affected
zerzerzerz and PavelProdan posted screenshots confirming "it was there yesterday, gone today"

This pattern repeats with every new model release: the alias points to the new version, the old version vanishes from the picker, no notice given.

Result: Auto-closed by stale bot.

#35630 — ANTHROPIC_CUSTOM_MODEL_OPTION Undocumented

User coygeek noticed v2.1.78's changelog mentioned this env var, but official docs had zero coverage.

Result: Fixed — both the env-vars and model-config doc pages now include full documentation.

Other Related Open Issues

Issue	Problem
#52310	Bedrock ignores `availableModels`, shows only one model per family
#47164	Enterprise custom model IDs can't appear in interactive picker
#40501	Duplicate entries when settings.json model matches a built-in option
#49566	`ANTHROPIC_DEFAULT_*_MODEL` creates duplicate "Custom" entry on Bedrock
#53006	VS Code extension missing Sonnet 4.6
#38238	1M context model not available in picker on WSL2

The issues cluster around two themes: stale alias mappings and no mechanism to extend the default list.

Recommended Configuration

After testing everything, availableModels is unreliable due to same-family deduplication — list 4 models and you might only see 3. The most practical approach: don't touch availableModels, keep the default picker, lock your version with model, and add one extra option with ANTHROPIC_CUSTOM_MODEL_OPTION.

~/.claude/settings.json (global):

{
  "model": "claude-opus-4-6[1m]"
}

One line. The default picker stays intact (opus / sonnet / haiku all present), but every session starts on Opus 4.6 1M. The opus alias in the picker points to the latest version (currently 4.7), so you can switch to it anytime.

~/.zshrc (add a 4th option via ANTHROPIC_CUSTOM_MODEL_OPTION):

export ANTHROPIC_CUSTOM_MODEL_OPTION="claude-sonnet-4-6[1m]"
export ANTHROPIC_CUSTOM_MODEL_OPTION_NAME="Sonnet 4.6 (1M)"
export ANTHROPIC_CUSTOM_MODEL_OPTION_DESCRIPTION="Sonnet 4.6 with 1M context window"

This gives 4 options in /model: the default opus / sonnet / haiku, plus Sonnet 4.6 1M. Covers all daily scenarios:

Opus 4.6 1M: locked via model field, used on startup
Opus 4.7: the opus alias in the picker, switch when needed
Sonnet 4.6: the sonnet alias, for review / fix / test tasks
Sonnet 4.6 1M: the 4th option via env var, for large context scenarios

Why not use availableModels? It's a full replacement, and same-family dedup silently eats entries. Leaving it unset gives you the most stable default picker.

Conclusion

Claude Code's model picker assumes everyone wants the latest version. availableModels is a full replacement with dedup issues; ANTHROPIC_CUSTOM_MODEL_OPTION only supports one entry.

In practice, model + ANTHROPIC_CUSTOM_MODEL_OPTION covers most needs: lock your version with model, add one extra option via env var, and leave the default picker alone.

Plenty of GitHub issues have been filed, but most get marked duplicate or stale-closed. If you need more than one custom model in the picker, there's no official solution yet.

References

Claude Code Model Configuration — Official model settings docs
Claude Code Environment Variables — Environment variable reference
#14443 — Configure custom models in /model picker
#12738 — Opus 4.5 missing from model picker
#35630 — ANTHROPIC_CUSTOM_MODEL_OPTION env var missing from docs
#52310 — Bedrock availableModels ignored

I Audited 3 Months of Claude Code Billing — Most Community Cost-Saving Tips Don''t Work

Recca Tsai — Sun, 26 Apr 2026 00:26:04 +0000

Originally published at recca0120.github.io

This past week, chasing a vague "quota burns faster lately" feeling, I scanned three months of my own Claude Code logs. ~\$127K equivalent cost, 127K turns, four models, hundreds of sessions.

The uncomfortable finding: the cost-saving tips floating around on Reddit / HN / Twitter mostly don't survive real data. "Sessions are too long, run /clear," "too many skills, prune them," "MCP servers should be lean" — these all sound right. But against three months of actual data, almost none holds up. Only two things actually shrink the bill, and neither is about "optimizing your habits."

Earlier I wrote }}">Scanned 95 days of Claude Code logs, found a second cache TTL silent regression and }}">17-day follow-up covering server-side behavior. This post is the extension: with server behavior confirmed unfixable, what's left for the user side.

Three Months of Bills

A single primary development project (one codebase, solo dev), monthly Claude Code totals:

Month	Equiv \$	Dominant Model	Key Event
2026-02	\$1,015	Five models mixed	Trial period, low volume
2026-03	\$48,623	99.6% Opus 4.6	Heavy usage starts; per-call prefix jumped 58K → 417K in one step
2026-04	\$77,754	Opus 4.6 \$51K + Opus 4.7 \$25K	Opus 4.7 release on 4/16, alias auto-upgraded

Two key observations:

From March to April, Opus 4.6 cost barely changed (\$48K → \$51K, +7%). \$/turn went from \$0.692 → \$0.713, a 3% delta. Usage habits stayed flat.
The extra \$25K in April is almost entirely the Opus 4.7 layer.

So the "lately it got expensive" feeling isn't because I changed anything — it's because Opus 4.7 shipped on 4/16 and the opus alias automatically pointed to the new version. With no version pinned in settings, the next session jumped to it.

This is normal alias behavior, not something hidden. But for subscription users, the quota impact is real — as we'll see, the new version's adaptive thinking burns quota at 2.4× the old.

Multi-Dimensional Breakdown for April

Here's the full cut by model for one month:

Dimension	Opus 4.6	Opus 4.7	Sonnet 4.6	Haiku
Volume
Sessions (main/sub)	24/138	18/84	5/46	1/376
Total turns	72,431	31,621	15,182	16,138
% of total turns	47.4%	20.7%	9.9%	10.6%
Wall-clock hours	635	270	72	14
Active hours (no idle)	237.9	106.5	40.2	6.2
Output Profile
Turns/active hour	305	297	378	2,614
Tools/turn	0.62	0.63	0.64	0.68
Output tokens/turn	227	667	456	101
Sub:Main turn ratio	1:1.32	1:15.56	1:14.95	n/a
Cost
Equivalent \$	\$51,700	\$24,595	\$773	\$114
Cost share	67.0%	31.9%	1.0%	0.1%
Quota burn rate	1.0×	2.4×	0.2×	0.05×
\$/turn	\$0.714	\$0.778	\$0.051	\$0.007

Cross-column observations:

Opus 4.7 emits 2.9× more output tokens per turn (667 vs 227). It's not verbose — adaptive thinking's reasoning chain counts as output. To complete the same task, 4.7 burns roughly 3× the output of 4.6.

Opus 4.7 doesn't delegate. Sub:Main turn ratio jumped from 4.6's 1:1.32 to 1:15.56 — 4.6 is a "give half to sub-agents" collaborator, 4.7 is a "think it through alone" lone wolf. This explains the 3× output per turn: thinking is all done in-house.

Sonnet 4.6 \$/turn is 1/16 of Opus. But it only made up 9.9% of turns — clearly underused.

Haiku is the invisible workhorse. Zero main sessions, 376 sub-sessions, 16K turns for \$114 — all triggered automatically by Claude Code's built-in Explore / Plan agents. Untouched, still doing 10% of total turns.

Five Common "Cost-Saving Tips" Debunked

Community lore (Reddit / HN / Discord) graded against real data.

❌ "Long sessions are the culprit"

The intuition: longer sessions mean longer conversation history, more cache prefix re-read per turn, more cost as the session drags on.

The data: March vs April Opus 4.6 usage is nearly identical (69,980 vs 72,510 turns), but \$/turn moved from \$0.692 → \$0.713, a 3% bump. If long sessions were the driver, the per-turn cost should creep up month over month. It doesn't.

More precisely: cache_read accounts for 77–88% of cost on both Opus versions. The number is huge, but the ratio has been that way since heavy Claude Code usage started — it's the inherent cost of "talking to an LLM," not the price of "not splitting sessions." /clear doesn't recover much.

❌ "Run `/clear` after 5+ min idle"

The intuition: 5-minute cache TTL means a brief idle expires the cache, so the next turn pays for a rewrite.

The data: my }}">second audit shows the main agent has been writing 100% to 1h TTL for 17 straight days since 4/9, with zero 5m writes. Idle a while and come back, cache is still there. No extra write cost.

The forced 5m downgrade only hits sub-agents (same post). But sub-agents only contributed a small slice of April's cost (~\$1,500 estimated), two orders of magnitude less than the \$25K from 4.7.

❌ "Too many skills"

The intuition: loaded skills inject metadata into the system prompt every turn.

The data: I actually measured. 40 skill descriptions add up to ~5–10K tokens. In a 425K per-call prefix, that's 1–2%. Deleting all of them saves <\$1K/month — not worth the effort.

❌ "Too many MCP servers"

The intuition: MCP tool definitions land in the prefix every turn.

The data: setup is 3–4 MCPs (pixel-mcp, the Google Workspace trio), several of which fail to connect and don't load. Already lean, nothing to trim.

❌ "CLAUDE.md is too long"

The intuition: CLAUDE.md gets re-read every turn.

The data: the project root CLAUDE.md is 1 byte (essentially empty), the global one is 0 bytes. Zero impact.

These five aren't wrong in every scenario. For someone with a 50K-token CLAUDE.md or 20 loaded MCP servers, they apply. But as generic advice spread to everyone, data shows they barely help a heavy single-project workflow.

✅ The Two Things That Actually Work

After the intuition reckoning, only two things hold up against the data:

1. Pin Specific Model Versions in settings.json

Don't use opus / sonnet aliases. When Anthropic ships a new version, the alias auto-points to it — invisible to the user but quota behavior shifts dramatically.

{
  "model": "claude-opus-4-6",
  "permissions": { "...your existing..." }
}

This way when opus-4.8 / 4.9 ships, you don't auto-follow. New versions aren't always more economical — for 4.6 vs 4.7:

\$/turn +9%
Output/turn +190%
Quota burn +140%
Turns to complete same work only −12%

Net CP value is 1.9× higher on 4.6. Every model release, check cnighswonger's advisory and run your own data for a while before deciding to upgrade.

On adaptive thinking: 4.7 burns hard mainly because adaptive thinking counts the reasoning chain as output tokens. Opus 4.6 / Sonnet 4.6 let you disable it via CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1, but Opus 4.7 forces it on with no toggle — that's why "lock 4.6" is the practical fix instead of trying to mitigate 4.7. Auto mode and adaptive thinking are independent features; pinning 4.6 doesn't affect auto mode.

2. Route Review / Fix / Test to Sonnet

The \$/turn gap is real (Opus \$0.71 vs Sonnet \$0.045 — 16×). My April: 14K turns on Sonnet for \$643, same turns on Opus 4.6 would have been \$10K.

Switch to Sonnet for:

Code review, reading PR diffs
Small bug fixes, type annotations, null checks
Writing tests, adding test cases
Docs, commit messages, changelogs
Renames, simple refactors

Stay on Opus for:

Cross-file architectural rewrites
Design decisions needing long reasoning chains
Complex debugging (race conditions, memory leaks)
Exploring an unfamiliar codebase the first time

How: inside a session, /model claude-sonnet-4-6 to switch over for a few rounds, then /model claude-opus-4-6 to switch back. Don't lock Sonnet in settings — you'll forget to switch when you need Opus.

Real Magnitudes

If both levers are in place, expected April-baseline change (against \$77K):

Action	Expected Savings	% of Monthly Bill
Pin 4.6 (cancel 4.7 auto-follow)	\$25K/mo	32%
Route review/fix/test to Sonnet (expand to 30% of turns)	\$10–15K/mo	13–20%
Total	\$35–40K/mo	45–52%

The remaining 50% is the inherent cost of "heavy Opus 4.6 usage on a primary project" — not optimizable, and shouldn't be. That's the work itself.

Lessons

The biggest takeaway from turning myself into a dataset isn't the savings — it's seeing how unreliable community intuition is.

"Shorter session = cheaper," "fewer skills = cleaner" might hold in some scenarios, but for single-project heavy-use workflows they're flat wrong. Without breaking cost down to model × session × turn, I'd never have spotted that "the Opus 4.7 alias upgrade" is the single biggest reason April got expensive.

Broader lessons:

Floating optimization tips are noise — without data, "cost-saving advice" often optimizes the wrong thing
Aliases hand cost control to the vendor — the mechanism isn't bad, but it's a real risk for subscription users with quota planning
Multi-model strategy beats single-model tuning — same dollar, Sonnet does 16× the turn volume

If you want to scan your own, the 60-line Python from }}">the first post is reusable — adjust the cost calc and you'll get this analysis for your data. Make yourself a dataset and re-check what the community thinks it knows.

}}">Background: Claude Code session cost & cache misconception

References

Cache TTL silently regressed — GitHub Issue #46829
Subagent trailing block missing cache_control — Issue #50213
Widespread quota drain since 2026-03-23 — Issue #41930
cnighswonger/claude-code-cache-fix — Opus 4.7 quota burn advisory + cache fix proxy
Claude API Pricing
Anthropic: Claude quota drain not caused by cache tweaks — The Register

12 More Days Scanned: Claude Code Sub-Agent Cache TTL Has Been 100% 5m for 17 Straight Days — This Isn''t a Regression, It''s the New Default

Recca Tsai — Sat, 25 Apr 2026 23:16:26 +0000

Originally published at recca0120.github.io

}}">Two weeks ago I scanned 95 days of Claude Code logs and found that since 4/9, sub-agents had been 100% downgraded to 5m TTL — 5 consecutive days, 4,840 API calls, with the main agent completely untouched. I left the conclusion at "monitoring," since 5 days could still be rollout flapping.

Today 4/26, I re-ran the same Python. The streak is now 17 days, 15,727 API calls, 0 1h writes. This isn't flapping — Anthropic's server has quietly hard-coded the sub-agent default TTL to 5m. No changelog, no announcement, and the main issue was just closed without resolution.

This is a follow-up: latest data, cost math, community and media state, and why cnighswonger's proxy can't save you here either.

Past Two Weeks of Data

Scan covers 4/13–4/25 (cut-off of last post → today):

Metric	Main agent	Sub-agent
Total API calls	60,291	15,727
1h writes	100% (150.7M tokens)	0
5m writes	0	100% (60.4M tokens)
Consecutive 1h-write days	13	0
Consecutive 5m-write days	0	13

Add the 4/9–4/12 stretch and the sub-agent has run 17 straight days at 100% 5m, with 0 1h writes. Sub-agent workload didn't drop — 4/14 (the day I posted last) hit 2,648 calls, 4/17 spiked to 2,821, both two-week highs. The full cost impact landed on me.

Key contrast: the main agent stayed 100% 1h the entire time, untouched. So this is unambiguously server-side discrimination against the "sub-agent identity" — not quota throttling, not a client version, not a workflow change.

How Much More Expensive: The Math

Anthropic's official cache pricing:

Cache write to 5m TTL: 1.25× base input price
Cache write to 1h TTL: 2× base input price
Cache read (both): 0.1× base input price

Intuition says 5m writes are cheaper — 1.25× vs 2×, a 37.5% saving. But sub-agent workflows defeat that intuition.

A typical sub-agent runs 30 minutes, 5 turns. Between turns it waits for the LLM to think, runs tools, parses results. 3 inter-turn gaps over 5 minutes is normal. Each gap past TTL expires the cache and forces a rewrite next turn.

Total cost (with base input as 1×):

Old (1h TTL):
  1 cache write @ 2×  = 2.0
  4 cache reads @ 0.1× = 0.4
  Total = 2.4×

New (5m TTL):
  4 cache writes @ 1.25× = 5.0
  1 cache read   @ 0.1× = 0.1
  Total = 5.1×

About 2.1×. A heavy sub-agent workflow (parallel Task fan-out, long plan-execute, code-review pipelines) that used to cost \$10 now costs \$21.

This assumes inter-turn gaps average over 5m. If your sub-agent finishes every turn within 5m (e.g. pure retrieval), the impact is much smaller. The hardest-hit are sub-agents that "run long, wait for tool results."

GitHub Activity Past Week

Issue #46829: Closed by Anthropic

cnighswonger's #46829 was closed by Anthropic without a fix. Comments are uniformly angry:

DaQue: "I don't like the stealth nerf."
rinchen: "Yet another issue closed without resolution by Anthropic."
lizthegrey (Engineering Director at Honeycomb, jumped in 4/25): posted her own grep one-liner, listed her affected versions and dates (4/01 v2.1.81, 4/09 v2.1.85, 4/13–4/17 v2.1.92, 4/21 v2.1.114), and explicitly stated she provided redacted jsonl transcripts to Anthropic. The most credible piece of evidence submitted so far.

# lizthegrey's one-liner
grep -h -r -E 'ephemeral_.*_input_tokens' ~/.claude | \
  jq 'select(.isSidechain == false and (.message.model | startswith("claude-haiku") | not) and .message.usage.cache_creation.ephemeral_5m_input_tokens > 0) | .timestamp + "," + .version' 2>/dev/null | \
  sed 's/T.*,/,/' | sort | uniq -c

Same data source as my 60-line Python from the last post, just more concise. Drop-in usable.

Issue #50213: Sub-agent Trailing Block Missing cache_control

ofekron added measurements on #50213 on 4/17: every built-in sub-agent (Explore, Plan, general-purpose) shows nonzero cache_creation on second spawn — the trailing system-context block has no cache_control marker, so each fresh spawn wastes ~4.7K tokens rewriting. 0 new comments past week — this issue is being ignored.

Together the two issues say the same thing: Anthropic's posture toward sub-agent cache leans toward "save where we can," not "optimize where we can."

No Movement from Anthropic Staff

bcherny's earlier mention of a "per-request env var / flag for TTL" — still not shipped
Jarred Sumner's earlier defense in The Register that "sub-agent 5m is a one-shot optimization" — no response to the 4/9 100% 5m data
Anthropic posted nothing on these issues in the past week

Update (2026-04-26): Official Position vs My Data

After publishing, I dug into Anthropic's public posture. Boris Cherny (creator of Claude Code), via The Register:

"One-hour cache has been implemented in some places for subscribers, while a five-minute cache is the true default."

So Anthropic's official line is "5m is the true default; 1h is opt-in for some subscriber scenarios" — which actually agrees with this post's framing of "this is the new default, not a regression."

But the official stance can't explain one thing: time-series data from }}">the first audit shows that from 2026-02-07 to 03-05, 28 consecutive days, sub-agents received 100% 1h (not mixed, not 50%). If those 28 days of "1h treatment" were a "special case," it was a stably-allocated special case, not an occasional gift.

This post's 17 days of 100% 5m can be re-positioned: the sub-agent 1h treatment subscribers used to receive is being stably revoked. Anthropic didn't "change the default," but the "1h special case formerly granted to sub-agents" effectively disappeared. That's a fact the official statement can't paper over.

Media Coverage and a Bigger Thread

This isn't just blowing up on GitHub:

The Register (4/13): Anthropic: Claude quota drain not caused by cache tweaks — Anthropic publicly denies a cache link, with Sumner's defense quoted in full
XDA Developers: Anthropic quietly nerfed Claude Code's 1-hour cache
DevOps.com: Developers Using Anthropic Claude Code Hit by Token Drain Crisis

Worth tracking: Issue #41930 — since 3/23 every paid tier has been hit by abnormal quota burn, Pro / Max 5× / Max 20× included. Single prompts eat 3–7% of session quota; 5h windows drain in as little as 19 minutes. The community treats cache TTL regression, autocompact cascades, and sub-agent fan-out as stacked root causes. My 4/9 second-wave finding fills in the timeline of "sub-agent specifically got worse again on 4/9."

Can cnighswonger's Proxy Save This? My Take

cnighswonger/claude-code-cache-fix v3.0.3 has nice A/B numbers on CC v2.1.117: through the proxy 95.5% cache hit rate, direct 82.3%. It runs 7 hot-reloadable extensions, including ttl-management, which "detects server TTL tier and injects correct cache_control markers."

But for the "server force-writes sub-agent into 5m" problem, the proxy probably can't save you. My read:

The proxy fixes "caches that should hit but miss because of client bugs" (unstable fingerprint, non-deterministic tool ordering, inconsistent cache_control markers)
It can't fix "client marks 1h, server still writes 5m" — that's server-side behavior, the proxy can't rewrite responses
From our 17 days of 100% 5m / 0 1h writes, the server is doing the latter for sub-agents

Easy to verify: install the proxy, run the same script against ~/.claude/projects/*.jsonl, see if sub-agent ephemeral_1h_input_tokens ever goes from 0 to nonzero. If it stays 0, the server-side change is confirmed.

This isn't a knock on cnighswonger's proxy — it has demonstrated value for the main agent and any cache-miss scenario. Just don't expect it to "bring back sub-agent 1h TTL."

Conclusion: This Is the New Default

In the 4/14 post I called the 4/9 wave "a second silent regression." On 4/26 I'm revising the wording: this is no longer a regression — it's Anthropic's new default for sub-agents.

Evidence weight:

17 consecutive days (4/9–4/25)
15,727 API calls in just the past 13 days
0 1h writes (not low — actually zero)
Main agent untouched (clear differential treatment)
Media + GitHub + community on fire, Anthropic stays silent

If you lean heavily on sub-agents:

Scan your own data first — use the Python from the last post, or lizthegrey's jq one-liner above
Calculate the actual cost impact — it's not "a bit more," it's about 2×
Re-evaluate your sub-agent workflows — anything doable in the main agent shouldn't fan out to sub-agents
Drop a data point on issue #46829 — closed but still indexed. With Honeycomb-tier voices already pushing, more data makes external coverage easier to follow up

}}">Background — Claude Code session cost & cache misconception covers the cache cost logic. }}">First audit covers how to scan your own logs to verify. Read both for the full picture.

References

Cache TTL silently regressed — GitHub Issue #46829 — closed, community still commenting
Subagent trailing block missing cache_control — Issue #50213
Widespread quota drain since 2026-03-23 — Issue #41930 — parent issue with stacked root causes
Anthropic: Claude quota drain not caused by cache tweaks — The Register
Anthropic quietly nerfed Claude Code's 1-hour cache — XDA Developers
Developers Hit by Token Drain Crisis — DevOps.com
The 5-Minute TTL Change That's Costing You Money — dev.to
cnighswonger/claude-code-cache-fix — proxy + extension package

Node.js spawn stdout Gets Truncated: Compared 6 Fixes, Only the File Trick Works

Recca Tsai — Sat, 25 Apr 2026 21:27:12 +0000

Originally published at recca0120.github.io

One day I was spawning a CLI from Node (the Claude CLI, which dumps a lot of JSON) and piping its stdout back to parse. Short outputs were fine. But the moment output grew past a few hundred KB, the last few KB just disappeared — JSON.parse blew up on the final line, and the truncation point shifted run to run.

After digging through Node's official issues, Linux pipe docs, and community deep-dives, the verdict is blunt: this is a known Node behavior since 2015, and the only reliable pure-stdlib fix is writing to a temp file. This post lays out the trade-offs across six approaches so you don't have to repeat the journey.

The Symptom

import { spawn } from 'node:child_process';

const child = spawn('some-cli', ['--big-output']);
let buf = '';
child.stdout.on('data', (chunk) => { buf += chunk; });
child.on('close', () => {
  JSON.parse(buf); // blows up on large outputs
});

The bigger the output, the higher the chance. MB-scale almost always truncates; hundreds of KB occasionally. The cut-off point isn't fixed — sometimes you get 1.2 MB, sometimes 1.18 MB.

Why It Truncates

The root cause is how Node writes stdio. The child's stdout connects to a pipe, and writes to a pipe are async. When the child calls process.exit(), Node doesn't wait for buffered data to flush — the process exits immediately, and whatever sat in the pipe unread gets lost.

If stdout is a TTY or a regular file, writes are sync and this never happens. The bug only triggers on "non-TTY, non-file fds" — pipes, FIFOs, and sockets.

This was first tracked in Node issue #3669 (2015), then revisited in #6379 and #9633. The community consensus: user-land's only reliable workaround is writing to a file. Core has no plans to change it.

Six Approaches Side by Side

Approach	Viability	Trade-off
A. Temp file	Pure stdlib	One extra disk I/O, ~10ms
B. node-pty / get-pty-output	Child sees a TTY	Needs native build; CLI may inject ANSI codes that pollute JSON
C. `F_SETPIPE_SZ` to enlarge pipe	Linux only	macOS lacks the API; only delays the cut-off point
D. Named pipe (FIFO)	Same dead end	FIFOs are non-file fds too — same truncation
E. UNIX socket	Same dead end	Sockets are also non-file fds, async writes still truncate
F. Fix the child CLI itself	Root cause	Usually not under your control

Why D / E Don't Work Either

A common first thought: "Skip the pipe, use a FIFO or UNIX socket — that should work, right?" I tried it. Same truncation.

The reason: "async pipe writes" isn't a property of pipes specifically — it's a property of non-file, non-TTY fds. Linux and macOS route writes to such fds through the async path, and FIFOs and sockets fall in the same bucket. Identical behavior.

Why C Looks Promising but Isn't

fcntl(F_SETPIPE_SZ) can grow the Linux pipe buffer from 64 KB default to 1 MB (more than that needs root). Sounds great — fill the buffer big enough and nothing gets cut, right?

Three problems:

Linux only. macOS doesn't have F_SETPIPE_SZ
Only delays the cut-off. Outputs > 1 MB still truncate — no real fix
Still needs a native binding. fcntl isn't exposed in Node — you'd write a C++ addon or use ffi-napi

If you want pure stdlib and cross-platform, this path is out.

What Actually Works: Pipe stdout to a File

The trick is to use fs.openSync to grab a file fd and hand it to spawn's stdio option. This way the child writes stdout directly to the file, bypassing any pipe — writes are sync, process.exit() won't truncate:

import { spawnSync } from 'node:child_process';
import { openSync, closeSync, readFileSync, unlinkSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

const outPath = join(tmpdir(), `cli-${process.pid}-${Date.now()}.out`);
const fd = openSync(outPath, 'w');

try {
  // stdio: [stdin, stdout, stderr]
  // Send child stdout straight to a file fd, no pipe in between
  spawnSync('some-cli', ['--big-output'], {
    stdio: ['ignore', fd, 'inherit'],
  });
} finally {
  closeSync(fd);
}

const output = readFileSync(outPath, 'utf8');
unlinkSync(outPath);

JSON.parse(output); // no longer blows up

For an async version, use spawn + child.on('close', ...). Same principle: fd to a file, never a pipe.

If stderr is also high-volume, do the same with a second fd. 'inherit' forwards to the parent's stderr cleanly but you can't capture it.

The ~10ms Disk I/O Cost

The only downside is the extra disk I/O — about 10ms on SSD. For a CLI that runs for several seconds, this is noise. If you genuinely care about that 10ms, the only remaining path is node-pty, but you'll deal with:

Native build (extra compile step in CI)
Child sees a TTY and may inject ANSI color codes into stdout — strip them
macOS and Windows backends differ (forkpty vs conpty), test both

My take: if a temp file works, use a temp file. Trading 10ms to avoid a native dependency and ANSI pollution is an easy win.