Executive summary
Our subagents (.claude/agents/*.md) all run on Sonnet 4.6 by default at $3 / $15 per 1M (input/output). That's the single biggest line on our AI spend. This doc lays out a hybrid architecture where most subagents run on Haiku 4.5 ($1 / $5) and offload heavy generation/analysis to OpenRouter (paid key, already in .env.local) via the existing /fix-review REST plumbing. A handful of high-risk agents stay on Sonnet.
Expected savings at full rollout: ≈ 70–85 % of the current subagent spend, with equivalent quality for analytical tasks and gated quality for generational tasks.
Context
What exists today:
-
11 subagents declared in
.claude/agents/*.md, all on Sonnet by default. -
/fix-reviewskill already delegates 3 model rounds to OpenRouter (DeepSeek V3.2, Qwen3 Coder Next, Grok 4.1 Fast) and only calls Sonnet for the Arbiter round. That's the template. -
.claude/skills/lib/rest.shexposeschat_payload+rest_post— provider-agnostic REST helpers, already wired for OpenRouter. -
CLI agents installed:
opencode,cursor-agent,kilo,codex. Each supports headless invocation and--modelselection. We usedcodex execfor the 5 Tenancy v3 ship PRs while Claude's sub-agent quota was exhausted — that confirmed the hybrid pattern works. -
OPENROUTER_API_KEYis paid — no free-tier rate-limit anxiety.
What's wasteful today:
- Subagents like
code-simplifier,docs-maintainer,pm-issue-writerdo small, structured prose tasks on Sonnet. A Gemini Flash Lite at $0.037 / $0.15 would be ~100× cheaper and equally good for that workload. -
code-generatorspends Sonnet tokens on file IO loops that an OpenRouter-backed CLI (Codex / Opencode) can handle for ~10× less.
Provider/model inventory (verified 2026-04-22)
OpenRouter (paid, direct REST via lib/rest.sh)
| Model | $/1M in | $/1M out | Best for |
|---|---|---|---|
deepseek/deepseek-v3.2 |
0.26 | 0.38 | Code generation, refactor, security review |
qwen/qwen3-coder-next |
0.15 | 0.80 | TS-heavy code, test generation |
x-ai/grok-4.1-fast |
0.20 | 0.50 | Fast parallel review vote |
google/gemini-2.5-flash |
0.075 | 0.30 | Docs, ADRs, prose |
google/gemini-2.5-flash-lite |
0.037 | 0.15 | Lint classification, small structured calls |
Baseline comparison:
- Claude Sonnet 4.6: $3 / $15
- Claude Haiku 4.5: $1 / $5
CLIs (useful when we need a built-in tool-use loop with file edits)
| CLI | Headless command | Cheap / free options | Notes |
|---|---|---|---|
opencode |
opencode run --model X "prompt" |
opencode/gpt-5-nano, opencode/*-free, or any google/*
|
Good for multi-file tasks |
kilo |
kilo run --model X "message" |
kilo/kilo-auto/free, kilo/openrouter/free, kilo/x-ai/grok-code-fast-1:optimized:free
|
Thin over OpenRouter |
cursor-agent |
cursor-agent --model X -p "prompt" |
composer-2-fast (Cursor Pro), gpt-5.3-codex-*
|
Good for large-scale rewrites |
codex |
codex exec -C <dir> -s danger-full-access "prompt" |
OpenAI gpt-5.4 default |
Needs -s danger-full-access — bubblewrap sandbox unreliable on this host |
Route CLIs through --model openrouter/<id> when we want them to hit the same OR quota.
Target architecture
Three subagent classes, three patterns:
Pattern A — Skill replaces subagent (direct OR REST)
For pure analytical subagents whose input is a small artifact (diff, lint output, hook source) and whose output is a structured text blob (report, suggestion list, simplified file).
parent Claude → Skill (thin bash) → rest_post to OpenRouter → deterministic output
No agentic loop, no Sonnet at all. Example: code-simplifier → /simplify skill.
Pattern B — Haiku orchestrator + direct OR REST
For subagents that need a validation gate: generate output with a cheap model, then have a small, cheap reasoner verify the shape, run tests / decide the next step.
Haiku orchestrator subagent
→ OR REST (e.g. qwen3-coder-next) to generate candidate
→ Haiku validates shape + runs verifier
→ success: commit; failure: retry with better prompt
Applies to test-generator, security-reviewer (Haiku + chorus review), static-analysis.
Pattern C — Haiku orchestrator + CLI worker (tool-use loop)
When the task requires real file IO and iterative tool use (multi-file edits, builds, tests in a loop), delegate to a CLI that has a built-in tool-use loop.
Haiku orchestrator subagent
→ codex exec / opencode run (routed through openrouter/deepseek-v3.2)
→ Haiku reviews diff + lint + test output before commit
Applies to code-generator.
Pattern D — Keep Sonnet
For subagents where project-specific conventions in CLAUDE.md matter more than cost, and where mistakes are high-blast-radius.
Applies to bug-fixer, migration-generator, project-devops.
Per-agent decision matrix
| Agent | Pattern | Primary model | Validation gate |
|---|---|---|---|
code-generator |
C |
openrouter/deepseek/deepseek-v3.2 via codex exec
|
Haiku 4.5 reviews diff; runs lint + tests |
test-generator |
B | openrouter/qwen/qwen3-coder-next |
Haiku 4.5 checks shape; runs npm run test
|
code-simplifier |
A (skill) | openrouter/deepseek/deepseek-v3.2 |
None — output applied directly |
security-reviewer |
B |
chorus review (3 OR models parallel) |
Haiku 4.5 synthesizes findings |
static-analysis |
B | openrouter/google/gemini-2.5-flash-lite |
Haiku 4.5 classifies safe vs unsafe fixes |
docs-maintainer |
B | openrouter/google/gemini-2.5-flash |
Haiku 4.5 lints prose against CLAUDE.md style |
pm-issue-writer |
B | openrouter/google/gemini-2.5-flash |
Haiku 4.5 checks OB Base compliance |
ci-build-agent |
B (Haiku-only, no OR) | Haiku 4.5 | Small context; no delegation needed |
bug-fixer |
D | Sonnet 4.6 | Iterative debugging needs full context |
migration-generator |
D | Sonnet 4.6 | RLS / immutability conventions are subtle |
project-devops |
D | Sonnet 4.6 | SSH / TLS / DB — prod-blast-radius |
Migration order
-
code-simplifier→ Pattern A skill. Smallest scope. Baseline measurement. -
static-analysis→ Pattern B. Similar surface. -
docs-maintainer→ Pattern B. Prose on Gemini Flash. -
pm-issue-writer→ Pattern B. Template work. -
security-reviewer→ Pattern B wrapped around existingchorus review. -
test-generator→ Pattern B with test execution loop. -
code-generator→ Pattern C with Codex backend. Largest, highest-risk. -
ci-build-agent→ Pattern B (Haiku-only).
At each step:
- Ship one agent transition as its own PR.
- Measure 3–5 real-task invocations before rolling to the next agent.
- Metrics: token cost per invocation, success rate (did the output commit cleanly?), human-correction rate (did we have to fix the output manually?).
- If the success rate drops > 10 % vs the Sonnet baseline, stop and reassess.
Skills with the same treatment
-
/fix-review— already done. -
/task,/backlog,/pm-issue-writer— Haiku is enough; no OR delegation needed. -
/ship— add--cheapflag that swaps its code-generator step to Pattern C. -
/find-bugs— delegate tochorus debug. -
/review— already uses parallel agents.
Risks
-
Haiku context awareness is lower than Sonnet. Prompts must inline the most relevant
CLAUDE.md/docs/*section rather than assume the agent "knows the codebase". Mitigation: for each agent that moves, expand the system prompt to include the exact conventions it must adhere to (e.g., stale-time rules, RLS patterns). -
OpenRouter model drift. A model string that works today may change default behavior next week (e.g.,
deepseek-v3.2swaps back-end). Pin models explicitly and record the version inconfig.yaml/ telemetry. -
CLI sandboxing quirks.
codex execneeded-s danger-full-accessthis session (bubblewrap failure on this host). Tests in CI may hit similar issues. Each CLI migration needs a--dry-run(smoke) test. -
Free vs paid confusion. Free
:freetiers have rate limits that silently 429. Migration MUST use paidopenrouter/*model IDs and must NOT mix with CLIs in their free default mode (kilo/kilo-auto/free). - Rollback coupling. If a migrated agent regresses and we revert it to Sonnet, we need to keep the Pattern-A skill / Pattern-B wrapper in place for downstream callers. So: don't delete Sonnet-agent definitions until the hybrid has run cleanly for at least a week.
-
Log attribution. When OR models do the work, our existing
.claude/skills/fix-review/telemetry.jsonlrecords cost per-PR. We need an equivalent file per migrated subagent, or at least a shared one. - Output parsing tolerance. Cheap models return messy output (no JSON mode guarantees across providers). Orchestrators must tolerate trailing prose, markdown fences around JSON, etc.
Non-goals / out of scope
- Rewriting
/fix-review— it already does this pattern. - Moving
bug-fixer,migration-generator,project-devopsto any cheaper model. Their blast radius dominates their spend. - Replacing the orchestrator Claude (top-level) with Haiku. The top-level agent needs to read this doc, understand the whole session, and make decisions — Sonnet stays there.
- Switching CLI worker choice inside a single PR / session. Pick one backend per migration and stick with it.
Verification/acceptance
The overhaul is successful when:
- 8 of 11 subagents run on Haiku + OR delegate by default.
- Telemetry shows ≥ 50 % cost reduction on subagent spend over a rolling 7-day window.
- Human-correction rate (PRs where a sub-agent-produced change had to be manually fixed before merge) stays within 15 % of the Sonnet baseline.
- No regressions in security reviews (security-reviewer catches the same class of findings it did on Sonnet — validated by a manual replay of 3 recent PRs).
-
docs/ai-agent-cost-optimization.mdand each migrated agent's frontmatter reflect the new model + pattern.
References
-
.claude/skills/fix-review/config.yaml— provider/model routing template. -
.claude/skills/lib/rest.sh— REST helpers. -
.claude/agents/*.md— current subagent definitions. -
~/wrk/projects/chorus/chorus/plugins/chorus/scripts/companion.mjs— multi-CLI review pipeline. - OpenRouter pricing: https://openrouter.ai/models (snapshot 2026-04-22).

Top comments (0)