DEV Community: Valentyn Solomko

AI Agent Cost Optimization — Sonnet Hybrid (Haiku + OpenRouter)

Valentyn Solomko — Mon, 27 Apr 2026 10:56:33 +0000

Executive summary

Our subagents (.claude/agents/*.md) all run on Sonnet 4.6 by default at $3 / $15 per 1M (input/output). That's the single biggest line on our AI spend. This doc lays out a hybrid architecture where most subagents run on Haiku 4.5 ($1 / $5) and offload heavy generation/analysis to OpenRouter (paid key, already in .env.local) via the existing /fix-review REST plumbing. A handful of high-risk agents stay on Sonnet.

Expected savings at full rollout: ≈ 70–85 % of the current subagent spend, with equivalent quality for analytical tasks and gated quality for generational tasks.

Context

What exists today:

11 subagents declared in .claude/agents/*.md, all on Sonnet by default.
/fix-review skill already delegates 3 model rounds to OpenRouter (DeepSeek V3.2, Qwen3 Coder Next, Grok 4.1 Fast) and only calls Sonnet for the Arbiter round. That's the template.
.claude/skills/lib/rest.sh exposes chat_payload + rest_post — provider-agnostic REST helpers, already wired for OpenRouter.
CLI agents installed: opencode, cursor-agent, kilo, codex. Each supports headless invocation and --model selection. We used codex exec for the 5 Tenancy v3 ship PRs while Claude's sub-agent quota was exhausted — that confirmed the hybrid pattern works.
OPENROUTER_API_KEY is paid — no free-tier rate-limit anxiety.

What's wasteful today:

Subagents like code-simplifier, docs-maintainer, pm-issue-writer do small, structured prose tasks on Sonnet. A Gemini Flash Lite at $0.037 / $0.15 would be ~100× cheaper and equally good for that workload.
code-generator spends Sonnet tokens on file IO loops that an OpenRouter-backed CLI (Codex / Opencode) can handle for ~10× less.

Provider/model inventory (verified 2026-04-22)

OpenRouter (paid, direct REST via `lib/rest.sh`)

Model	$/1M in	$/1M out	Best for
`deepseek/deepseek-v3.2`	0.26	0.38	Code generation, refactor, security review
`qwen/qwen3-coder-next`	0.15	0.80	TS-heavy code, test generation
`x-ai/grok-4.1-fast`	0.20	0.50	Fast parallel review vote
`google/gemini-2.5-flash`	0.075	0.30	Docs, ADRs, prose
`google/gemini-2.5-flash-lite`	0.037	0.15	Lint classification, small structured calls

Baseline comparison:

Claude Sonnet 4.6: $3 / $15
Claude Haiku 4.5: $1 / $5

CLIs (useful when we need a built-in tool-use loop with file edits)

CLI	Headless command	Cheap / free options	Notes
`opencode`	`opencode run --model X "prompt"`	`opencode/gpt-5-nano`, `opencode/-free`, or any `google/`	Good for multi-file tasks
`kilo`	`kilo run --model X "message"`	`kilo/kilo-auto/free`, `kilo/openrouter/free`, `kilo/x-ai/grok-code-fast-1:optimized:free`	Thin over OpenRouter
`cursor-agent`	`cursor-agent --model X -p "prompt"`	`composer-2-fast` (Cursor Pro), `gpt-5.3-codex-*`	Good for large-scale rewrites
`codex`	`codex exec -C <dir> -s danger-full-access "prompt"`	OpenAI `gpt-5.4` default	Needs `-s danger-full-access` — bubblewrap sandbox unreliable on this host

Route CLIs through --model openrouter/<id> when we want them to hit the same OR quota.

Target architecture

Three subagent classes, three patterns:

Pattern A — Skill replaces subagent (direct OR REST)

For pure analytical subagents whose input is a small artifact (diff, lint output, hook source) and whose output is a structured text blob (report, suggestion list, simplified file).

parent Claude → Skill (thin bash) → rest_post to OpenRouter → deterministic output

No agentic loop, no Sonnet at all. Example: code-simplifier → /simplify skill.

Pattern B — Haiku orchestrator + direct OR REST

For subagents that need a validation gate: generate output with a cheap model, then have a small, cheap reasoner verify the shape, run tests / decide the next step.

Haiku orchestrator subagent
  → OR REST (e.g. qwen3-coder-next) to generate candidate
  → Haiku validates shape + runs verifier
  → success: commit; failure: retry with better prompt

Applies to test-generator, security-reviewer (Haiku + chorus review), static-analysis.

Pattern C — Haiku orchestrator + CLI worker (tool-use loop)

When the task requires real file IO and iterative tool use (multi-file edits, builds, tests in a loop), delegate to a CLI that has a built-in tool-use loop.

Haiku orchestrator subagent
  → codex exec / opencode run (routed through openrouter/deepseek-v3.2)
  → Haiku reviews diff + lint + test output before commit

Applies to code-generator.

Pattern D — Keep Sonnet

For subagents where project-specific conventions in CLAUDE.md matter more than cost, and where mistakes are high-blast-radius.

Applies to bug-fixer, migration-generator, project-devops.

Per-agent decision matrix

Agent	Pattern	Primary model	Validation gate
`code-generator`	C	`openrouter/deepseek/deepseek-v3.2` via `codex exec`	Haiku 4.5 reviews diff; runs lint + tests
`test-generator`	B	`openrouter/qwen/qwen3-coder-next`	Haiku 4.5 checks shape; runs `npm run test`
`code-simplifier`	A (skill)	`openrouter/deepseek/deepseek-v3.2`	None — output applied directly
`security-reviewer`	B	`chorus review` (3 OR models parallel)	Haiku 4.5 synthesizes findings
`static-analysis`	B	`openrouter/google/gemini-2.5-flash-lite`	Haiku 4.5 classifies safe vs unsafe fixes
`docs-maintainer`	B	`openrouter/google/gemini-2.5-flash`	Haiku 4.5 lints prose against CLAUDE.md style
`pm-issue-writer`	B	`openrouter/google/gemini-2.5-flash`	Haiku 4.5 checks OB Base compliance
`ci-build-agent`	B (Haiku-only, no OR)	Haiku 4.5	Small context; no delegation needed
`bug-fixer`	D	Sonnet 4.6	Iterative debugging needs full context
`migration-generator`	D	Sonnet 4.6	RLS / immutability conventions are subtle
`project-devops`	D	Sonnet 4.6	SSH / TLS / DB — prod-blast-radius

Migration order

code-simplifier → Pattern A skill. Smallest scope. Baseline measurement.
static-analysis → Pattern B. Similar surface.
docs-maintainer → Pattern B. Prose on Gemini Flash.
pm-issue-writer → Pattern B. Template work.
security-reviewer → Pattern B wrapped around existing chorus review.
test-generator → Pattern B with test execution loop.
code-generator → Pattern C with Codex backend. Largest, highest-risk.
ci-build-agent → Pattern B (Haiku-only).

At each step:

Ship one agent transition as its own PR.
Measure 3–5 real-task invocations before rolling to the next agent.
Metrics: token cost per invocation, success rate (did the output commit cleanly?), human-correction rate (did we have to fix the output manually?).
If the success rate drops > 10 % vs the Sonnet baseline, stop and reassess.

Skills with the same treatment

/fix-review — already done.
/task, /backlog, /pm-issue-writer — Haiku is enough; no OR delegation needed.
/ship — add --cheap flag that swaps its code-generator step to Pattern C.
/find-bugs — delegate to chorus debug.
/review — already uses parallel agents.

Risks

Haiku context awareness is lower than Sonnet. Prompts must inline the most relevant CLAUDE.md / docs/* section rather than assume the agent "knows the codebase". Mitigation: for each agent that moves, expand the system prompt to include the exact conventions it must adhere to (e.g., stale-time rules, RLS patterns).
OpenRouter model drift. A model string that works today may change default behavior next week (e.g., deepseek-v3.2 swaps back-end). Pin models explicitly and record the version in config.yaml / telemetry.
CLI sandboxing quirks. codex exec needed -s danger-full-access this session (bubblewrap failure on this host). Tests in CI may hit similar issues. Each CLI migration needs a --dry-run (smoke) test.
Free vs paid confusion. Free :free tiers have rate limits that silently 429. Migration MUST use paid openrouter/* model IDs and must NOT mix with CLIs in their free default mode (kilo/kilo-auto/free).
Rollback coupling. If a migrated agent regresses and we revert it to Sonnet, we need to keep the Pattern-A skill / Pattern-B wrapper in place for downstream callers. So: don't delete Sonnet-agent definitions until the hybrid has run cleanly for at least a week.
Log attribution. When OR models do the work, our existing .claude/skills/fix-review/telemetry.jsonl records cost per-PR. We need an equivalent file per migrated subagent, or at least a shared one.
Output parsing tolerance. Cheap models return messy output (no JSON mode guarantees across providers). Orchestrators must tolerate trailing prose, markdown fences around JSON, etc.

Non-goals / out of scope

Rewriting /fix-review — it already does this pattern.
Moving bug-fixer, migration-generator, project-devops to any cheaper model. Their blast radius dominates their spend.
Replacing the orchestrator Claude (top-level) with Haiku. The top-level agent needs to read this doc, understand the whole session, and make decisions — Sonnet stays there.
Switching CLI worker choice inside a single PR / session. Pick one backend per migration and stick with it.

Verification/acceptance

The overhaul is successful when:

8 of 11 subagents run on Haiku + OR delegate by default.
Telemetry shows ≥ 50 % cost reduction on subagent spend over a rolling 7-day window.
Human-correction rate (PRs where a sub-agent-produced change had to be manually fixed before merge) stays within 15 % of the Sonnet baseline.
No regressions in security reviews (security-reviewer catches the same class of findings it did on Sonnet — validated by a manual replay of 3 recent PRs).
docs/ai-agent-cost-optimization.md and each migrated agent's frontmatter reflect the new model + pattern.

References

.claude/skills/fix-review/config.yaml — provider/model routing template.
.claude/skills/lib/rest.sh — REST helpers.
.claude/agents/*.md — current subagent definitions.
~/wrk/projects/chorus/chorus/plugins/chorus/scripts/companion.mjs — multi-CLI review pipeline.
OpenRouter pricing: https://openrouter.ai/models (snapshot 2026-04-22).

Chorus: letting AI coding CLIs review each other

Valentyn Solomko — Fri, 17 Apr 2026 12:41:02 +0000

I use several AI coding CLIs depending on the task.

Claude Code is good at one kind of workflow. OpenCode has its own shape. Gemini CLI is useful when I want another model family in the loop. Codex is often strong when I need a second implementation or review pass.

The annoying part is not the models. The annoying part is switching tools.

chorus is my attempt to remove that friction.

It is an open-source cross-agent plugin collection for four AI coding CLIs:

Claude Code
OpenCode
Gemini CLI
Codex

The idea is simple: from the tool I am already using, I should be able to delegate a task to the other agents.

That creates a 4×3 mesh. Each agent can call the other three.

What it looks like in practice

From Claude Code:

/gemini:review Review this diff for hidden edge cases and missing tests.
/codex:run Add regression tests for the parser bug we just fixed.
/opencode:run Try a smaller refactor of the auth middleware without changing behavior.

From OpenCode, the same idea is exposed through MCP tools:

delegate_claude
delegate_gemini
delegate_codex

Gemini CLI and Codex get skills installed so they can delegate in any direction too.

The main use case: parallel review

Instead of asking one agent "is this fine?", ask three different agents to review the same change independently.

Different agents have different failure modes. One will over-focus on architecture. Another will catch a small test gap. Another will suggest a simpler implementation. Often one of them is wrong. That is fine. The value is in having multiple independent passes without leaving the terminal.

/gemini:review Check correctness and missed edge cases.
/codex:run Review test coverage and suggest missing cases.
/opencode:run Look for simplifications and risky abstractions.

This is not about pretending agents are teammates. It is about using model disagreement as a tool.

Design philosophy

The important design constraint for chorus is that it does not try to become a new AI IDE or orchestration platform. It is glue.

One install gives you access to the other agents from your preferred tool. Claude Code gets slash commands. OpenCode gets MCP tools. Gemini CLI and Codex get skills.

Keep using the interface you already like, but stop treating each CLI as an isolated island.

Installation

# Claude Code
claude plugin install https://github.com/valpere/chorus

# OpenCode
opencode plugin @valpere/chorus-opencode

# Gemini CLI
gemini skills install https://github.com/valpere/chorus --path for-gemini/claude
gemini skills install https://github.com/valpere/chorus --path for-gemini/opencode
gemini skills install https://github.com/valpere/chorus --path for-gemini/codex

I built this because my own workflow had become repetitive — make a change in one CLI, copy context into another, ask for a review, manually bring the useful parts back. It worked, but it was clumsy.

chorus turns that into a normal command.

GitHub: https://github.com/valpere/chorus

If you already use more than one AI coding CLI, this may fit your workflow without asking you to change it. If you only use one, multi-agent review may still be worth trying on risky changes. A second opinion from a different agent is often cheaper than debugging the same blind spot later.

Valentyn Solomko — Ukrainian software engineer

Unwanted gifts. When you are asked to complete a test task with an unknown repo using VSCode, look in `.vscode/`. You may see `tasks.json` there. Note that there is no comma after `"command":`.

Valentyn Solomko — Thu, 08 Jan 2026 15:34:01 +0000

AI as a Development Tool

Valentyn Solomko — Wed, 24 Dec 2025 14:42:15 +0000

"...the greatest part of the questions and controversies that perplex mankind depending on the doubtful and uncertain use of words, or (which is the same) indetermined ideas..."
-- John Locke, An Essay Concerning Human Understanding, The epistle to the reader (1690)

💡 What Is This About?

This year, many terms have emerged describing a new development tool -- AI.

Immediately, as always, discussions began about the taste of different-colored pencils, because some believe they can derive pleasure from it, while others think it's only for production, and even then, only after the church's blessing.

In principle, you can take any rules or best practices of development or engineering and replace the name of any tool with "AI," and you can smack your opponents right on the head with them.

While willing experts are feeling up this elephant and arguing about what they've touched or felt, let's try to establish some terminology.

We need something simple and understandable, so we can immediately point to where we are on the map.

I, too, can use smart words, so let's try a paradigmatic approach.

From here on -- no irony. Let's try to establish working definitions.

The goal of this document is not to evaluate AI as "good/bad," but to provide a common language for discussion: what exactly we're doing now, what level of risk we're accepting, and which practices are appropriate for the chosen approach.

🧩 Paradigms: Brief Descriptions

🔍 Vibe Coding

Vibe Coding is an approach where a developer describes desired functionality in human-understandable language and generates code using AI without detailed review or editing of the result. The emphasis is on experimentation, rapid prototyping, and trust in AI rather than on manually writing or checking each line of code.

Intent → AI → Code

AI writes code based on high-level intent
Minimal review, minimal structure
Maximum speed, maximum risk

Use when

Prototypes
Demos
Spikes
One-off code

Never use for

Production systems
Core logic
Security-sensitive paths

🔍 AI-Assisted Development

AI-assisted development (AIAD) is the use of artificial intelligence tools to support the developer at various stages of development: from writing code, testing, and debugging to optimization and automation of repetitive tasks. The developer remains an active participant in the process, and AI acts as an assistant that offers ideas, automates routine tasks, and improves code quality.

Human-led development using AI as a tool

Human owns architecture and decisions
AI accelerates implementation
Standard reviews and testing apply

Use when

Writing production features
Maintaining long-lived codebases

🔍 AI-Powered Pair Programming

AI assistants work alongside the developer in real time, suggesting alternatives, detecting errors, optimizing code, and even generating new features based on context. This approach reduces debugging time and improves project architecture.

Human ⇄ AI in real time

Continuous dialogue
AI suggests, human decides
Comparable to pair programming with a strong junior/mid developer

Use when

Day-to-day development
Learning unfamiliar codebases or languages

🔍 Generative AI for Specification and Design

AI is used for automatic generation of architectural decisions, diagrams, documentation, and even test scenarios based on requirements. This allows for quickly obtaining a system prototype and verifying its compliance with business requirements.

Requirements → AI → Specifications / Diagrams

AI helps before coding
Generates draft specifications, architecture diagrams, and test plans
Human reviews and refines

Use when

System design
Architecture exploration
Early planning stages

🔍 AI-Augmented Spec-Driven Development

AI-Augmented Spec-Driven Development is a paradigm where structured specifications (requirements, architectural constraints, acceptance criteria, etc.) are the primary source for AI-powered code generation. These specifications become the single source of truth that AI transforms into implementation, tests, and documentation. This approach ensures high alignment between requirements and implementation, reduces error risks, and increases development speed.

Specification → AI → Code + Tests

Specification is the source of truth
AI generates code and tests based on specifications
Highest predictability and maintainability

Use when

Core business logic
Financial, security, or regulated systems
Large codebases with long lifespans

🔍 AI-Driven Development (AIDD)

This is a paradigm where AI is deeply integrated into all stages of development -- from design and code writing to testing, optimization, and even deployment. The developer and AI work as partners, significantly increasing productivity and code quality.

Human + AI jointly drive the process

AI participates in design, coding, testing, and optimization
Human remains responsible for strategy and constraints

Use when

Teams are intentionally adopting AI into SDLC
Clear constraints and evaluation processes exist

🔍 Autonomous Refactoring and Adaptive Coding

AI automatically analyzes and refactors code, identifying potential problems, suggesting optimizations, and adapting to changes in the project. Such systems can learn from large amounts of code to better understand the project's context and requirements.

Existing code → AI → Improved code

AI refactors, optimizes, and reduces technical debt
Operates under strict rules and is subject to evaluation

Use when

Controlled refactoring
Performance optimization
Style and consistency improvements

🔍 Pipeline Synthesis and Security Scanning Orchestration

AI can automatically generate CI/CD configurations, analyze code security, identify technical debt, and suggest ways to eliminate it. This reduces risks and accelerates the deployment process.

Policy → AI → CI/CD and security automation

AI generates and maintains pipelines
Automates security scanning and policy enforcement

Use when

DevSecOps automation
Reducing operational risk

🔍 AI-Agent Driven Development

AI agents can independently perform certain tasks -- from generating code according to specifications to automatically creating tests, monitoring, and even fixing bugs in production. This allows creating self-protecting systems that minimize human intervention.

Goal → AI Agents → Execution

Autonomous agents decompose and execute tasks
Minimal human intervention

Use with extreme caution

Internal tools
Clearly bounded automation tasks

🔍 Self-Healing Systems

AI agents monitor the system in production, detect problems, generate patches, and automatically apply them, ensuring high reliability and minimizing downtime.

Runtime signals → AI → Fix → Deploy

AI monitors production and automatically applies fixes
Highest autonomy, highest risk

Not recommended by default

Only with strict safeguards
Only for non-critical systems

📊 Comparison Table

Paradigm	Who Leads	Source of Truth	Typical Use	Production Readiness	Risk
Vibe Coding	AI	Prompt	Prototypes, demo	❌	🔥🔥🔥
AI-assisted Dev	Human	Code	Production features	✅	🟡
AI Pair Programming	Human + AI	Code	Daily development	✅	🟡
Generative Spec & Design	Human	Spec	Architecture, planning	⚠️	🟡
Spec-Driven + AI	Specification	Spec	Core systems	✅✅	🟢
AI-Driven Dev	Human + AI	Mixed	End-to-end dev	✅	🟡
Autonomous Refactoring	AI	Code + Rules	Tech debt, cleanup	⚠️	🟡
Pipeline & Security AI	Policy	Policy	CI/CD, Security	✅	🟡
AI-Agent Driven Dev	AI Agents	Goal / Policy	Automation	⚠️	🔥🔥
Self-Healing Systems	AI	Runtime signals	Ops / reliability	⚠️⚠️	🔥🔥🔥

🌳 AI Development Paradigms -- Decision Tree

A decision tree for determining what we're discussing at any given moment.
This tree defines the development mode, not a specific tool.

🏢 Policies

🟢 Allowed by Default

AI-assisted development
AI pair programming
Refactoring with review

🟡 Allowed with Explicit Approval

AI-augmented spec-driven development
AI-generated code in core logic
Any AI in regulated domains

🔵 Operations Only

CI/CD generation
Security scanning
Dependency and policy enforcement

🔴 Explicitly Prohibited

AI-agent-driven development in production
Self-healing systems
Autonomous production changes without human approval

Overall

AI doesn't change engineering. It just makes the mistakes faster -- or more manageable.

DEV Community: Valentyn Solomko

AI Agent Cost Optimization — Sonnet Hybrid (Haiku + OpenRouter)

Executive summary

Context

Provider/model inventory (verified 2026-04-22)

OpenRouter (paid, direct REST via lib/rest.sh)

CLIs (useful when we need a built-in tool-use loop with file edits)

Target architecture

Pattern A — Skill replaces subagent (direct OR REST)

Pattern B — Haiku orchestrator + direct OR REST

Pattern C — Haiku orchestrator + CLI worker (tool-use loop)

Pattern D — Keep Sonnet

Per-agent decision matrix

Migration order

Skills with the same treatment

Risks

Non-goals / out of scope

Verification/acceptance

References

Chorus: letting AI coding CLIs review each other

What it looks like in practice

The main use case: parallel review

Design philosophy

Installation

Unwanted gifts. When you are asked to complete a test task with an unknown repo using VSCode, look in `.vscode/`. You may see `tasks.json` there. Note that there is no comma after `"command":`.

AI as a Development Tool

💡 What Is This About?

🧩 Paradigms: Brief Descriptions

🔍 Vibe Coding

🔍 AI-Assisted Development

🔍 AI-Powered Pair Programming

🔍 Generative AI for Specification and Design

🔍 AI-Augmented Spec-Driven Development

🔍 AI-Driven Development (AIDD)

🔍 Autonomous Refactoring and Adaptive Coding

🔍 Pipeline Synthesis and Security Scanning Orchestration

🔍 AI-Agent Driven Development

🔍 Self-Healing Systems

📊 Comparison Table

🌳 AI Development Paradigms -- Decision Tree

🏢 Policies

🟢 Allowed by Default

🟡 Allowed with Explicit Approval

🔵 Operations Only

🔴 Explicitly Prohibited

Overall

OpenRouter (paid, direct REST via `lib/rest.sh`)