Cline Keeps Routing to the Wrong (Expensive) Model — Here's What's Happening

#ai #cline #apidevelopment #costoptimization

If you use Cline with a non-Anthropic provider and notice it ignoring your model selection — you're not imagining it.

There's a known issue (currently open in the Cline repo) where the CLI path doesn't fully hydrate modelInfo for third-party providers, causing it to fall back to anthropic/claude-3-7-sonnet-latest regardless of what you configured. The UI shows your selected model. The API calls use something else.

This matters because at $0.55/call for Opus-class models, a misconfigured router silently burns money on tasks that could cost $0.006.

Why This Happens

Cline's model handler has two paths: VS Code extension (full model info hydration) and CLI (partial). For providers like Requesty, the CLI path sets modelId but skips fetching modelInfo. The handler then falls back to its default rather than trusting the explicit modelId.

It's a known architectural inconsistency — the fix requires the CLI to either fetch model info or relax the requirement that both modelId and modelInfo must be present before respecting a custom model selection.

The Actual Cost Problem

The deeper issue isn't the bug — it's the architecture that makes it possible. Most coding tools let you pick one model and route everything to it. That means:

"What does this variable do?" → Opus 4.6 → $0.55
"Write a commit message" → Opus 4.6 → $0.55
"Summarise this function" → Opus 4.6 → $0.55

For 60-70% of typical coding tasks, you're paying Opus prices for work that Gemini Flash handles at the same quality level for $0.006.

I tracked my own usage for a month. 64% of my API calls were tasks where the cheapest capable model scored within 5% of Opus on output quality. The remaining 36% genuinely benefited from a better model.

What Smart Routing Looks Like

The fix for the Cline CLI bug will let you use your configured provider correctly. But that still leaves the underlying problem: you're choosing one model for everything.

The pattern that actually works: classify each request by task complexity, route simple tasks to cheap fast models, reserve expensive models for the work that actually needs them.

In practice:

Commit messages, variable lookups, short explanations → $0.006/call (Gemini Flash class)
Code generation, debugging, refactoring → $0.08-0.10/call (Gemini Pro class)
Architecture decisions, complex multi-file reasoning → $0.55/call (Opus class)

Setting This Up in Cline Today

If you want smart routing to work in Cline without waiting for the model-selection bug to be fixed — or if you want automatic task-based routing rather than manually managing model configs:

Set Cline's API provider to "OpenAI Compatible"
Set API endpoint to https://www.komilion.com/api/v1
Use one of these as your model:
- neo-mode/frugal — auto-routes simple tasks to cheapest capable model
- neo-mode/balanced — good for most coding work
- neo-mode/premium — council mode for architecture decisions

This bypasses the Requesty-specific modelInfo issue entirely because it uses standard OpenAI model IDs, and it adds automatic task routing on top.

Every response includes which model was actually used and the exact cost in the komilion field of the response body — so you can verify the routing is working.

The Benchmark

I ran 40 API calls across 10 real developer tasks and published every response unedited: komilion.com/compare-v2

The finding that surprised me most: for summarisation, code explanation, and simple Q&A, the quality gap between Frugal and Opus 4.6 collapses to near zero. The tasks where Opus genuinely wins are architecture planning, complex multi-file reasoning, and ambiguous requirements — which is exactly when Premium kicks in automatically.

The Cline CLI bug will get fixed. The one-model-for-everything habit is harder to fix without a router.

$5 free trial at komilion.com/auth/signup — no card required. Change one URL, see which model handled each request and what it cost.