Claude Code defaults to Opus 4.6 for most tasks. That's the right call for hard problems. It's the wrong call for "what does this function do?" asked 50 times a day.
If you've checked your Anthropic bill after a week of heavy Claude Code usage, you know the math. This is how to fix it without losing the quality on tasks that actually need it.
Why Claude Code gets expensive
Claude Code is an agentic coding assistant. It reads files, runs commands, makes multiple API calls per user action, and often sends large context windows on each call.
A typical session might make 10-30 API calls — repo-map reads, edit confirmations, error handling, follow-ups. All at Opus pricing by default.
The problem isn't Claude Code. It's sending every call — including trivial ones — to the most expensive model.
The OpenAI-compatible option
Claude Code supports custom OpenAI-compatible API endpoints. You can point it at any endpoint that speaks the OpenAI chat completions format.
In your Claude Code settings:
{
"apiProvider": "openai",
"openAiBaseUrl": "https://www.komilion.com/api/v1",
"openAiApiKey": "ck_your_komilion_key",
"openAiModelId": "neo-mode/balanced"
}
Or via environment variables:
export OPENAI_BASE_URL=https://www.komilion.com/api/v1
export OPENAI_API_KEY=ck_your_komilion_key
With this config, Claude Code sends requests to the routing layer instead of directly to Anthropic. The routing layer reads each request, classifies it, and picks the cheapest model that can handle it.
What the routing looks like in practice
Simple Claude Code operations (file reads, quick questions, confirmation prompts):
→ Routes to Gemini Flash class models (~$0.006/call)
Standard coding tasks (debugging, code review, explain this function):
→ Routes to Sonnet-class models (~$0.10/call)
Complex work (architecture decisions, multi-file refactors, hard bugs):
→ Routes to Opus 4.6 (~$0.55/call)
Claude Code doesn't know which path was taken. From its perspective, it made an API call and got a response. The routing is invisible.
The actual numbers
Say you run 100 Claude Code API calls in a typical day:
- ~60 are simple (file reads, quick questions, confirmations)
- ~30 are moderate (coding, debugging)
- ~10 are complex (architecture, hard problems)
Direct Opus on everything:
100 calls × $0.55 avg = $55/day → ~$1,650/month
With routing:
60 × $0.006 + 30 × $0.10 + 10 × $0.55 = $0.36 + $3.00 + $5.50 = $8.86/day → ~$265/month
That's an 84% cost reduction on the same workload, with Opus still being used for everything that needs it.
These numbers assume "average" Komilion pricing. Your actual split depends on your workflow — heavier architectural work means more premium routing. You can check data["komilion"]["cost"] in each response to see exactly what each call cost.
Staying on Opus when you need it
If you're in a complex session and want to ensure Opus handles everything, switch the model string in your Claude Code config temporarily:
"openAiModelId": "neo-mode/premium"
This is the workflow: use balanced as your default, switch to premium for the hard sessions, drop back to balanced when you're done.
What about the January ban?
If you're reading this because your Claude Code setup broke in January 2026: the enforcement blocked the subscription OAuth token path, not the API.
Claude Code running via a direct API key (either Anthropic's or an OpenAI-compatible endpoint) was never restricted. If you switched to an API key-based setup after January, you're on the right path — this article is just about optimising costs once you're there.
Full migration context: komilion.com/cline
Quick setup
- Get an API key at komilion.com ($5 free credits, no card)
- Configure Claude Code (see JSON config above — apiProvider: openai approach is confirmed working) (Padme: ANTHROPIC_BASE_URL env var approach needs verification — use JSON config until confirmed)
- Use
neo-mode/balancedas your default model - Switch to
neo-mode/premiumfor heavy architecture sessions
Claude Code connects the same way. Your bill changes immediately.
Top comments (0)