DEV Community

Robin
Robin

Posted on

Claude Code Is Expensive by Default — Here's How to Fix That

Claude Code defaults to Opus 4.6 for most tasks. That's the right call for hard problems. It's the wrong call for "what does this function do?" asked 50 times a day.

If you've checked your Anthropic bill after a week of heavy Claude Code usage, you know the math. This is how to fix it without losing the quality on tasks that actually need it.


Why Claude Code gets expensive

Claude Code is an agentic coding assistant. It reads files, runs commands, makes multiple API calls per user action, and often sends large context windows on each call.

A typical session might make 10-30 API calls — repo-map reads, edit confirmations, error handling, follow-ups. All at Opus pricing by default.

The problem isn't Claude Code. It's sending every call — including trivial ones — to the most expensive model.


The OpenAI-compatible option

Claude Code supports custom OpenAI-compatible API endpoints. You can point it at any endpoint that speaks the OpenAI chat completions format.

In your Claude Code settings:

{
  "apiProvider": "openai",
  "openAiBaseUrl": "https://www.komilion.com/api/v1",
  "openAiApiKey": "ck_your_komilion_key",
  "openAiModelId": "neo-mode/balanced"
}
Enter fullscreen mode Exit fullscreen mode

Or via environment variables:

export OPENAI_BASE_URL=https://www.komilion.com/api/v1
export OPENAI_API_KEY=ck_your_komilion_key
Enter fullscreen mode Exit fullscreen mode

With this config, Claude Code sends requests to the routing layer instead of directly to Anthropic. The routing layer reads each request, classifies it, and picks the cheapest model that can handle it.


What the routing looks like in practice

Simple Claude Code operations (file reads, quick questions, confirmation prompts):
→ Routes to Gemini Flash class models (~$0.006/call)

Standard coding tasks (debugging, code review, explain this function):
→ Routes to Sonnet-class models (~$0.10/call)

Complex work (architecture decisions, multi-file refactors, hard bugs):
→ Routes to Opus 4.6 (~$0.55/call)

Claude Code doesn't know which path was taken. From its perspective, it made an API call and got a response. The routing is invisible.


The actual numbers

Say you run 100 Claude Code API calls in a typical day:

  • ~60 are simple (file reads, quick questions, confirmations)
  • ~30 are moderate (coding, debugging)
  • ~10 are complex (architecture, hard problems)

Direct Opus on everything:
100 calls × $0.55 avg = $55/day → ~$1,650/month

With routing:
60 × $0.006 + 30 × $0.10 + 10 × $0.55 = $0.36 + $3.00 + $5.50 = $8.86/day → ~$265/month

That's an 84% cost reduction on the same workload, with Opus still being used for everything that needs it.

These numbers assume "average" Komilion pricing. Your actual split depends on your workflow — heavier architectural work means more premium routing. You can check data["komilion"]["cost"] in each response to see exactly what each call cost.


Staying on Opus when you need it

If you're in a complex session and want to ensure Opus handles everything, switch the model string in your Claude Code config temporarily:

"openAiModelId": "neo-mode/premium"
Enter fullscreen mode Exit fullscreen mode

This is the workflow: use balanced as your default, switch to premium for the hard sessions, drop back to balanced when you're done.


What about the January ban?

If you're reading this because your Claude Code setup broke in January 2026: the enforcement blocked the subscription OAuth token path, not the API.

Claude Code running via a direct API key (either Anthropic's or an OpenAI-compatible endpoint) was never restricted. If you switched to an API key-based setup after January, you're on the right path — this article is just about optimising costs once you're there.

Full migration context: komilion.com/cline


Quick setup

  1. Get an API key at komilion.com ($5 free credits, no card)
  2. Configure Claude Code (see JSON config above — apiProvider: openai approach is confirmed working) (Padme: ANTHROPIC_BASE_URL env var approach needs verification — use JSON config until confirmed)
  3. Use neo-mode/balanced as your default model
  4. Switch to neo-mode/premium for heavy architecture sessions

Claude Code connects the same way. Your bill changes immediately.

Top comments (0)