DEV Community

sophiaashi

I Tracked Every API Call for 30 Days. 60% Were Wasting Money on the Wrong Model.

Decided to actually log every API call for a month. The results were embarrassing.

Out of roughly 3000 calls:

  • 1800 (60%) were simple: file reads, grep, reformatting, basic Q&A
  • 750 (25%) were medium: code refactors, test generation, summarization
  • 450 (15%) were complex: architecture decisions, multi-file debugging

I was sending ALL of them to Claude Sonnet at $15/million tokens.

The simple 60% runs identically on DeepSeek-V3 at $1.80/million. The medium 25% works fine on GPT-4o at $5/million. Only the complex 15% actually needs Sonnet.
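If you want to run the same logging exercise, even a crude keyword bucketer gives a first cut at the simple/medium/complex split. This heuristic is purely illustrative (a toy of mine, not how any production router actually classifies):

```python
# Toy classifier for bucketing prompts before routing.
# The keyword lists are illustrative guesses, not a tested taxonomy.
def classify(prompt: str) -> str:
    p = prompt.lower()
    if any(k in p for k in ("read file", "grep", "reformat", "what is")):
        return "simple"   # file reads, grep, reformatting, basic Q&A
    if any(k in p for k in ("refactor", "write tests", "summarize")):
        return "medium"   # refactors, test generation, summarization
    return "complex"      # architecture, multi-file debugging, anything unsure
```

Defaulting the unknown bucket to "complex" is deliberate: misrouting a hard task to a cheap model costs more in rework than the tokens save.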

Before and After

  • Before routing: ~$240/month (everything on Sonnet)
  • After routing by task type: ~$140/month
  • Saved: ~$100/month with zero quality loss
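Worth noting: the savings track token share, not call counts, because complex calls burn far more tokens each. A back-of-envelope sanity check (the token shares below are my assumptions for illustration, not logged numbers):

```python
# Back-of-envelope cost model. Prices are $/million tokens from above;
# the token shares are ASSUMED (complex calls use many more tokens per call).
PRICE = {"deepseek-v3": 1.80, "gpt-4o": 5.00, "claude-sonnet": 15.00}
TOKEN_SHARE = {"deepseek-v3": 0.35, "gpt-4o": 0.25, "claude-sonnet": 0.40}

monthly_tokens_m = 240 / PRICE["claude-sonnet"]   # ~16M tokens/month on Sonnet
before = monthly_tokens_m * PRICE["claude-sonnet"]
after = monthly_tokens_m * sum(PRICE[m] * s for m, s in TOKEN_SHARE.items())
# before = 240.0; after ≈ 126 under these assumed shares
```

The point isn't the exact number, it's that 60% of calls being cheap does not mean 60% of spend is cheap.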

The Surprising Part

I had no idea 60% of my daily work was basically file reads and simple edits until I actually logged it. We all think we are doing complex reasoning all day, but the data says otherwise.

How I Automated It

Manually switching models was annoying. I use TeamoRouter to auto-pick the cheapest model per task. One API key, installs in OpenClaw in 2 seconds.

Routing modes:

  • teamo-balanced — auto-picks best value per task (my default)
  • teamo-best — always highest quality
  • teamo-eco — always cheapest
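I won't paste real endpoint details here, but assuming the router speaks the standard OpenAI-style chat-completions schema (most routing proxies do; check TeamoRouter's docs for the actual base URL), switching modes is just a model-name swap. This payload builder is a sketch under that assumption:

```python
# Build a chat-completions payload for a routing proxy.
# ASSUMPTION: the endpoint accepts the standard OpenAI-style request body;
# only the mode names come from the list above.
MODES = {"teamo-balanced", "teamo-best", "teamo-eco"}

def make_request(prompt: str, mode: str = "teamo-balanced") -> dict:
    if mode not in MODES:
        raise ValueError(f"unknown routing mode: {mode}")
    return {
        "model": mode,
        "messages": [{"role": "user", "content": prompt}],
    }
```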

My Routing Config

I shared my full task-to-model routing table (with exact per-task costs) in our Discord. Too detailed to format in a blog post, but the short version:

Task Type            Model           Cost/1K tokens
File reads, grep     DeepSeek-V3     $0.0014
Simple refactors     DeepSeek-V3     $0.0014
Code review          GPT-4o          $0.005
Summarization        Gemini Flash    $0.0005
Architecture         Claude Sonnet   $0.015
Complex debugging    Claude Sonnet   $0.015
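For anyone who'd rather copy a config than a table, here's the same mapping as a lookup. The task keys and the fall-through default are my choices; the prices are the per-1K figures from the table:

```python
# Task-to-model routing table ($/1K tokens, from the table above).
ROUTES = {
    "file_reads":        ("deepseek-v3",   0.0014),
    "simple_refactors":  ("deepseek-v3",   0.0014),
    "code_review":       ("gpt-4o",        0.005),
    "summarization":     ("gemini-flash",  0.0005),
    "architecture":      ("claude-sonnet", 0.015),
    "complex_debugging": ("claude-sonnet", 0.015),
}

def route(task_type: str) -> str:
    # Unknown tasks fall through to the strongest model, not the cheapest.
    model, _cost = ROUTES.get(task_type, ("claude-sonnet", 0.015))
    return model
```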

Curious if anyone else has done this exercise. Does the 60/25/15 split hold for you, or is your workflow different?
