DEV Community

LYX19951121

I Tracked My AI API Costs for 30 Days. The Results Changed How I Build.

I've been shipping AI features for the past year. Last month I hit a wall — my API bill crossed $300 and I had no idea where it was going.

So I did what any developer would: I built a cost tracker. Here's what 30 days of data taught me.

The Setup

I built a lightweight middleware that logged every API call: model used, token count, cost, and task type.

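# Cost-tracking middleware for OpenAI-compatible APIs
from datetime import datetime

# USD per token, keyed by model name: {"model-id": {"input": ..., "output": ...}}.
# Fill in from your provider's price list (divide per-million prices by 1_000_000).
PRICING = {}


class CostTracker:
    def __init__(self):
        self.records = []

    def log(self, model, prompt_tokens, completion_tokens, task_type):
        """Record one API call and its cost in USD."""
        rates = PRICING[model]  # raises KeyError for a model with no price entry
        cost = (rates["input"] * prompt_tokens
                + rates["output"] * completion_tokens)
        self.records.append({
            "model": model,
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "cost": cost,
            "task_type": task_type,
            "timestamp": datetime.now(),
        })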
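Once the records pile up, the useful view is spend grouped by task type. Here is a minimal sketch of that, assuming each record is a dict with the `task_type` and `cost` keys logged above. `summarize_by_task` is a hypothetical helper, not part of the original tracker, and the sample records are made up for illustration:

```python
from collections import defaultdict


def summarize_by_task(records):
    """Return {task_type: {"calls": int, "total": float, "avg": float}}."""
    totals = defaultdict(lambda: {"calls": 0, "total": 0.0})
    for record in records:
        bucket = totals[record["task_type"]]
        bucket["calls"] += 1
        bucket["total"] += record["cost"]
    # Add the average cost per call to each bucket.
    return {
        task: {**stats, "avg": stats["total"] / stats["calls"]}
        for task, stats in totals.items()
    }


# Illustrative records only (not the author's data).
sample = [
    {"task_type": "docs", "cost": 0.25},
    {"task_type": "docs", "cost": 0.75},
    {"task_type": "code_gen", "cost": 0.5},
]
summary = summarize_by_task(sample)

# Print the most expensive task types first.
for task, stats in sorted(summary.items(), key=lambda kv: -kv[1]["total"]):
    print(f"{task}: {stats['calls']} calls, "
          f"${stats['total']:.2f} total, ${stats['avg']:.2f} avg")
```

Sorting by total rather than average surfaces the task types that dominate the bill, which is usually where a cheaper model pays off first.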

What I Found (Week 1)

For the first week, I only used GPT-4.1. Total: $74.

Then I got curious. What if I sent the same prompts to different models?

The Experiment (Week 2-3)

I set up multi-model routing with FastAnchor, an open-source API gateway that routes to 18 models through a single endpoint, and tested 5 models across 4 task types:

| Task Type | GPT-4.1 | DeepSeek V4 Pro | DeepSeek V4 Flash | Qwen 3.7 Max | Claude Opus 4.6 |
|---|---|---|---|---|---|
| Code generation | $0.51/req | $0.24/req | $0.08/req | $0.31/req | $0.47/req |
| Documentation | $0.37/req | $0.12/req | $0.04/req | $0.15/req | $0.33/req |
| Data extraction | $0.62/req | $0.15/req | $0.05/req | $0.18/req | $0.55/req |
| Complex reasoning | $0.81/req | $0.43/req | $0.22/req | $0.51/req | $0.72/req |

Comparable output quality across the board on my prompts. Wildly different prices.
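To make the price gap concrete, the per-request numbers from the table above can be dropped into a small script that finds the cheapest model per task and compares it with GPT-4.1. This is only a sketch over the published figures; `cheapest` is a hypothetical helper name, and it ranks on cost alone without accounting for quality:

```python
# Per-request costs in USD, copied from the table above.
COST_PER_REQUEST = {
    "Code generation": {
        "GPT-4.1": 0.51, "DeepSeek V4 Pro": 0.24, "DeepSeek V4 Flash": 0.08,
        "Qwen 3.7 Max": 0.31, "Claude Opus 4.6": 0.47,
    },
    "Documentation": {
        "GPT-4.1": 0.37, "DeepSeek V4 Pro": 0.12, "DeepSeek V4 Flash": 0.04,
        "Qwen 3.7 Max": 0.15, "Claude Opus 4.6": 0.33,
    },
    "Data extraction": {
        "GPT-4.1": 0.62, "DeepSeek V4 Pro": 0.15, "DeepSeek V4 Flash": 0.05,
        "Qwen 3.7 Max": 0.18, "Claude Opus 4.6": 0.55,
    },
    "Complex reasoning": {
        "GPT-4.1": 0.81, "DeepSeek V4 Pro": 0.43, "DeepSeek V4 Flash": 0.22,
        "Qwen 3.7 Max": 0.51, "Claude Opus 4.6": 0.72,
    },
}


def cheapest(task):
    """Return (model, cost) for the lowest per-request cost on a task."""
    costs = COST_PER_REQUEST[task]
    model = min(costs, key=costs.get)
    return model, costs[model]


for task, costs in COST_PER_REQUEST.items():
    model, cost = cheapest(task)
    ratio = costs["GPT-4.1"] / cost
    print(f"{task}: {model} at ${cost:.2f}/req "
          f"(GPT-4.1 costs {ratio:.1f}x as much)")
```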

The Math (Week 4)

I implemented task-based routing:

  • Code gen → DeepSeek V4 Flash ($0.10/M tokens)
  • Docs → Qwen 3.7 Max ($0.10/M tokens)
  • Data extraction → DeepSeek V4 Flash
  • Complex reasoning → DeepSeek V4 Pro ($0.22/M tokens)
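The routing rules above boil down to a lookup table. Here is a minimal sketch, with some loud assumptions: only `deepseek-v4-flash` appears as a model ID in this post, so the other two IDs are guessed spellings, and the fallback for unknown task types is my own choice rather than something the setup above specifies:

```python
# Task type -> model ID, mirroring the routing list above.
ROUTES = {
    "code_gen": "deepseek-v4-flash",
    "docs": "qwen-3.7-max",                  # assumed ID spelling
    "data_extraction": "deepseek-v4-flash",
    "complex_reasoning": "deepseek-v4-pro",  # assumed ID spelling
}

# Assumption: unrecognized task types fall back to the stronger model.
DEFAULT_MODEL = "deepseek-v4-pro"


def pick_model(task_type):
    """Return the model ID to use for a task type."""
    return ROUTES.get(task_type, DEFAULT_MODEL)


def build_payload(task_type, prompt):
    """Build an OpenAI-style chat completion body for the routed model."""
    return {
        "model": pick_model(task_type),
        "messages": [{"role": "user", "content": prompt}],
    }


payload = build_payload("docs", "Document this function")
print(payload["model"])
```

Keeping the routes in one dict means a model swap is a one-line change, and the task type is already available for the cost log.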

Week 4 bill: $28. Down from $74 in Week 1.

Annual projection:

  • Before: $74/week × 52 = $3,848/year
  • After: $28/week × 52 = $1,456/year
  • Savings: $2,392/year
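The projection is simple enough to check in a few lines. The sketch below reproduces the arithmetic and adds the percentage reduction; it assumes, as the projection does, that a single week's spend is representative of the whole year:

```python
WEEKS_PER_YEAR = 52


def annualize(weekly_cost):
    """Extrapolate one week's spend to a full year."""
    return weekly_cost * WEEKS_PER_YEAR


before = annualize(74)   # Week 1, GPT-4.1 only
after = annualize(28)    # Week 4, task-based routing
savings = before - after
reduction = savings / before

print(f"Before:    ${before:,}/year")    # $3,848/year
print(f"After:     ${after:,}/year")     # $1,456/year
print(f"Savings:   ${savings:,}/year")   # $2,392/year
print(f"Reduction: {reduction:.0%}")     # 62%
```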

The Key Insight

The most expensive model isn't always the best for your task. And sometimes it's dramatically worse per dollar.

DeepSeek V4 Flash matched GPT-4.1 on code generation at roughly 1/6 the cost per request. Qwen 3.7 Max beat GPT-4.1 on documentation at less than half the cost. The only place GPT-4.1 still had an edge was nuanced legal reasoning, and even there the difference was marginal.

How I Run This Now

I use FastAnchor as my single API endpoint:

curl https://aipossword.cn/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_KEY" \
  -d '{"model": "deepseek-v4-flash", "messages": [{"role": "user", "content": "Write a function to parse CSV"}]}'
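The same request can be built from Python with only the standard library. This is a sketch, not an official client: `build_chat_request` is a hypothetical helper, and the commented-out response handling assumes the gateway returns the usual OpenAI chat completion shape (`choices[0].message.content`):

```python
import json
import urllib.request

BASE_URL = "https://aipossword.cn/v1"  # endpoint from the curl example above


def build_chat_request(api_key, model, prompt, base_url=BASE_URL):
    """Build (but do not send) a POST request to /chat/completions."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_chat_request(
    "YOUR_KEY", "deepseek-v4-flash", "Write a function to parse CSV"
)

# Sending is commented out so the sketch runs without a network call or real key:
# with urllib.request.urlopen(request) as response:
#     reply = json.load(response)
#     print(reply["choices"][0]["message"]["content"])
```

Because `base_url` is a parameter, pointing the same code at a different OpenAI-compatible endpoint is a one-argument change.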

What FastAnchor gives you:

  • Zero markup — you pay exactly provider cost. No hidden fees.
  • 18 models — DeepSeek V4, Qwen 3.7, Claude Opus, all through one API key
  • OpenAI-compatible — change one base_url, everything else stays the same
  • Open source — the code is at github.com/QuantumNous/new-api (18k+ stars)
  • $5 free credits to test with

The Real Lesson

Model loyalty is expensive. The AI landscape moves fast — a model that was SOTA and expensive six months ago might be matched by a model that costs 1/6 as much today.

Don't pick a model. Pick a routing strategy.


What's your monthly AI API spend looking like? I'm genuinely curious — drop your numbers below.
