
John Medina

LLM prices dropped 80% — but are you actually saving money?

Everyone is cheering about Anthropic and OpenAI dropping API prices by 80%.
It sounds great on Twitter. But look at your actual billing dashboard: your costs probably haven't moved much.

Why? Because cheaper tokens usually just mean you start wasting more tokens.

Here is the thing:

1- Context bloat
When GPT-4 was expensive, we carefully truncated histories and compressed prompts. Now that tokens are cheap, devs throw the entire 128k context window at the model on every single retry. The cost per token dropped, but you're sending 10x more tokens per request.
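The fix is the same as it always was: keep trimming history even when tokens are cheap. Here's a minimal sketch, assuming a rough 4-characters-per-token heuristic (a real tokenizer like tiktoken is more accurate, but the shape of the logic is the same):

```python
def approx_tokens(text: str) -> int:
    # Crude heuristic: ~4 chars per token. Good enough for budgeting,
    # not for billing. Swap in a real tokenizer for accuracy.
    return max(1, len(text) // 4)

def trim_history(messages: list[dict], max_tokens: int = 4000) -> list[dict]:
    """Keep the system prompt plus the most recent messages that fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    budget = max_tokens - sum(approx_tokens(m["content"]) for m in system)

    kept = []
    for m in reversed(rest):  # walk newest-first
        cost = approx_tokens(m["content"])
        if budget - cost < 0:
            break
        budget -= cost
        kept.append(m)

    return system + list(reversed(kept))
```

Ten lines of trimming before each call is the difference between sending 4k tokens and sending 128k tokens on every retry.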

2- Agent loops
Cheaper models make agentic workflows viable, but a poorly configured while loop can still burn through your budget in minutes. When an agent gets stuck and retries 40 times, cheaper tokens don't save you—you still bleed cash.
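The cheapest guardrail is a hard cap on both iterations and spend. A sketch, where `step` is a placeholder for whatever your agent does per turn (it returns a result, or `None` to keep looping, plus the cost of that turn in USD — both names and the return shape are assumptions for illustration):

```python
def run_agent(step, max_iterations: int = 10, max_spend_usd: float = 1.00):
    """Run an agent loop with hard caps on iteration count and total spend."""
    spent = 0.0
    for i in range(max_iterations):
        result, cost = step(i)
        spent += cost
        if spent > max_spend_usd:
            # Kill the loop the moment the budget is blown, not 40 retries later.
            raise RuntimeError(f"budget exceeded after {i + 1} steps (${spent:.2f})")
        if result is not None:
            return result
    raise RuntimeError(f"no result after {max_iterations} iterations (${spent:.2f})")
```

The point isn't the specific numbers — it's that the cap exists at all. An uncapped `while True` around a retrying agent is how "cheap" models produce expensive invoices.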

3- Lack of per-customer attribution
It's easy to see your total OpenAI bill. But if you don't know which specific tenant or user is driving the cost, you can't optimize it. You just eat the cost.
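The core idea is simple: tag every call with a tenant or user ID and accumulate cost against a price table. A minimal sketch — the model name and per-million-token prices below are illustrative placeholders, not real pricing:

```python
from collections import defaultdict

# Hypothetical price table, USD per 1M tokens. Real prices vary by model
# and change over time — load these from config, don't hardcode them.
PRICES = {"gpt-4o-mini": {"input": 0.15, "output": 0.60}}

class CostTracker:
    """Accumulates LLM spend per user so you can see who drives the bill."""

    def __init__(self):
        self.by_user = defaultdict(float)

    def record(self, user_id: str, model: str,
               input_tokens: int, output_tokens: int) -> float:
        p = PRICES[model]
        cost = (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000
        self.by_user[user_id] += cost
        return cost
```

Once every request is attributed, "our OpenAI bill doubled" turns into "tenant X's new feature doubled our bill" — and that's something you can actually act on.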

tbh, the raw price per token is only half the story. If you can't attribute cost per user or per model, you're still flying blind.

fwiw I built LLMeter to fix this for my own projects. It tracks costs per model and per user, and sets budget alerts—without a proxy in the middle. It's open-source (AGPL).

Check it out if you're tired of guessing your AI bills: https://llmeter.org?utm_source=devto&utm_medium=article&utm_campaign=2026-04-16-llm-prices-dropped-are-you-saving
