I've been looking at LLM billing patterns lately, and there's a silent killer that creeps up on almost every team: prompt inflation.
When you first build an AI feature, your prompt is tight. Maybe 500 tokens for the system instructions and 100 for the user query. The math looks great. "This will cost us fractions of a cent per call," you tell the team.
Fast forward three months.
Someone added conversation history to make the bot "smarter." Another dev added a massive RAG context block because the model hallucinated once. Product asked for formatting instructions, so now the system prompt is a 2,000-word essay.
Suddenly, your baseline request is 8k tokens.
The worst part is that user value doesn't scale linearly with prompt size. But your OpenAI bill sure does. If you're running at scale, you're suddenly paying $0.05+ per request for a feature you modeled at $0.005.
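To make the creep concrete, here's the back-of-the-envelope math. The rate below is a round illustrative number, not any provider's actual pricing, and it only counts input tokens (output tokens pile on top of this):

```python
# Illustrative rate only: $5 per million input tokens (an assumption,
# not a real price sheet -- plug in your provider's actual rates).
RATE = 5 / 1_000_000  # dollars per input token

day_one = 600 * RATE        # 500 system + 100 user tokens -> $0.0030/request
month_three = 8_000 * RATE  # inflated prompt -> $0.0400/request

print(f"day one:     ${day_one:.4f}/request")
print(f"month three: ${month_three:.4f}/request")
# ~13x the cost for the same feature. At 1M requests/month, that's
# the difference between ~$3,000 and ~$40,000.
```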
If you only look at the monthly total on the provider dashboard, it just looks like more usage. You think "growth is good" until the Stripe payout hits and you realize your margins are gone.
You need to track cost per user and cost per feature, not just total spend. If specific users are driving outsized costs, they're probably accumulating massive conversation histories that you need to truncate.
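Here's a minimal sketch of that attribution, assuming you already log token counts per request. The rates and the `record`/`request_cost` helpers are hypothetical, just to show the shape:

```python
from collections import defaultdict

# Assumed pricing -- illustrative, not any provider's real rates.
PRICE_PER_1K_INPUT = 0.005   # $ per 1k input tokens
PRICE_PER_1K_OUTPUT = 0.015  # $ per 1k output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single call under the assumed rates."""
    return (input_tokens / 1000) * PRICE_PER_1K_INPUT \
         + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT

# Aggregate from your own request log, keyed two ways.
cost_by_user = defaultdict(float)
cost_by_feature = defaultdict(float)

def record(user_id: str, feature: str, input_tokens: int, output_tokens: int) -> None:
    cost = request_cost(input_tokens, output_tokens)
    cost_by_user[user_id] += cost
    cost_by_feature[feature] += cost

# Example: surface the users dragging around huge context windows.
record("user_42", "chat", input_tokens=9_800, output_tokens=300)
record("user_7", "chat", input_tokens=700, output_tokens=250)

top_users = sorted(cost_by_user.items(), key=lambda kv: kv[1], reverse=True)
print(top_users[:10])  # your truncation candidates live at the top
```

Even this crude version tells you what a dashboard total never will: whether the spend is ten thousand cheap requests or ten users with runaway histories.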
FWIW, I ran into this exact issue, which is why I built LLMeter (https://llmeter.org?utm_source=devto&utm_medium=article&utm_campaign=2026-04-21-prompt-inflation-margin-killer). It's an open-source, proxy-free way to track this stuff. It attributes costs down to the user ID level so you can actually see who is dragging around a 10k token history.
Stop assuming your prompt is the same size it was on day one. Track it.