DEV Community

John Medina
John Medina

Posted on

Your prompt is getting longer without you knowing it (and it's killing your margins)

I've been looking at LLM billing patterns lately, and there's a silent killer that creeps up on almost every team: prompt inflation.

When you first build an AI feature, your prompt is tight. Maybe 500 tokens for the system instructions and 100 for the user query. The math looks great. "This will cost us fractions of a cent per call," you tell the team.

Fast forward three months.

Someone added conversation history to make the bot "smarter." Another dev added a massive RAG context block because the model hallucinated once. Product asked for formatting instructions, so now the system prompt is a 2,000-word essay.

Suddenly, your baseline request is 8k tokens.

The worst part is that user value doesn't scale linearly with prompt size. But your OpenAI bill sure does. If you're running at scale, you're suddenly paying $0.05+ per request for a feature you modeled at $0.005.

If you just look at your monthly total on the provider dashboard, it just looks like you're getting more usage. You think "growth is good" until the Stripe payout hits and you realize your margins are gone.

You need to track cost per user and cost per feature, not just total spend. If you see specific users driving crazy costs, they're probably accumulating massive context windows that you need to truncate.

fwiw, I ran into this exact issue, which is why I built LLMeter (https://llmeter.org?utm_source=devto&utm_medium=article&utm_campaign=2026-04-21-prompt-inflation-margin-killer). It's an open-source, proxy-free way to track this stuff. It attributes costs down to the user ID level so you can actually see who is dragging around a 10k token history.

Stop assuming your prompt is the same size it was on day one. Track it.

Top comments (1)

Collapse
 
harjjotsinghh profile image
Harjot Singh

This is the silent margin-killer nobody watches. Prompts grow by accretion, one more example, one more instruction, one more guardrail, and suddenly every call costs 3x for marginal quality. The insidious part is it's gradual, no single commit looks bad, so it never gets caught in review. The fix is treating prompt length as a budget you actually monitor: prune ruthlessly and measure quality-per-token, not just quality. Cheap deterministic context-trimming beats stuffing everything in and hoping. I watch this closely in Moonshift since token cost is basically the pricing model. Are you tracking prompt-token growth over time, or only noticing when the bill jumps?