Discussion on: How I Found $1,240/Month in Wasted LLM API Costs (And Built a Tool to Find Yours)

Jonathan Murray

Caching is criminally underused for LLM calls. So many teams are re-sending identical or near-identical prompts and paying for it every time. The other big one is context window bloat - stuffing way more into the prompt than necessary because it feels safer. At $2k/month the gains from optimizing are real. Is your tool available publicly or still internal?

Abid Ali

Totally agree on both — caching especially feels like something everyone knows they should do but never prioritizes until the bill hurts. The near-duplicate problem is sneaky too: exact duplicates are easy to cache, but prompts that are 95% the same with a different user name or timestamp still hit the API fresh every time.
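
One way to catch those near-duplicates is to normalize the volatile bits out of the prompt before computing the cache key. A minimal sketch (this is an illustration, not how the profiler works internally — `normalize`, `cached_completion`, and the regexes are hypothetical, and assume your prompts embed ISO timestamps and "User: <name>" lines):

```python
import hashlib
import re

# In-memory cache keyed on a normalized prompt, so prompts that differ
# only in a user name or timestamp resolve to the same entry.
_cache = {}

def normalize(prompt: str) -> str:
    # Strip volatile fragments that shouldn't change the answer.
    # (Assumption: ISO-8601 timestamps and "User: <name>" lines.)
    prompt = re.sub(r"\d{4}-\d{2}-\d{2}T[\d:.]+Z?", "<TS>", prompt)
    prompt = re.sub(r"User: \S+", "User: <NAME>", prompt)
    return prompt.strip()

def cached_completion(prompt: str, call_api) -> str:
    # Hash the normalized prompt, not the raw one.
    key = hashlib.sha256(normalize(prompt).encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_api(prompt)  # pay for the call only once
    return _cache[key]
```

With this, "User: alice ... Summarize X" and "User: bob ... Summarize X" share one cached response instead of two fresh API calls. The trade-off is correctness: you have to be sure the stripped fields genuinely don't affect the output, which is why teams usually start with exact-match caching and add normalization rules one at a time.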

Yeah it's public, just pushed it last week — pip install llm-spend-profiler, repo at github.com/BuildWithAbid/llm-cost-profiler. Still early but it detects the main patterns: duplicate calls, retry waste, context bloat, and model downgrade opportunities. Would love to know what it finds on your codebase if you try it.