Most people blame model pricing for high AI bills.
In practice, the biggest leak is context bloat: paying to resend tokens that don't help the answer.
What I keep seeing in teams:
- giant pasted context that has nothing to do with the task
- long threads reused for unrelated tasks
- retries with the same broken prompt
A simple fix that works:
1) start a fresh thread for each new task type
2) keep prompts scoped and short
3) track token usage while jobs run, not after invoice day
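For step 3, the tracking can be as simple as a running counter with a hard budget. A minimal sketch in Python, assuming your LLM client returns an OpenAI-style usage dict with `prompt_tokens` and `completion_tokens` per call (the budget number is illustrative, not a recommendation):

```python
class TokenBudget:
    """Accumulates per-call token usage and fails fast when a job
    blows past its budget, instead of surprising you on invoice day."""

    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.used = 0

    def record(self, usage: dict) -> None:
        # Count both directions: the prompt you sent and the completion you got back.
        self.used += usage.get("prompt_tokens", 0) + usage.get("completion_tokens", 0)
        if self.used > self.max_tokens:
            raise RuntimeError(
                f"token budget exceeded: {self.used}/{self.max_tokens}"
            )

    @property
    def remaining(self) -> int:
        return max(self.max_tokens - self.used, 0)


budget = TokenBudget(max_tokens=50_000)
budget.record({"prompt_tokens": 1_200, "completion_tokens": 300})
print(budget.used, budget.remaining)  # 1500 48500
```

Wire `record` into your request loop and retries with a broken prompt stop silently compounding: the job dies at the budget line instead of at the invoice.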
Those three habits usually cut spend fast without reducing output quality.
If you want live visibility from the macOS menu bar, this is what I use:
https://www.tokenbar.site/
Top comments (1)
Context bloat is the hidden killer. Teams that trim system prompts and stale memory blocks usually see immediate cost drops.