Last month I looked at my API bill and realized I was burning tokens on context that didn't matter. Ten minutes of auditing cut my costs by nearly half.
Here's the exact checklist I use now.
## The 10-Minute Token Audit
### Minute 1-3: Measure Your Baseline
Before optimizing, know what you're spending. Check your last 7 days:
```bash
# If using OpenAI
curl https://api.openai.com/v1/usage \
  -H "Authorization: Bearer $OPENAI_API_KEY" | jq '.daily_costs[-7:]'

# Quick estimate: count the words in your typical prompt,
# then multiply by ~1.3 for a rough token count
echo "Your system prompt here" | wc -w
```
Write down: average tokens per request, requests per day, daily cost.
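The word-count heuristic above is easy to script. A minimal sketch, with the same rough ×1.3 English-prose multiplier (not an exact tokenizer) and an illustrative per-1K-token price:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: English prose averages ~1.3 tokens per word.
    # For exact counts, use your provider's tokenizer instead.
    return round(len(text.split()) * 1.3)

def daily_spend(tokens_per_request: int, requests_per_day: int,
                usd_per_1k_tokens: float) -> float:
    # Baseline: tokens/request x requests/day x price per 1K tokens.
    return tokens_per_request * requests_per_day * usd_per_1k_tokens / 1000

system_prompt = "You are a concise assistant. Answer in plain English."
print(estimate_tokens(system_prompt))   # rough token count for this prompt
print(daily_spend(8000, 50, 0.01))      # e.g. 8K tokens/request, 50 calls/day
```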
### Minute 3-5: Find the Bloat
The three biggest token wasters:
1. System prompts that repeat every call.
If your system prompt is over 500 tokens and you make 50+ calls a day, that's 25,000+ tokens a day spent repeating instructions that never change.
Fix: Move static instructions to a cached system message (if your provider supports it) or trim ruthlessly.
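If your provider supports prompt caching (Anthropic's API is one example), the static block gets marked cacheable in the request body. A rough sketch of the payload shape; the model name is a placeholder and the instruction text is illustrative:

```python
payload = {
    "model": "YOUR_MODEL_HERE",  # placeholder, not a real model name
    "system": [
        {
            "type": "text",
            "text": "...your long, static instructions...",
            # Provider-specific flag: marks this block for caching so its
            # tokens aren't re-billed at the full input rate on every call.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [{"role": "user", "content": "Fix the typo in utils.py"}],
}
```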
2. Full file contents when a summary would do.
Don't paste an entire 2,000-line file when the AI only needs to understand the interface.
Fix: Send type signatures, function headers, or a 10-line summary instead of the full file.
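For Python files, the standard-library `ast` module can pull out just the signatures. A minimal sketch (the function name is my own):

```python
import ast

def extract_interface(source: str) -> str:
    # Parse a module and keep only function/class signatures, replacing
    # every body with "..." so the model sees the shape, not the bulk.
    tree = ast.parse(source)
    lines = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)
            lines.append(f"def {node.name}({args}): ...")
        elif isinstance(node, ast.ClassDef):
            lines.append(f"class {node.name}:")
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    args = ", ".join(a.arg for a in item.args.args)
                    lines.append(f"    def {item.name}({args}): ...")
    return "\n".join(lines)
```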
3. Conversation history you never prune.
Every message in your chat history gets re-sent with each request. A 20-message conversation means the AI is re-reading the whole thread every turn.
Fix: Summarize older messages. Keep the last 3-5 exchanges verbatim, compress the rest.
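That fix fits in a few lines. A sketch, where `summarize` is a stand-in for whatever summarization you use (a cheap LLM call in practice; naive truncation here):

```python
def prune_history(messages, keep_last=4):
    # Keep the last `keep_last` messages verbatim; collapse everything
    # older into one summary message so it isn't re-sent in full.
    if len(messages) <= keep_last:
        return messages
    older, recent = messages[:-keep_last], messages[-keep_last:]
    summary = summarize(older)
    return [{"role": "system", "content": f"Summary of earlier turns: {summary}"}] + recent

def summarize(messages):
    # Naive placeholder: first 40 chars of each older message.
    # In practice, ask a cheap model to write a real summary.
    return "; ".join(m["content"][:40] for m in messages)
```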
### Minute 5-8: Apply the Compression
For each prompt in your workflow, ask:
| Question | If yes → |
|---|---|
| Is the system prompt >500 tokens? | Trim to essentials |
| Am I sending full files? | Send signatures/headers only |
| Is chat history >10 messages? | Summarize older messages |
| Am I sending the same context repeatedly? | Cache or reference it |
| Does the prompt include examples? | Keep max 2, remove the rest |
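The checklist above, mechanized as a sketch; all field names are my own, so map them onto whatever stats you actually track:

```python
def audit(stats: dict) -> list:
    # Walk the checklist and return the fixes that apply.
    fixes = []
    if stats.get("system_prompt_tokens", 0) > 500:
        fixes.append("Trim system prompt to essentials")
    if stats.get("sends_full_files", False):
        fixes.append("Send signatures/headers only")
    if stats.get("history_messages", 0) > 10:
        fixes.append("Summarize older messages")
    if stats.get("repeats_same_context", False):
        fixes.append("Cache or reference repeated context")
    if stats.get("example_count", 0) > 2:
        fixes.append("Keep max 2 examples, remove the rest")
    return fixes
```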
### Minute 8-10: Verify Quality Didn't Drop
Run your three most common prompts with the trimmed versions. Compare outputs side by side. If quality dropped, add back the minimum context needed.
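A tiny harness for the side-by-side check; `call_model` is a stand-in for your actual API call:

```python
def compare(prompt_pairs, call_model):
    # For each (full, trimmed) prompt pair, collect both outputs so you
    # can eyeball them side by side before committing to the trimmed version.
    return [
        {"full": call_model(full), "trimmed": call_model(trimmed)}
        for full, trimmed in prompt_pairs
    ]
```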
In my experience, you can remove 30-50% of tokens without any quality impact. The AI doesn't need your entire README to fix a typo.
## Real Numbers
My workflow before the audit: ~8,000 tokens/request average.
After: ~4,500 tokens/request.
Same output quality, 44% cost reduction.
The biggest win was pruning conversation history. I was sending 15 messages of context for tasks that only needed the last 3.
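The arithmetic checks out; since cost scales linearly with input tokens, the token reduction is the cost reduction:

```python
before, after = 8000, 4500
reduction = (before - after) / before
print(f"{reduction:.0%}")  # 44%
```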
## Do This Monthly
Token bloat creeps back. System prompts grow. New files get added to context. I run this audit on the first of every month; it takes 10 minutes and saves real money.
What's your biggest token waster? Curious if others are seeing the same patterns.