Everyone says: use caching, use cheaper models, reduce token counts. Here is the one thing that actually cuts LLM costs by 40%.
The Real Problem
Most LLM cost optimization advice focuses on the wrong thing: the cost per call.
The real cost driver is making calls you do not need to make at all.
The 40% Solution
Track which LLM calls are producing valuable outputs vs. which are being ignored.
I added output engagement tracking to my monitoring. Here is what I found:
- 35% of LLM outputs were never read by the user
- Another 15% were read but immediately dismissed
- Only 50% drove actual user actions
That is 50% of LLM costs producing zero value.
The Fix
Add a simple check: did the user act on the output?
def track_output_value(response, user_action):
    # log() and calculate_cost() are your own logging and pricing helpers
    log({
        "response_id": response.id,
        "user_action": user_action,  # clicked, copied, dismissed, ignored
        "tokens": response.usage.total_tokens,
        "cost": calculate_cost(response),
    })
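A minimal sketch of what calculate_cost could look like, assuming per-token pricing. The model name and rates below are illustrative placeholders, not real vendor prices; substitute your provider's current price sheet.

```python
# Hypothetical price table: (input, output) USD per 1M tokens.
# These numbers are illustrative only.
PRICES = {
    "example-model": (0.15, 0.60),
}

def calculate_cost(response):
    """Estimate a call's cost from token usage and a per-model price table."""
    in_rate, out_rate = PRICES[response.model]
    return (response.usage.prompt_tokens * in_rate
            + response.usage.completion_tokens * out_rate) / 1_000_000
```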
If user_action == "ignored", that call was wasted.
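Once these records are logged, totaling the waste is a few lines. This is a sketch over the log schema above; it treats both "ignored" and "dismissed" as waste, matching the 50% figure.

```python
from collections import defaultdict

def wasted_spend(records):
    """Sum logged cost per user_action; 'ignored' + 'dismissed' is waste.

    Returns (wasted_dollars, wasted_fraction_of_total).
    """
    by_action = defaultdict(float)
    for rec in records:
        by_action[rec["user_action"]] += rec["cost"]
    total = sum(by_action.values())
    waste = by_action["ignored"] + by_action["dismissed"]
    return waste, (waste / total if total else 0.0)
```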
The Result
After filtering out ignored outputs:
- LLM calls reduced by 40%
- Cost reduced by 40%
- User satisfaction unchanged (they were ignoring the outputs anyway)
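The post does not prescribe a filtering mechanism, but one way to act on the data is a gate that skips calls in contexts where outputs are historically ignored. The threshold, minimum sample size, and context keys here are hypothetical.

```python
ENGAGEMENT_THRESHOLD = 0.2  # hypothetical cutoff: skip below 20% engagement
MIN_SAMPLES = 50            # keep calling until we have enough data

def should_call_llm(context_key, engagement_stats):
    """Decide whether a context is worth an LLM call.

    engagement_stats maps a context key (e.g. a feature name) to
    (engaged_count, total_count) tallies from the tracking log.
    """
    engaged, total = engagement_stats.get(context_key, (0, 0))
    if total < MIN_SAMPLES:
        return True  # not enough evidence yet
    return engaged / total >= ENGAGEMENT_THRESHOLD
```

A design note: the MIN_SAMPLES guard matters, because a new feature with three ignored outputs should not be cut off before it has a fair chance to show engagement.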
What This Means
Before optimizing prompts, models, or caching: track whether outputs matter.
If users are ignoring your LLM outputs, you are wasting money regardless of how efficiently you generate them.
The first step to cost optimization is understanding what you are getting for what you pay. Track everything.