Most teams running AI agents watch a monthly bill grow without knowing which tasks are driving it.
You see: $180 in API costs.
You do not see: which of your 12 agent tasks costs $0.003 and which costs $0.47 — and whether either is worth it.
This is the cost attribution gap. It is fixable in about 20 minutes.
Why Per-Task Cost Visibility Matters
When you only see aggregate costs, you make the wrong cuts.
Real example: a team cut their "expensive" analysis agent — the one doing deep document review. That agent was costing $0.12/run and running 40 times a day: $4.80/day.
Meanwhile their routing agent — which they assumed was cheap — was misconfigured and making 8 model calls per decision instead of 1. $0.38/run × 200 runs/day = $76/day.
They killed the wrong agent.
Per-task attribution would have surfaced this in the first week.
The Two-Field Fix
Add two fields — cost_estimate and cost_actual — to your current-task.json (tool-call counts get the same estimate/actual pair):
{
  "task_id": "summarize-inbox-001",
  "task_type": "inbox_summarization",
  "started_at": "2026-03-08T04:30:00Z",
  "cost_estimate": 0.04,
  "cost_actual": null,
  "model": "claude-3-haiku-20240307",
  "tool_calls_estimate": 3,
  "tool_calls_actual": null
}
Before the task runs: fill cost_estimate and tool_calls_estimate.
After the task completes: fill cost_actual and tool_calls_actual.
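As a sketch of that fill-in step in Python: the estimate comes from expected token counts times per-token prices, and the actuals come from your provider's metered usage after the run. The prices and token counts below are made-up placeholders — check your provider's current pricing.

```python
import json

# Hypothetical per-1K-token prices; real prices vary by model and change over time.
PRICE_PER_1K = {"claude-3-haiku-20240307": {"input": 0.00025, "output": 0.00125}}

def estimate_cost(model: str, in_tokens: int, out_tokens: int) -> float:
    """Rough pre-run cost estimate from expected token counts."""
    p = PRICE_PER_1K[model]
    return round(in_tokens / 1000 * p["input"] + out_tokens / 1000 * p["output"], 4)

task = {
    "task_id": "summarize-inbox-001",
    "task_type": "inbox_summarization",
    "model": "claude-3-haiku-20240307",
    "cost_estimate": estimate_cost("claude-3-haiku-20240307", 8000, 1600),
    "cost_actual": None,
    "tool_calls_estimate": 3,
    "tool_calls_actual": None,
}

# After the run: fill in what actually happened (values from your API usage data).
task["cost_actual"] = 0.041
task["tool_calls_actual"] = 4

with open("current-task.json", "w") as f:
    json.dump(task, f, indent=2)
```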
Log both to a daily cost file:
jq -c . current-task.json >> logs/cost-$(date +%Y-%m-%d).jsonl
(jq -c compacts the JSON to a single line, so every log entry stays valid JSONL. An unquoted echo $(cat …) mostly works but is fragile: the shell word-splits and glob-expands the file contents before echo sees them.)
That is the whole system.
What You Can Do With This Data in Two Weeks
After 14 days of logging:
1. Identify your most expensive tasks per unit of value
If a task costs $0.40 and runs 50 times a day but only produces value 30% of the time, you are spending $20/day for $6/day of value.
2. Catch cost regressions immediately
If a task averaged $0.05 for two weeks and this week it is averaging $0.18, something changed. A prompt got longer. A tool call loop got added. Catch it before the bill doubles.
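A minimal regression check over the daily JSONL logs could look like this — a sketch assuming each line carries the task_type and cost_actual fields from above, with the 2× threshold as an arbitrary starting point:

```python
import json
from collections import defaultdict

def avg_cost_by_task(jsonl_lines):
    """Average cost_actual per task_type over a batch of JSONL log lines."""
    totals, counts = defaultdict(float), defaultdict(int)
    for line in jsonl_lines:
        rec = json.loads(line)
        if rec.get("cost_actual") is not None:
            totals[rec["task_type"]] += rec["cost_actual"]
            counts[rec["task_type"]] += 1
    return {t: totals[t] / counts[t] for t in totals}

def find_regressions(baseline_lines, recent_lines, ratio=2.0):
    """Flag task types whose recent average cost exceeds baseline by `ratio`."""
    base = avg_cost_by_task(baseline_lines)
    recent = avg_cost_by_task(recent_lines)
    return {t: (base[t], avg) for t, avg in recent.items()
            if t in base and avg > base[t] * ratio}
```

Feed it the previous two weeks of log lines as the baseline and the current week as recent, and anything it returns is a task whose cost profile changed.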
3. Make routing decisions with real data
Not all tasks need the same model. Once you know per-task costs, routing becomes easy:
- Costs < $0.01 and runs frequently → use a smaller model
- Costs > $0.20 and runs rarely → fine, use the big model
- Costs > $0.20 and runs constantly → this is your optimization target
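Those three rules can be sketched as a tiny decision function — the dollar thresholds are the ones above, but the frequency cutoff is an assumption you should tune to your own workload:

```python
def routing_advice(avg_cost: float, runs_per_day: int) -> str:
    """Map a task's observed cost profile to a routing decision."""
    FREQUENT = 50  # runs/day; an assumed cutoff -- pick one that fits your workload
    if avg_cost < 0.01 and runs_per_day >= FREQUENT:
        return "use a smaller model"
    if avg_cost > 0.20 and runs_per_day < FREQUENT:
        return "big model is fine"
    if avg_cost > 0.20 and runs_per_day >= FREQUENT:
        return "optimization target"
    return "no action needed"
```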
4. Justify (or cut) each agent
"This agent costs $8/month and saves me 3 hours of work" is a defensible number. "This agent feels useful" is not.
The Estimate vs Actual Gap Is Your Debug Signal
If your estimate and actual diverge consistently, your prompts are behaving differently than you expect.
Common causes:
- Prompt is longer than you thought (more input tokens)
- Agent is making more tool calls than specified
- Fallback paths are triggering (extra retry loops)
- Context is accumulating between runs (growing input size)
A 2× estimate-to-actual gap means something is wrong — even if the absolute cost seems small.
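One way to surface that gap automatically, assuming the two-field schema above (the 2× threshold matches the rule of thumb here):

```python
def gap_ratio(record):
    """Return actual/estimate cost ratio, or None if either field is missing."""
    est, act = record.get("cost_estimate"), record.get("cost_actual")
    if not est or act is None:
        return None
    return act / est

def flag_divergent(records, threshold=2.0):
    """Records whose actual cost is at least `threshold` times the estimate."""
    flagged = []
    for rec in records:
        ratio = gap_ratio(rec)
        if ratio is not None and ratio >= threshold:
            flagged.append(rec)
    return flagged
```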
The Full Cost Attribution Stack
For a complete picture, maintain three files:
current-task.json ← per-run estimates and actuals
logs/cost-YYYY-MM-DD.jsonl ← rolling daily log
logs/cost-summary.json ← weekly rollup by task type
The summary file is optional but valuable:
{
  "week_of": "2026-03-03",
  "total_cost": 42.18,
  "by_task": {
    "inbox_summarization": { "runs": 280, "total": 8.40, "avg": 0.030 },
    "content_scheduling": { "runs": 21, "total": 2.10, "avg": 0.100 },
    "customer_routing": { "runs": 1400, "total": 31.08, "avg": 0.022 }
  }
}
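A rollup like that can be generated straight from the daily logs. Here is a sketch assuming the field names used throughout this post and the logs/cost-*.jsonl naming scheme:

```python
import glob
import json

def weekly_rollup(week_of: str, log_glob: str = "logs/cost-*.jsonl") -> dict:
    """Aggregate completed runs from daily JSONL logs into a weekly summary."""
    by_task = {}
    for path in sorted(glob.glob(log_glob)):
        with open(path) as f:
            for line in f:
                rec = json.loads(line)
                if rec.get("cost_actual") is None:
                    continue  # skip runs that never completed
                entry = by_task.setdefault(rec["task_type"], {"runs": 0, "total": 0.0})
                entry["runs"] += 1
                entry["total"] += rec["cost_actual"]
    for entry in by_task.values():
        entry["avg"] = round(entry["total"] / entry["runs"], 3)
        entry["total"] = round(entry["total"], 2)
    return {
        "week_of": week_of,
        "total_cost": round(sum(e["total"] for e in by_task.values()), 2),
        "by_task": by_task,
    }
```

Run it once a week (a cron job works) and write the result to logs/cost-summary.json.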
Now you can answer: what does each part of my agent system actually cost?
The 5-Minute Audit
- Open your last 7 days of API bills
- How many individual tasks can you identify in that spend?
- If the answer is "none" — that is the gap
- Pick your three most frequently-running agents
- Add cost_estimate and log cost_actual for one week
You will learn more from that one week of data than from six months of aggregate billing.
The patterns behind cost attribution — along with configs for the current-task.json stack, cost logging scripts, and routing decision templates — are in the Ask Patrick Library.
Building in public. Sharing what works.
Top comments (1)
The "$180/month, no breakdown" problem is exactly the operational cost of unstructured agents. Without per-task attribution you can't optimize — you don't know if it's the token-heavy system prompt running 200 times a day, or the tool-calling loop that never terminates cleanly.
The structured logging approach you're suggesting is the right layer. But there's a related lever on the input side: well-structured prompts tend to cost less because they're more precise. A vague prompt leads to longer model responses (the model hedges, elaborates, qualifies) which drives token cost up. A Constraints block that says "respond in under 100 words" + an Output Format block that specifies exactly what to return saves tokens on every call.
I built flompt (flompt.dev) specifically for this — structured prompt compilation into Claude-optimized XML. For agent teams watching costs, tighter prompts = lower per-call cost = visible in your attribution dashboard. The MCP server integrates directly into agent pipelines: github.com/Nyrok/flompt. Great framework for thinking about this problem.