Friday 5 PM. You deploy a research agent that processes customer tickets. It calls GPT-4 about four times per ticket. Expected load: 200 tickets a day, roughly $24 in API costs.
Friday 11 PM. A bug in ticket deduplication. The agent reprocesses the same tickets in a loop. Each iteration makes 4 LLM calls at about $0.03 each, and the loop runs roughly 70 times per hour.
Saturday 3 AM. The agent has made over 1,100 unnecessary LLM calls. Cost so far: about $34. Nobody is watching.
Monday 9 AM. OpenAI's billing alert fires at the $500 threshold you set months ago. Total damage: $487. No logs showing which agent caused it, which task triggered the loop, or when it started.
This is not hypothetical. Every team running AI agents in production has a version of this story.
Why Standard Monitoring Doesn't Help
OpenAI gives you total organization spend. Not per-agent. Not per-task. Not in real time.
If you have 5 agents calling GPT-4, and one goes haywire, your OpenAI dashboard shows a line going up. Which agent? You don't know. Which task caused the spike? You don't know. When did it start? You can guess from the slope of the graph.
Cloud monitoring (Datadog, Grafana) tracks CPU and memory. It doesn't know about LLM tokens. You could instrument it yourself - custom metrics, Prometheus counters, StatsD gauges - but now you're building a cost monitoring system instead of building your product.
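To make the DIY burden concrete, here is a rough sketch of the kind of per-agent cost accumulator you would end up hand-rolling. This is a plain in-process version; in a real DIY setup you would still need to wire it to Prometheus or StatsD, persist it, and aggregate across processes. All names here are hypothetical.

```python
from collections import defaultdict
from threading import Lock

class CostTracker:
    """Minimal per-agent, per-task cost accumulator -- the kind of
    plumbing you'd otherwise have to build and maintain yourself."""

    def __init__(self):
        self._totals = defaultdict(float)  # (agent, task) -> cumulative USD
        self._lock = Lock()

    def record(self, agent: str, task: str, cost_usd: float) -> None:
        with self._lock:
            self._totals[(agent, task)] += cost_usd

    def agent_total(self, agent: str) -> float:
        with self._lock:
            return sum(c for (a, _), c in self._totals.items() if a == agent)

tracker = CostTracker()
tracker.record("research-agent", "ticket-123", 0.09)
tracker.record("research-agent", "ticket-456", 0.12)
tracker.record("support-agent", "ticket-789", 0.03)
```

And this sketch only covers counting. Alerting, dashboards, and enforcement are all separate projects on top.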
Billing alerts are too late and too coarse. A $500 alert tells you the money is already gone. A per-API-key alert doesn't map to individual agents.
What you actually need:
- Cost per agent, in real time
- Cost per task (not just per agent)
- Budget limits that actually stop the agent
- Alerts before the damage is done
Tracking Cost Through the Agent Heartbeat
AXME agents send heartbeats every 30 seconds - standard health reporting. The insight is that cost is just another metric in that heartbeat.
```python
import os

import openai
from axme import AxmeClient, AxmeClientConfig

client = AxmeClient(AxmeClientConfig(api_key=os.environ["AXME_API_KEY"]))

def call_llm(prompt: str) -> str:
    response = openai.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )

    # Calculate cost from token usage (GPT-4: $0.03/1K input, $0.06/1K output)
    tokens_in = response.usage.prompt_tokens
    tokens_out = response.usage.completion_tokens
    cost_usd = (tokens_in * 0.03 + tokens_out * 0.06) / 1000

    # Report cost alongside the regular heartbeat
    client.mesh.report_metric(
        agent="agent://myorg/production/research-agent",
        intent_id=current_intent_id,  # ID of the task currently being processed
        cost_usd=cost_usd,
        metadata={"model": "gpt-4", "tokens_in": tokens_in, "tokens_out": tokens_out},
    )
    return response.choices[0].message.content
```
Two lines of actual logic: calculate the cost, report it. The gateway accumulates it per agent, per intent, per time window. No Prometheus setup. No custom Datadog metrics. No StatsD.
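Conceptually, the accumulation the gateway does is simple: bucket each reported cost by agent, intent, and time window. A sketch of that rollup (my own illustration of the idea, not AXME's actual implementation):

```python
from collections import defaultdict
from datetime import datetime, timezone

# (agent, intent_id, day) -> cumulative USD
rollup = defaultdict(float)

def accumulate(agent: str, intent_id: str, cost_usd: float,
               ts: datetime) -> None:
    # Bucket by calendar day; per-week and per-month windows work the same way
    day = ts.date().isoformat()
    rollup[(agent, intent_id, day)] += cost_usd

def daily_total(agent: str, day: str) -> float:
    return sum(c for (a, _, d), c in rollup.items()
               if a == agent and d == day)

now = datetime(2025, 3, 7, 12, 0, tzinfo=timezone.utc)
accumulate("research-agent", "intent-1", 0.09, now)
accumulate("research-agent", "intent-2", 0.12, now)
```

Keeping the intent ID in the key is what makes per-task cost queries possible later.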
Budget Limits That Actually Stop Agents
Reporting cost is useful. Enforcing limits is essential.
```python
client.mesh.set_cost_policy(
    agent="agent://myorg/production/research-agent",
    rules=[
        {
            "period": "day",
            "limit_usd": 50.00,
            "action": "pause",
            "notify": ["ops@company.com"],
        },
        {
            "period": "intent",
            "limit_usd": 5.00,
            "action": "alert",
            "notify": ["ops@company.com"],
        },
    ],
)
```
Three actions: `alert` sends a notification and lets the agent continue. `pause` stops delivering new intents to the agent. `kill` terminates active intents immediately.
The gateway checks policies on every heartbeat. When the research agent hits $50 for the day, it pauses. Not after $500. Not after the invoice. At $50, in real time.
The per-intent limit catches the more subtle problem. If a single customer ticket costs $5 in LLM calls, something is wrong with that ticket or with how the agent is processing it. Alert before it becomes a pattern.
What the Dashboard Shows
The AXME mesh dashboard shows cost alongside agent status:
```
+---------------------+----------+----------+-----------+--------+
| Agent               | Today    | This Wk  | This Mo   | Status |
+---------------------+----------+----------+-----------+--------+
| research-agent      | $12.47   | $47.82   | $189.30   | OK     |
| support-agent       | $8.23    | $31.05   | $142.18   | OK     |
| onboarding-agent    | $3.10    | $14.22   | $58.91    | OK     |
| data-pipeline       | $0.00    | $22.40   | $89.60    | PAUSED |
+---------------------+----------+----------+-----------+--------+
```
Drill into any agent and you see cost per intent, per model, per hour. The data-pipeline agent is paused because it hit its daily limit yesterday. No surprises.
Multi-Model Visibility
Most agents don't use just one model. They use GPT-4 for complex reasoning, GPT-4o for routine tasks, GPT-4o-mini for classification. Each has different pricing. Each needs separate tracking.
```python
# Per-1K-token rates in USD
MODEL_COSTS = {
    "gpt-4": {"input": 0.03, "output": 0.06},
    "gpt-4o": {"input": 0.005, "output": 0.015},
    "gpt-4o-mini": {"input": 0.00015, "output": 0.0006},
}

def call_llm(model: str, prompt: str) -> str:
    response = openai.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )

    rates = MODEL_COSTS[model]
    cost = (
        response.usage.prompt_tokens * rates["input"]
        + response.usage.completion_tokens * rates["output"]
    ) / 1000

    client.mesh.report_metric(
        cost_usd=cost,
        intent_id=current_intent_id,
        metadata={"model": model},
    )
    return response.choices[0].message.content
```
The dashboard breaks it down:
```
research-agent cost breakdown (today):
  gpt-4:        $9.20   (73.8%)
  gpt-4o:       $2.85   (22.8%)
  gpt-4o-mini:  $0.42   (3.4%)
```
Now you can ask the right question: can we move some of those GPT-4 calls to GPT-4o-mini and cut costs by 60%?
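You can sanity-check that estimate against the per-1K-token rates. Assuming token counts stay the same and a roughly even input/output token split (both assumptions; adjust for your workload), moving about 80% of the GPT-4 traffic to GPT-4o-mini cuts the daily bill by roughly 60%:

```python
# Per-1K-token rates in USD; in_share is the assumed fraction of tokens
# that are input tokens -- tune it to your real traffic.
RATES = {
    "gpt-4": {"input": 0.03, "output": 0.06},
    "gpt-4o-mini": {"input": 0.00015, "output": 0.0006},
}

def projected_savings(gpt4_cost: float, other_cost: float,
                      fraction_moved: float, in_share: float = 0.5) -> float:
    """Fraction of total spend saved by moving `fraction_moved` of the
    GPT-4 traffic to GPT-4o-mini, token counts held constant."""
    ratio = (in_share * RATES["gpt-4o-mini"]["input"] / RATES["gpt-4"]["input"]
             + (1 - in_share) * RATES["gpt-4o-mini"]["output"] / RATES["gpt-4"]["output"])
    moved = gpt4_cost * fraction_moved
    new_total = (gpt4_cost - moved) + moved * ratio + other_cost
    return 1 - new_total / (gpt4_cost + other_cost)

# Today's numbers from the breakdown above: $9.20 on GPT-4, $3.27 on the rest
savings = projected_savings(9.20, 2.85 + 0.42, fraction_moved=0.8)
```

That is the kind of question you can only answer once cost is attributed per model in the first place.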
The Alternative
Without this, you build it yourself:
- Instrument every LLM call with token counting
- Send custom metrics to Prometheus/Datadog/CloudWatch
- Build dashboards per agent (Grafana? Retool? Custom?)
- Write alerting rules with the right thresholds
- Build the "pause agent" mechanism yourself
- Map OpenAI costs to individual agents in your billing system
- Maintain all of this as models and pricing change
That is a real project. Weeks of work. And it is not your product.
Or: report cost_usd in the heartbeat your agent already sends. Set a policy. Done.
Try It
Working example with cost reporting, budget limits, and multi-model tracking:
github.com/AxmeAI/ai-agent-cost-monitoring
Built with AXME - agent coordination with durable lifecycle and cost controls. Alpha - feedback welcome.