Friday 5 PM. You deploy a research agent that processes customer tickets. It calls GPT-4 for each one. Expected load: 200 tickets a day, about $8 in API costs.
Friday 11 PM. A bug in ticket deduplication. The agent reprocesses the same tickets in a loop. Each iteration makes 4 LLM calls at roughly $0.04 each. The loop runs 50 times per hour: about $8 burned every hour, around the clock.
Saturday 3 AM. The agent has made 800 LLM calls. Cost so far: about $32. Nobody is watching.
Monday 9 AM. OpenAI billing alert fires at the $500 threshold you set months ago. Total damage: $487. No logs showing which agent caused it, which task triggered the loop, or when it started.
This is not hypothetical. Every team running AI agents in production has a version of this story.
Why Standard Monitoring Doesn't Help
OpenAI gives you total organization spend. Not per-agent. Not per-task. Not in real time.
If you have 5 agents calling GPT-4, and one goes haywire, your OpenAI dashboard shows a line going up. Which agent? You don't know. Which task caused the spike? You don't know. When did it start? You can guess from the slope of the graph.
Cloud monitoring (Datadog, Grafana) tracks CPU and memory. It doesn't know about LLM tokens. You could instrument it yourself - custom metrics, Prometheus counters, StatsD gauges - but now you're building a cost monitoring system instead of building your product.
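To see the DIY burden concretely, even the simplest home-grown path looks something like this: a stdlib-only StatsD-style counter emitter (the metric name and counter format here are illustrative).

```python
import socket

# Every call site must tag agent/model/task consistently in the metric
# name, and you still need a StatsD server, dashboards, and alert rules
# on the other end.
def emit_cost(metric: str, cost_usd: float,
              host: str = "127.0.0.1", port: int = 8125) -> bytes:
    payload = f"{metric}:{cost_usd}|c".encode()  # StatsD counter line format
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))  # fire-and-forget UDP
    return payload

emit_cost("llm.cost_usd.ticket_agent.gpt4", 0.12)
```

And this is only emission: aggregation, per-task attribution, and enforcement are all still missing.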
Billing alerts are too late and too coarse. A $500 alert tells you the money is already gone. A per-API-key alert doesn't map to individual agents.
What you actually need:
- Cost per agent, in real time
- Cost per task (not just per agent)
- Budget limits that actually stop the agent
- Alerts before the damage is done
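Building those four capabilities yourself, even minimally, starts with per-agent and per-task accounting plus a hard stop. A rough stdlib sketch (the class and names are illustrative, not AXME code):

```python
from collections import defaultdict

class BudgetExceeded(RuntimeError):
    pass

class CostTracker:
    """Accumulates spend per agent and per (agent, task), with a hard daily cap."""

    def __init__(self, max_cost_per_day_usd: float):
        self.max_cost_per_day_usd = max_cost_per_day_usd
        self.per_agent: dict[str, float] = defaultdict(float)
        self.per_task: dict[tuple[str, str], float] = defaultdict(float)

    def record(self, agent: str, task: str, cost_usd: float) -> None:
        self.per_agent[agent] += cost_usd
        self.per_task[(agent, task)] += cost_usd
        if self.per_agent[agent] > self.max_cost_per_day_usd:
            # Stop the agent, don't just log
            raise BudgetExceeded(f"{agent} exceeded ${self.max_cost_per_day_usd}/day")

tracker = CostTracker(max_cost_per_day_usd=50.0)
tracker.record("research-agent", "ticket-123", 0.12)
```

Even this toy version omits the daily reset, persistence across restarts, and alerting, which is the point.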
Tracking Cost Through the Agent Heartbeat
AXME agents send heartbeats every 30 seconds - standard health reporting. The insight is that cost is just another metric in that heartbeat.
```python
import os

import openai
from axme import AxmeClient, AxmeClientConfig

client = AxmeClient(AxmeClientConfig(api_key=os.environ["AXME_API_KEY"]))

def call_llm(prompt: str) -> str:
    response = openai.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )

    # Calculate cost from token usage (GPT-4: $0.03/1K input, $0.06/1K output)
    tokens_in = response.usage.prompt_tokens
    tokens_out = response.usage.completion_tokens
    cost_usd = (tokens_in * 0.03 + tokens_out * 0.06) / 1000

    # Report cost alongside the regular heartbeat
    client.mesh.report_metric(cost_usd=cost_usd)

    return response.choices[0].message.content
```
Two lines of actual logic: calculate the cost, report it. The gateway accumulates it per agent, per intent, per time window. No Prometheus setup. No custom Datadog metrics. No StatsD.
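One caveat the snippet above glosses over: the $0.03/$0.06 rates are GPT-4-specific. Tracking several models means a pricing table. A sketch (the figures below are examples and go stale as pricing changes; keep them in config, not code):

```python
# Per-1K-token rates in USD. Example figures only -- verify against
# current provider pricing before relying on them.
PRICING = {
    "gpt-4":         {"input": 0.03,   "output": 0.06},
    "gpt-4o":        {"input": 0.0025, "output": 0.01},
    "gpt-3.5-turbo": {"input": 0.0005, "output": 0.0015},
}

def llm_cost_usd(model: str, tokens_in: int, tokens_out: int) -> float:
    rates = PRICING[model]
    return (tokens_in * rates["input"] + tokens_out * rates["output"]) / 1000
```

With this in place, the heartbeat call becomes `client.mesh.report_metric(cost_usd=llm_cost_usd(model, tokens_in, tokens_out))` regardless of which model the agent picked.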
Budget Limits That Actually Stop Agents
Reporting cost is useful. Enforcing limits is essential.
```http
# Set cost policy via API
PUT /v1/mesh/agents/{address_id}/policies/cost

{
  "max_intents_per_day": 500,
  "max_cost_per_day_usd": 50.00,
  "max_intents_per_hour": 100,
  "action_on_breach": "block"
}
```
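The same call can be scripted. A stdlib urllib sketch; note that the host (`mesh.axme.ai`) and bearer-token auth scheme are assumptions here, so check the API docs for the real base URL and auth:

```python
import json
import urllib.request

def build_cost_policy_request(address_id: str, api_key: str,
                              policy: dict) -> urllib.request.Request:
    # Builds the PUT request; pass it to urllib.request.urlopen() to send.
    url = f"https://mesh.axme.ai/v1/mesh/agents/{address_id}/policies/cost"
    return urllib.request.Request(
        url,
        data=json.dumps(policy).encode(),
        method="PUT",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_cost_policy_request("agent-123", "YOUR_API_KEY", {
    "max_intents_per_day": 500,
    "max_cost_per_day_usd": 50.00,
    "max_intents_per_hour": 100,
    "action_on_breach": "block",
})
```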
When the research agent hits $50 for the day, the gateway blocks new intents with HTTP 429. Not after $500. Not after the invoice. At $50, in real time.
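On the agent side, a 429 from the gateway means the budget did its job; the sane reaction is to park, not retry. A minimal decision sketch (illustrative, not part of the AXME client):

```python
def next_action(status_code: int) -> str:
    """Map the gateway's response status to what the agent should do next."""
    if status_code == 429:
        # Cost policy breached: retrying in a loop is exactly the
        # runaway behavior the budget exists to stop.
        return "park"
    if 200 <= status_code < 300:
        return "proceed"
    return "error"  # real failure, surface it to an operator
```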
You can also set this from the dashboard at mesh.axme.ai - select the agent, set cost limits, save.
What the Dashboard Shows
The AXME mesh dashboard shows cost alongside agent status, with day, week, and month views. Agents that hit their daily limit are blocked automatically. No surprises. Cost policies are managed visually, right next to agent health.
The Alternative
Without this, you build it yourself:
- Instrument every LLM call with token counting
- Send custom metrics to Prometheus/Datadog/CloudWatch
- Build dashboards per agent (Grafana? Retool? Custom?)
- Write alerting rules with the right thresholds
- Build the "pause agent" mechanism yourself
- Map OpenAI costs to individual agents in your billing system
- Maintain all of this as models and pricing change
That is a real project. Weeks of work. And it is not your product.
Or: report cost_usd in the heartbeat your agent already sends. Set a policy. Done.
Try It
Working example with cost reporting, budget limits, and multi-model tracking:
github.com/AxmeAI/ai-agent-cost-monitoring
Built with AXME - agent coordination with durable lifecycle and cost controls. Alpha - feedback welcome.