Your AI agent returned 200. The job finished in 3 seconds. Everything looks fine.
Except output_tokens was zero. It spent $0.80. It produced nothing. And no one noticed for 6 hours.
This is the defining failure mode of AI agents in production: they don't throw errors. They quietly fail in ways that look exactly like success.
Here's what we track in AI Agents Control Tower — per execution, automatically — and the 7 specific failure types we detect.
What gets tracked on every run
Every time your wrapped agent executes, we record:
- Tokens in / tokens out — prompt tokens consumed, completion tokens produced
- Cost in USD — real dollars, not just tokens, calculated per model's pricing
- Latency — wall-clock execution time in milliseconds
- output_summary — what the agent actually produced (the real response text, not just a status code)
- Status — Healthy, Failed, Stale, or Empty Run
The distinction between "ran" and "did the right thing" lives in these fields. HTTP 200 only tells you the API responded. Tokens out and output_summary tell you whether it actually worked.
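To make the fields above concrete, here is a minimal sketch of what one per-run record might look like. The `RunRecord` class, its field names, the `did_work` check, and the `PRICE_PER_1K` rates are illustrative assumptions, not the product's actual schema or any provider's real pricing.

```python
from dataclasses import dataclass

# Hypothetical per-1K-token prices in USD; real rates depend on provider and model.
PRICE_PER_1K = {"example-model": {"input": 0.003, "output": 0.015}}


@dataclass
class RunRecord:
    """One agent execution, mirroring the tracked fields (names are assumed)."""
    model: str
    http_status: int
    tokens_in: int
    tokens_out: int
    latency_ms: float
    output_summary: str

    @property
    def cost_usd(self) -> float:
        # Cost is derived from token counts and the model's price table.
        rates = PRICE_PER_1K[self.model]
        return (self.tokens_in / 1000) * rates["input"] + (self.tokens_out / 1000) * rates["output"]

    @property
    def did_work(self) -> bool:
        # HTTP 200 alone is not enough: the run must also have produced output.
        return (
            self.http_status == 200
            and self.tokens_out > 0
            and bool(self.output_summary.strip())
        )


# A run that "succeeded" at the HTTP level but produced nothing.
run = RunRecord("example-model", 200, tokens_in=266_000, tokens_out=0,
                latency_ms=3000.0, output_summary="")
print(f"{run.cost_usd:.2f}", run.did_work)  # 0.80 False
```

With these assumed rates, the record reproduces the opening scenario: a 200 response, about $0.80 spent, and nothing produced.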
The three critical states
Failed — the agent received a non-200 response. Explicit, visible, but still worth dedicated detection.
Stale — the agent hasn't run within its expected cadence. It ran reliably for two weeks, then quietly stopped. No error, no notification. Stale fires when the silence exceeds the expected window.
Empty Run — the agent ran, returned 200, but produced zero output tokens. Ran successfully. Cost money. Did nothing. This is the one that hides in plain sight.
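The three states, plus Healthy, can be expressed as a small decision function. This is a sketch under assumptions: the function name `classify_status`, its parameters, and the choice to check staleness before anything else are illustrative, not the product's actual logic.

```python
from datetime import datetime, timedelta, timezone


def classify_status(
    http_status: int,
    tokens_out: int,
    last_run_at: datetime,
    expected_every: timedelta,
    now: datetime,
) -> str:
    """Return "Stale", "Failed", "Empty Run", or "Healthy" for an agent's latest run."""
    # Stale: the silence since the last run exceeds the expected window.
    if now - last_run_at > expected_every:
        return "Stale"
    # Failed: the call itself reported a non-200 response.
    if http_status != 200:
        return "Failed"
    # Empty Run: the call succeeded but produced zero output tokens.
    if tokens_out == 0:
        return "Empty Run"
    return "Healthy"


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
hourly = timedelta(hours=1)
recent = now - timedelta(minutes=5)

print(classify_status(200, 0, recent, hourly, now))    # Empty Run
print(classify_status(500, 0, recent, hourly, now))    # Failed
print(classify_status(200, 42, now - timedelta(hours=6), hourly, now))  # Stale
print(classify_status(200, 42, recent, hourly, now))   # Healthy
```

Checking staleness first is a design choice made for this sketch: an agent that has gone silent is reported as Stale even if its last recorded run looked fine.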
The 7 alert types — with detection logic
1. silent_failure — output_tokens = 0 on HTTP 200. The most common, most dangerous. HTTP 200 is not a product guarantee.
2. execution_failed — non-200 response. The only one that looks like a failure from the outside too.
3. token_anomaly — usage 3× above this agent's historical baseline. Usually context bloat, unexpected retries, or a prompt change that became accidentally verbose. 3× now means 10× next month.
4. agent_loop — the same tool or endpoint called repeatedly with the same input. Stuck. Every iteration burns tokens and produces zero incremental value.
5. budget_exceeded — execution cost crossed a per-agent threshold you configured. Fires immediately — not at end-of-month when the invoice arrives.
6. high_cost_spike — sudden per-execution cost anomaly relative to historical baseline. Catches unexpected behavior that doesn't fit a fixed budget ceiling.
7. no_activity — agent hasn't run in the expected window. The stale state at the alert level.
Notice: 6 of 7 produce no error; only execution_failed looks like a failure from the outside. The rest pass every "did it complete" check. The only way to catch them is an external layer watching from outside the agent's own perspective.
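The per-run checks in the list can be sketched as one function. Everything here is an assumption made for illustration: the name `detect_alerts`, its parameters, using the mean of past runs as the baseline, reusing the 3× factor for high_cost_spike, and treating three identical tool calls as a loop. no_activity is left out because it depends on a schedule rather than on a single run.

```python
from collections import Counter
from statistics import mean
from typing import Optional


def detect_alerts(
    http_status: int,
    tokens_out: int,
    total_tokens: int,
    cost_usd: float,
    tool_calls: list[tuple[str, str]],   # (tool name, serialized input) per call
    baseline_tokens: list[int],          # total tokens of previous runs
    baseline_costs: list[float],         # cost in USD of previous runs
    budget_usd: Optional[float] = None,  # per-agent threshold, if configured
    anomaly_factor: float = 3.0,         # assumed multiplier over baseline
    loop_threshold: int = 3,             # assumed repeat count that signals a loop
) -> list[str]:
    """Return the names of the alerts a single run would trigger."""
    alerts: list[str] = []

    if http_status != 200:
        alerts.append("execution_failed")
    elif tokens_out == 0:
        alerts.append("silent_failure")

    # token_anomaly: usage well above this agent's historical average.
    if baseline_tokens and total_tokens > anomaly_factor * mean(baseline_tokens):
        alerts.append("token_anomaly")

    # agent_loop: the same tool called repeatedly with the same input.
    if tool_calls and max(Counter(tool_calls).values()) >= loop_threshold:
        alerts.append("agent_loop")

    # budget_exceeded: a fixed ceiling, checked on every run.
    if budget_usd is not None and cost_usd > budget_usd:
        alerts.append("budget_exceeded")

    # high_cost_spike: cost relative to history, independent of any fixed ceiling.
    if baseline_costs and cost_usd > anomaly_factor * mean(baseline_costs):
        alerts.append("high_cost_spike")

    return alerts


alerts = detect_alerts(
    http_status=200, tokens_out=0, total_tokens=12_000, cost_usd=0.80,
    tool_calls=[("search", "q")] * 4,
    baseline_tokens=[3000, 3500, 2500], baseline_costs=[0.05, 0.06, 0.04],
    budget_usd=0.50,
)
print(alerts)
```

In this example a single run with HTTP 200 triggers five alerts at once, and none of them would surface as an exception in the agent's own code.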
Setup — 3 lines
```javascript
// JS: npm install opsveritas-sdk
import { OpsVeritas } from 'opsveritas-sdk';
OpsVeritas.init('[your-webhook-secret]');
const wrapped = OpsVeritas.wrap(client, { agentName: 'My Agent' });
// use `wrapped` exactly as you'd use `client`
```

```python
# Python: pip install opsveritas
import opsveritas
opsveritas.init('[your-webhook-secret]')
client = opsveritas.wrap(client, agent_name='My Agent')
```
You keep using your existing LLM client exactly as before. The wrapper intercepts each call, records tokens in/out, cost, latency, and output_summary, then sends it to your dashboard. No new infrastructure. No code changes to your agent logic.
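For intuition about how a wrapper of this kind can intercept calls without changing agent logic, here is a generic proxy sketch. It is not the SDK's implementation: `RecordingProxy`, `FakeClient`, the `complete` method, the response shape, and the `sink` callback are all hypothetical names invented for this example.

```python
import time
from typing import Any, Callable


class RecordingProxy:
    """Forwards attribute access to the wrapped client and records each method call."""

    def __init__(self, client: Any, agent_name: str, sink: Callable[[dict], None]):
        self._client = client
        self._agent_name = agent_name
        self._sink = sink  # where records go; a real system would send them to a backend

    def __getattr__(self, name: str) -> Any:
        # Only invoked for attributes not defined on the proxy itself.
        target = getattr(self._client, name)
        if not callable(target):
            return target

        def recorded(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            response = target(*args, **kwargs)
            latency_ms = (time.perf_counter() - start) * 1000
            # Assumed response shape: a dict with "text" and a "usage" sub-dict.
            usage = response.get("usage", {}) if isinstance(response, dict) else {}
            text = response.get("text", "") if isinstance(response, dict) else ""
            self._sink({
                "agent_name": self._agent_name,
                "tokens_in": usage.get("input_tokens", 0),
                "tokens_out": usage.get("output_tokens", 0),
                "latency_ms": latency_ms,
                "output_summary": text[:200],
            })
            return response  # the caller sees exactly what the client returned

        return recorded


class FakeClient:
    """Stand-in for a real LLM client, used only to exercise the proxy."""

    def complete(self, prompt: str) -> dict:
        return {"text": "", "usage": {"input_tokens": len(prompt.split()), "output_tokens": 0}}


records: list[dict] = []
wrapped = RecordingProxy(FakeClient(), "My Agent", records.append)
wrapped.complete("summarize the quarterly report")
print(records[0]["agent_name"], records[0]["tokens_out"])  # My Agent 0
```

The sketch ignores exceptions, streaming responses, and async clients, all of which a production wrapper would need to handle.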
Every run appears in your dashboard within seconds — cost in USD, token breakdown, output summary, health status, and automatic detection across all 7 failure types.
Free to start → agents.opsveritas.com
DM me for a 15-min walkthrough.