HTTP 200 Is Not a Product Guarantee

#llm #ai #devops #monitoring

HTTP 200 Is Not a Product Guarantee

AI Agents in Production - Series 2, Article 5 of 6

An AI agent ran 47 times last week.

Every run returned HTTP 200. Every run had latency under 2 seconds. No exceptions. No errors in the logs.

And every run produced absolutely nothing.

output_tokens: 0. Forty-seven times in a row.

The infrastructure saw success. The product saw nothing. And no alert fired.

The Failure Class Nobody Monitors

Most teams monitor what the API says. HTTP 200 means the request was accepted, processed, and returned cleanly. That is true. The infrastructure worked perfectly.

But HTTP 200 only tells you about the transport layer. It says nothing about what your agent was supposed to produce.

This is the failure class called silent failure: the system reports success while the business value goes to zero.

It is the most dangerous failure type because monitoring dashboards show green, on-call does not get paged, and the client has no idea for days or weeks.

Three Failure Shapes That Look Like Success

Shape 1 - The Empty Responder

The agent calls the LLM, gets back an empty completion. choices[0].message.content is empty. HTTP 200. output_tokens: 0. Your pipeline continues as if everything worked.

Shape 2 - The Stuck Safety Filter

The model triggers an internal safety classification and returns an empty response rather than a refusal. No error. Just silence.

Shape 3 - The Token Drain

Prompt tokens go up, output tokens stay at zero. You are paying for every prompt, generating nothing. All three return HTTP 200.

What You Actually Need to Monitor

`python
from opsveritas import AgentTracer

tracer = AgentTracer(api_key="your-key")

with tracer.trace("content-generator") as span:
response = client.chat.completions.create(
model="gpt-4o",
messages=messages
)

content = response.choices[0].message.content
output_tokens = response.usage.completion_tokens

span.set_output(
    summary=content[:500] if content else "[EMPTY - silent failure]",
    output_tokens=output_tokens,
    success=bool(content and output_tokens > 0)
)

Success is not http_status == 200. Success is output_tokens > 0 AND the output contains meaningful content.

`javascript
import { AgentTracer } from "@opsveritas/sdk";

const tracer = new AgentTracer({ apiKey: "your-key" });

const span = tracer.start("content-generator");
try {
const response = await openai.chat.completions.create({ ... });
const outputTokens = response.usage.completion_tokens;

span.finish({
outputTokens,
outputSummary: response.choices[0].message.content || "[EMPTY]",
success: outputTokens > 0
});
} catch (err) {
span.error(err);
}
`

The Alert That Should Have Fired

json { "alert_type": "silent_failure", "workflow_name": "content-generator", "message": "output_tokens = 0 on 47 consecutive runs. HTTP 200 returned each time.", "diagnosis": "Likely: safety filter on prompt, empty completion, or context window exhaustion.", "consecutive_empty_runs": 47, "cost_usd_burned": 0.22 }

Note the diagnosis field: AI-generated root cause analysis, not just a raw metric.

What This Changes

When you monitor output_tokens alongside HTTP status, silent failures get caught in the first run, not the 47th. You stop paying for token consumption that produces nothing. Client-facing failures get resolved before the client notices.

HTTP 200 is a transport guarantee. Build your monitoring at the product layer.

Free to try: https://agents.opsveritas.com

We also build these end-to-end: https://opsveritas.com

DM for a 15-min demo if you are running agents in production.