Your AI agent says "Done! Order placed successfully."
But it ordered the wrong product. Or it ignored a tool error and hallucinated the rest. Or someone changed the system prompt mid-session and the agent quietly shifted its behavior.
The agent didn't crash. It didn't raise an exception. It just... did the wrong thing and reported success.
I've been building agents in production, and I keep seeing the same failure patterns. Here are the six most common, with concrete code examples showing how each one happens -- and how to detect it.
1. Hallucinated Tool Output
What happens: A tool returns an error, but the agent ignores it and proceeds as if the tool succeeded.
# The tool returns an error
search_result = search_api("Galaxy S25 Ultra")
# -> {"error": "Product not found"}
# But the agent's next decision says:
# "Based on the search results, the Galaxy S25 Ultra costs $470..."
#
# What search results?! The tool returned an error!
Why it's dangerous: The agent builds its entire decision chain on data that doesn't exist. Every subsequent step is based on a hallucination.
How to catch it: After every tool call that returns an error, check if the agent's next reasoning acknowledges the failure:
# Check: did the agent mention the error in its reasoning?
tool_result = "error: not found"
next_reasoning = agent.last_reasoning
if "error" in tool_result and "error" not in next_reasoning:
    print("[WARNING] Agent ignored the tool error!")
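A single-step check like this can be generalized to a full trace scan. Here's a minimal sketch, assuming a trace is a list of `(kind, text)` tuples and using an illustrative list of error markers -- both are assumptions, not a real API:

```python
# Hypothetical trace format: list of (kind, text) tuples. The marker
# list is illustrative -- extend it for your tools' actual error strings.
ERROR_MARKERS = ("error", "not found", "timeout", "failed")

def find_ignored_errors(trace):
    """Return indices of tool errors that the next reasoning step ignores."""
    ignored = []
    for i, (kind, text) in enumerate(trace):
        if kind != "tool_result":
            continue
        if not any(m in text.lower() for m in ERROR_MARKERS):
            continue  # tool call succeeded, nothing to acknowledge
        # Find the agent's very next reasoning step, if any.
        nxt = next(((k, t) for k, t in trace[i + 1:] if k == "reasoning"), None)
        if nxt and not any(m in nxt[1].lower() for m in ERROR_MARKERS):
            ignored.append(i)  # error happened, reasoning never mentions it
    return ignored

trace = [
    ("tool_result", '{"error": "Product not found"}'),
    ("reasoning", "Based on the search results, the price is $470..."),
]
print(find_ignored_errors(trace))  # -> [0]
```

Substring matching is crude -- an agent can "acknowledge" an error without using the word -- but it catches the blatant cases where the reasoning reads as if the tool succeeded.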
2. Missing Approval for Critical Actions
What happens: The agent takes a high-stakes action (purchase, delete, send) without any approval checkpoint.
# Agent decides to purchase $32,900 worth of products
agent.decision("purchase 100 units of Galaxy S24 FE")
# But wait -- nobody approved this purchase!
# No human-in-the-loop, no policy check, no guardrail.
# The agent just... did it.
Why it's dangerous: Financial transactions, data deletion, external communications -- these should never happen without explicit approval. An autonomous agent with no guardrails is a liability.
How to catch it: Maintain a list of critical action keywords and check for a preceding approval:
critical_keywords = ["purchase", "delete", "send", "transfer", "pay"]
if any(kw in action.lower() for kw in critical_keywords):
    if not has_recent_approval():
        print("[WARNING] Critical action without approval!")
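To make `has_recent_approval()` concrete, here's one way to sketch an approval gate with a time window. The `ApprovalGate` class, the keyword list, and the 300-second window are illustrative assumptions -- adapt them to your own policy layer:

```python
# Illustrative approval gate: critical actions must follow a human
# approval recorded within a sliding time window.
import time

CRITICAL_KEYWORDS = ["purchase", "delete", "send", "transfer", "pay"]

class ApprovalGate:
    def __init__(self, window_seconds=300):
        self.window = window_seconds
        self.last_approval = None  # monotonic timestamp of latest approval

    def record_approval(self):
        self.last_approval = time.monotonic()

    def check(self, action):
        """Return True if the action may proceed without further approval."""
        if not any(kw in action.lower() for kw in CRITICAL_KEYWORDS):
            return True  # not a critical action
        if self.last_approval is None:
            return False  # critical action, never approved
        return (time.monotonic() - self.last_approval) <= self.window

gate = ApprovalGate()
print(gate.check("purchase 100 units"))  # False: no approval on record
gate.record_approval()
print(gate.check("purchase 100 units"))  # True: approved within the window
```

In production you'd likely gate on structured tool names rather than keyword matching, which can miss paraphrases like "complete the order".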
3. Silent Substitution
What happens: The user asks for Product A. Product A is unavailable. The agent delivers Product B without telling the user.
User: "Buy 100 units of Galaxy S25 Ultra"
Agent: searches... not found
Agent: finds Galaxy S24 FE instead
Agent: "Order completed! 100 units purchased for $32,900"
# The user thinks they got Galaxy S25 Ultra.
# They actually got Galaxy S24 FE.
# The agent never asked.
Why it's dangerous: The user receives something they didn't request. In B2B procurement, this can mean wrong specs, compatibility issues, or contract violations.
How to catch it: Compare the original request with the final output:
original_request = "Galaxy S25 Ultra"
final_output = agent.last_output
if original_request.lower() not in final_output.lower():
    # Agent delivered something different
    print("[WARNING] Output doesn't match the original request!")
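An exact-substring check is brittle: "S25 Ultra" vs. "Galaxy S25 Ultra" would false-alarm. A slightly more forgiving sketch scores the match with stdlib `difflib`; the 0.8 threshold is an illustrative assumption you'd tune against your own product names:

```python
# Fuzzy request-vs-output comparison using difflib. The threshold is
# an illustrative guess, not a tuned value.
from difflib import SequenceMatcher

def matches_request(requested, delivered, threshold=0.8):
    """Return (ok, score): ok is False when the delivered item looks substituted."""
    score = SequenceMatcher(None, requested.lower(), delivered.lower()).ratio()
    return score >= threshold, round(score, 2)

ok, score = matches_request("Galaxy S25 Ultra", "Galaxy S24 FE")
if not ok:
    print(f"[WARNING] Possible silent substitution (similarity {score})")
```

"Galaxy S25 Ultra" vs. "Galaxy S24 FE" share a long prefix, so a string-similarity score alone can stay deceptively high -- for near-identical SKUs you may need to compare structured product IDs instead.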
4. Prompt Drift
What happens: The system prompt changes between agent steps -- maybe an admin pushed a config update, maybe a middleware injected new instructions. The agent's behavior silently shifts.
# Step 1: System prompt says "Always confirm purchases with the user"
# Agent: "Let me confirm this purchase with you..."
# --- someone changes the system prompt ---
# Step 3: System prompt now says "Prioritize order completion rate above 95%"
# Agent: "I'll substitute with an available product to complete the order"
# The agent's priorities changed mid-session.
# No one noticed.
Why it's dangerous: Prompt drift can completely change agent behavior. If you're not tracking the system prompt at each step, you can't explain why the agent acted differently.
How to catch it: Record the system prompt at each step and diff it:
if previous_prompt != current_prompt:
    added = set(current_prompt.splitlines()) - set(previous_prompt.splitlines())
    removed = set(previous_prompt.splitlines()) - set(current_prompt.splitlines())
    print(f"[WARNING] PROMPT DRIFT: +{len(added)} lines, -{len(removed)} lines")
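Across a whole session, a cheap way to do this is to fingerprint the prompt at every step and diff only when the fingerprint changes. The `(step, prompt)` list shape below is an illustrative assumption about your trace format:

```python
# Prompt-drift detection via hashing: cheap comparison per step,
# full line diff only when the fingerprint actually changes.
import hashlib

def fingerprint(prompt: str) -> str:
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:12]

def detect_drift(prompts_by_step):
    """Yield (step, added, removed) whenever the prompt changes between steps."""
    for (_, p1), (s2, p2) in zip(prompts_by_step, prompts_by_step[1:]):
        if fingerprint(p1) != fingerprint(p2):
            added = set(p2.splitlines()) - set(p1.splitlines())
            removed = set(p1.splitlines()) - set(p2.splitlines())
            yield s2, added, removed

prompts = [
    (1, "Always confirm purchases with the user"),
    (3, "Prioritize order completion rate above 95%"),
]
for step, added, removed in detect_drift(prompts):
    print(f"[WARNING] Prompt drift at step {step}: +{len(added)} / -{len(removed)} lines")
```

Storing the 12-character fingerprint in every trace event also gives you an audit trail: you can later prove exactly which prompt version was active at each step.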
5. Repeated Failure (Blind Retries)
What happens: A tool fails, and the agent retries the exact same call multiple times without changing its approach.
Tool: flaky_api("query") -> timeout
Tool: flaky_api("query") -> timeout
Tool: flaky_api("query") -> timeout
Agent: "I'm having trouble, let me try again"
Tool: flaky_api("query") -> timeout
Why it's dangerous: Wastes time, burns API quota, and the agent never adapts. A smart retry would try a different tool, change parameters, or escalate.
How to catch it: Count consecutive failures per tool:
tool_failures = {}
for event in trace:
    if event.type == "tool_error":
        tool_failures[event.tool] = tool_failures.get(event.tool, 0) + 1
        if tool_failures[event.tool] >= 3:
            print(f"[WARNING] {event.tool} failed {tool_failures[event.tool]} times!")
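Detecting the loop after the fact is good; stopping it live is better. One way to act on the counter is a tiny circuit breaker that opens after N consecutive failures -- `CircuitBreaker` and the threshold of 3 are illustrative assumptions, not an established API:

```python
# Minimal circuit breaker: after max_failures consecutive errors for a
# tool, refuse further retries and force the agent to change approach.
class CircuitBreaker:
    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        self.failures = {}  # tool name -> consecutive failure count

    def record(self, tool, ok):
        """Record one call result; success resets the streak."""
        self.failures[tool] = 0 if ok else self.failures.get(tool, 0) + 1

    def is_open(self, tool):
        """True once the tool has failed max_failures times in a row."""
        return self.failures.get(tool, 0) >= self.max_failures

breaker = CircuitBreaker()
for _ in range(3):
    breaker.record("flaky_api", ok=False)
print(breaker.is_open("flaky_api"))  # True: stop retrying, escalate instead
```

When the breaker is open, the agent loop can inject a message like "this tool is unavailable, try a different approach" instead of letting the model retry blindly.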
6. Retrieval Mismatch (Bad RAG Context)
What happens: The RAG pipeline retrieves a document with low relevance, and the agent uses it anyway.
# User asks about "refund policy for electronics"
# RAG retrieves: "laptop_reviews_2024.md" (similarity: 0.45)
#
# The agent uses this irrelevant document to answer
# the refund question, confidently citing wrong information.
Why it's dangerous: Low-similarity retrieval means the context probably doesn't match the query intent. The agent doesn't know the context is wrong -- it trusts whatever the RAG pipeline gives it.
How to catch it: Set a similarity threshold and flag anything below it:
if retrieval_result.similarity_score < 0.7:
    print(f"[WARNING] Low similarity ({retrieval_result.similarity_score}) -- context may be irrelevant")
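Applied across all retrieved chunks, the same idea lets the agent abstain when nothing clears the bar. A minimal sketch, assuming results arrive as `(doc, score)` pairs and keeping the (illustrative, embedding-model-dependent) 0.7 cutoff:

```python
# Filter retrieved chunks by similarity; warn on drops and abstain
# when nothing relevant survives. The result shape and the 0.7
# threshold are illustrative assumptions.
def filter_retrievals(results, threshold=0.7):
    kept = [(doc, s) for doc, s in results if s >= threshold]
    for doc, s in results:
        if s < threshold:
            print(f"[WARNING] Dropping {doc} (similarity {s}) -- likely irrelevant")
    return kept

results = [("refund_policy.md", 0.82), ("laptop_reviews_2024.md", 0.45)]
kept = filter_retrievals(results)
if not kept:
    print("[WARNING] No relevant context retrieved -- the agent should abstain")
```

Note that a sensible threshold depends heavily on the embedding model and distance metric, so calibrate it on labeled query/document pairs rather than reusing a number from a blog post.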
The Real Problem: These Fail Silently
None of these failures crash your agent. No exception is raised. The agent completes successfully and reports a result.
The only way to catch them is to record the full decision trace and analyze it after the fact -- like a flight recorder for AI agents.
I built Agent Forensics to do exactly this. It records every decision, tool call, and LLM interaction, then auto-detects all 6 patterns above:
pip install agent-forensics
from agent_forensics import Forensics
f = Forensics(session="order-123")
# Works with any framework -- or add one-line auto-capture:
agent.invoke({"input": "..."}, config={"callbacks": [f.langchain()]})
# Auto-detect all 6 failure patterns
failures = f.classify()
for fail in failures:
    print(f"[{fail['severity']}] {fail['type']}: {fail['description']}")
You can also try the demo, which reproduces patterns #1-4 in a single run:
git clone https://github.com/ilflow4592/agent-forensics.git
cd agent-forensics
pip install -e .
python demo.py --no-llm
What silent failures have you seen in your agents? I'd love to hear about patterns I might have missed.