Agent Forensics started as a simple decision logger. Record what the agent did, generate a report, figure out what went wrong.
That's no longer enough.
After publishing the v0.2 update, a Reddit commenter dropped this:
"A decision log helps you find these after the fact, but the harder problem is preventing them. The most useful insight isn't 'what went wrong' — it's 'where did the model encounter ambiguity and pick one interpretation without flagging it.'"
Another commenter laid out three concrete features they wanted:
"Deterministic replay: store model name, temperature, seed so you can rerun the exact trace. Guardrail checkpoints: log pre and post tool-call intent plus an allow/deny reason. Eval hooks: auto-label common failure modes so you can aggregate across sessions."
So I built all three.
## What's New in v0.3
### 1. Guardrail Checkpoints — "Was This Action Approved?"
The single most common agent failure mode: the agent does something the user didn't approve.
A shopping agent buys a substitute product. A support agent posts to a public channel. A data agent deletes records without confirmation.
Now you can put checkpoints before critical actions:
```python
from agent_forensics import Forensics

f = Forensics(session="order-123")

# Agent wants to purchase a different product than requested
f.guardrail(
    intent="buy Apple Magic Mouse per user request",
    action="purchase Logitech M750",
    allowed=False,
    reason="User explicitly requested Apple Magic Mouse — substitution not allowed"
)
```
In the forensic report, this shows up as:
```text
[DECISION] purchase_cheapest
  Reasoning: Logitech M750 is cheapest option

[*** GUARDRAIL BLOCKED ***] purchase Logitech M750
  Intent: buy Apple Magic Mouse per user request
  Reason: substitution without approval is not allowed

[FINAL] "Would you like me to suggest an alternative?"
```
The agent tried to silently substitute. The guardrail caught it. The causal chain shows exactly what happened.
Blocked actions automatically trigger incident detection. Your compliance report includes guardrail pass/block counts.
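The `allowed` flag is not computed by the library; it comes from whatever policy logic you run before the action. A minimal sketch of such a policy, assuming a simple substring match (the `matches_intent` helper is hypothetical, not part of agent-forensics):

```python
def matches_intent(requested: str, proposed_action: str) -> bool:
    """Naive policy: the proposed action must mention the requested item.

    Illustrative only; a real policy might match SKUs, enforce price caps,
    or consult an allow-list. This helper is not part of agent-forensics.
    """
    return requested.lower() in proposed_action.lower()

# Feed the result into the guardrail call's `allowed=` argument:
allowed = matches_intent("Apple Magic Mouse", "purchase Logitech M750")
print(allowed)  # → False
```

Keeping the policy decision separate from the `f.guardrail()` call means the trace records both the allow and the deny paths with the same logging code.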
### 2. Deterministic Replay — "Can You Reproduce This?"
First question in any incident investigation: "Can you reproduce it?"
Agent Forensics now captures the full model configuration — model name, temperature, seed — at every LLM call. Combined with the existing tool input/output recording, you have everything needed to replay a trace.
```python
# Extract config from a recorded session
config = f.get_replay_config("order-123")
print(config["model_config"])
# → {'model': 'gpt-4o', 'temperature': 0, 'seed': 42}
```
After re-running your agent with the same config, compare results:
```python
diff = f.replay_diff("order-123", "order-123-replay")
print(f"Matching: {diff['matching']}")
for d in diff['divergences']:
    print(f"Step {d['step']}: {d['type']}")
    print(f"  Original: {d['original']['action']}")
    print(f"  Replay:   {d['replay']['action']}")
```
Output:
```text
Matching: False
Step 2: diverged
  Original: tool_result → {"results": [{"name": "Logitech", "price": 45}]}
  Replay:   tool_result → {"results": [{"name": "Logitech", "price": 45}, {"name": "Razer", "price": 39}]}
Step 4: diverged
  Original: tool_result → {"status": "SUCCESS", "price": 45}
  Replay:   tool_result → {"status": "SUCCESS", "price": 39}
```
Now you can see: the search results changed between runs (Razer appeared), which caused the agent to pick a different product. The divergence started at step 2, not step 4. That's the root cause.
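The core of this comparison is small enough to sketch. Assuming each trace is a list of step dicts (a shape that mirrors `replay_diff`'s output, not the library's actual internals):

```python
def diff_traces(original: list[dict], replay: list[dict]) -> dict:
    """Compare two recorded traces step by step and collect divergences."""
    divergences = [
        {"step": i, "type": "diverged", "original": o, "replay": r}
        for i, (o, r) in enumerate(zip(original, replay))
        if o != r
    ]
    # Traces of different lengths can never fully match.
    if len(original) != len(replay):
        divergences.append({
            "step": min(len(original), len(replay)),
            "type": "length_mismatch", "original": None, "replay": None,
        })
    return {"matching": not divergences, "divergences": divergences}
```

Because divergences are collected in step order, the first entry points at the earliest split, which is why the root cause lands at step 2 rather than step 4.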
For LangChain and OpenAI Agents SDK users, model config capture is automatic.
### 3. Eval Hooks — Failure Auto-Classification
This is the big one.
Instead of reading through forensic reports manually, Agent Forensics now auto-classifies failure patterns across your traces:
```python
failures = f.classify()
for fail in failures:
    print(f"[{fail['severity']}] {fail['type']}")
    print(f"  {fail['description']}")
```
Output from a real trace:
```text
[HIGH] HALLUCINATED_TOOL_OUTPUT
  Tool returned an error but agent proceeded without acknowledging it
[HIGH] MISSING_APPROVAL
  Critical action 'purchase_cheapest' taken without guardrail check
[HIGH] SILENT_SUBSTITUTION
  Agent may have substituted the requested item without explicit user approval
[MEDIUM] PROMPT_DRIFT_CAUSED
  Decision 'purchase_cheapest' made right after prompt drift
[MEDIUM] REPEATED_FAILURE
  Tool 'search_api' failed 2 out of 3 attempts
[MEDIUM] RETRIEVAL_MISMATCH
  Retrieved context has low similarity score (0.45)
```
Six failure patterns, auto-detected:

| Pattern | Severity | What It Catches |
|---|---|---|
| HALLUCINATED_TOOL_OUTPUT | HIGH | Agent ignored a tool error and kept going |
| MISSING_APPROVAL | HIGH | Purchase/delete/send without guardrail check |
| SILENT_SUBSTITUTION | HIGH | Output differs from user's request, no approval |
| PROMPT_DRIFT_CAUSED | MEDIUM | Decision right after system prompt changed |
| REPEATED_FAILURE | MEDIUM | Same failing action retried without changing approach |
| RETRIEVAL_MISMATCH | MEDIUM | Low-similarity RAG context used |
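To make the table concrete, here is what one such rule could look like. A sketch of a HALLUCINATED_TOOL_OUTPUT detector, assuming a trace of step dicts with `type`, `status`, and `reasoning` keys (an illustrative format, not the library's internal schema):

```python
def detect_hallucinated_tool_output(trace: list[dict]) -> list[dict]:
    """Flag tool errors that later decisions never acknowledge.

    Sketch of one classifier rule; the trace format here is assumed
    for illustration and is not agent-forensics' internal schema.
    """
    findings = []
    for i, step in enumerate(trace):
        if step.get("type") == "tool_result" and step.get("status") == "ERROR":
            later_decisions = [s for s in trace[i + 1:] if s.get("type") == "decision"]
            # Flag only if the agent kept deciding without mentioning the error.
            if later_decisions and not any(
                "error" in d.get("reasoning", "").lower() for d in later_decisions
            ):
                findings.append({
                    "severity": "HIGH",
                    "type": "HALLUCINATED_TOOL_OUTPUT",
                    "description": f"Tool error at step {i} ignored by later decisions",
                })
    return findings
```

The other five patterns follow the same shape: a scan over the trace with a pattern-specific predicate, emitting a severity-tagged finding.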
And you can aggregate across sessions:
```python
stats = f.failure_stats()
print(f"Total failures: {stats['total_failures']}")
print(f"By severity: {stats['by_severity']}")
# → {'HIGH': 4, 'MEDIUM': 3, 'LOW': 0}
This shifts the tool from "debug this one incident" to "what patterns keep happening across all my agents?"
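Cross-session aggregation is mostly a counting problem. A sketch of what a `failure_stats`-style rollup could compute, assuming each session's `classify()` output is a list of dicts with `severity` and `type` keys as shown above (the `aggregate_failures` function is illustrative, not the library's API):

```python
from collections import Counter

def aggregate_failures(sessions: dict[str, list[dict]]) -> dict:
    """Roll per-session failure lists up into cross-session counts.

    `sessions` maps session id -> classify()-style failure dicts;
    this is a sketch of the aggregation, not agent-forensics internals.
    """
    all_failures = [f for fails in sessions.values() for f in fails]
    return {
        "total_failures": len(all_failures),
        "by_severity": Counter(f["severity"] for f in all_failures),
        "by_type": Counter(f["type"] for f in all_failures),
    }
```

Counting by `type` rather than by session is what surfaces the recurring patterns: one session with SILENT_SUBSTITUTION is a bug, twenty are a design problem.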
## The Full Picture: v0.1 → v0.3
| Version | What It Does |
|---|---|
| v0.1 | Records decisions. Generates timeline. "What happened?" |
| v0.2 | Tracks context injections. Detects prompt drift. "What influenced the decision?" |
| v0.3 | Guardrails. Replay. Auto-classification. "Was it approved? Can I reproduce it? What pattern is this?" |
The forensic report now includes:
- Timeline
- Decision Chain
- Causal Chain (Root Cause Analysis)
- Failure Classification ← NEW
- Prompt Drift Analysis
- Context Injections
- Tool Usage Summary
- Compliance Notes (with guardrail stats) ← UPDATED
## What Community Feedback Looks Like in Practice
Every major feature in v0.2 and v0.3 came from Reddit comments:
- "Do you capture the full prompt state?" → Context injection tracking + prompt drift detection (v0.2)
- "Store model config for replay" → Deterministic replay with model name, temperature, seed (v0.3)
- "Log intent + allow/deny at checkpoints" → Guardrail checkpoints (v0.3)
- "Auto-label failure modes" → 6-pattern failure classifier (v0.3)
This isn't a side project that gets updated once and abandoned. The roadmap is driven by people actually building agents in production.
## What's Next: v0.4
The commenter's original challenge still stands:
"The most useful insight isn't 'what went wrong' — it's 'where did the model encounter ambiguity and pick one interpretation without flagging it.'"
v0.3's failure classifier catches the symptoms (silent substitution, hallucinated output). v0.4 aims to catch the cause: real-time ambiguity detection.
Imagine an agent flagging: "I have conflicting priorities — 'buy what the user asked for' vs 'buy the cheapest' — and I'm about to resolve this without asking. Confidence in my chosen interpretation: 62%."
That's where this is going.
## Try v0.3
```shell
pip install --upgrade agent-forensics
```

```python
from agent_forensics import Forensics

f = Forensics(session="order-123")

# Record (auto or manual)
agent.invoke(..., config={"callbacks": [f.langchain()]})

# Guardrails
f.guardrail(intent="...", action="...", allowed=False, reason="...")

# Classify failures
failures = f.classify()

# Replay
config = f.get_replay_config("order-123")
diff = f.replay_diff("order-123", "order-123-replay")

# Report (now includes failure classification)
print(f.report())
```
GitHub: github.com/ilflow4592/agent-forensics
PyPI: pip install agent-forensics (v0.3.0)
Three versions. Every feature from community feedback. That's what happens when you build something people actually need.