Most AI agent monitoring tells you what happened. Almost none tells you why.
That's the observability gap — and it's why debugging agents feels like reading a novel with half the pages torn out.
What standard logs look like
{"timestamp": "2026-03-09T05:00:00Z", "action": "sell", "asset": "ETH", "qty": 1.5}
You can see the agent sold ETH. You have no idea why. When something goes wrong, you're reconstructing motive from timestamps and outcomes.
What observability logs look like
{
"timestamp": "2026-03-09T05:00:00Z",
"action": "sell",
"asset": "ETH",
"qty": 1.5,
"reasoning": "stop-loss triggered at -4.2% from entry",
"alternatives_considered": ["hold", "partial_sell_0.75"],
"why_rejected": "hold violates risk rule; partial insufficient to meet loss limit",
"confidence": 0.91
}
Now you can debug. Now you can audit. Now you can improve.
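A minimal sketch of emitting an entry with this shape, in Python. The helper name `build_entry` is hypothetical (the post doesn't prescribe one); the fields mirror the JSON example above:

```python
import json
from datetime import datetime, timezone

def build_entry(action, asset, qty, reasoning, alternatives, why_rejected, confidence):
    """Assemble one observability log entry with the five extra fields."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "asset": asset,
        "qty": qty,
        "reasoning": reasoning,                    # what triggered this action
        "alternatives_considered": alternatives,   # the option space the agent saw
        "why_rejected": why_rejected,              # why the alternatives lost
        "confidence": confidence,                  # 0.0-1.0
    }

entry = build_entry(
    "sell", "ETH", 1.5,
    "stop-loss triggered at -4.2% from entry",
    ["hold", "partial_sell_0.75"],
    "hold violates risk rule; partial insufficient to meet loss limit",
    0.91,
)
print(json.dumps(entry, indent=2))
```

The point of a single constructor is that an agent can't emit an action without also emitting the reasoning fields.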
The three fields that matter most
1. reasoning — What caused this action? Reference the specific rule or condition.
2. alternatives_considered — What else could the agent have done? This surfaces whether the agent understood the option space.
3. why_rejected — Why didn't it pick the alternatives? This is where bugs often hide.
How to add this in SOUL.md
For every consequential action, write a structured log entry:
- action: what you did
- reasoning: what triggered it (cite the specific rule)
- alternatives_considered: what else you could have done
- why_rejected: why you didn't
- confidence: 0.0-1.0
Write to logs/action-log.json. Never skip this step.
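One way to sketch the append step in Python. Assumptions: entries are written one JSON object per line (the post doesn't specify a file format), and `append_entry` is a hypothetical helper name. The check for required fields enforces "never skip this step":

```python
import json
from pathlib import Path

LOG_PATH = Path("logs/action-log.json")

REQUIRED = {"action", "reasoning", "alternatives_considered", "why_rejected", "confidence"}

def append_entry(entry: dict) -> None:
    """Append one entry to logs/action-log.json, refusing incomplete entries."""
    missing = REQUIRED - entry.keys()
    if missing:
        raise ValueError(f"log entry missing fields: {sorted(missing)}")
    LOG_PATH.parent.mkdir(parents=True, exist_ok=True)  # create logs/ on first write
    with LOG_PATH.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
```

Failing loudly on missing fields is deliberate: a silent, partial entry is exactly the observability gap this post is about.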
Why debugging gets 10x faster
When an agent makes a bad call, you want to know:
- Did it understand the rules? (check reasoning)
- Did it consider the right options? (check alternatives_considered)
- Did it reject good options for bad reasons? (check why_rejected)
Without these fields, the answer to all three is "I have no idea." With them, you can usually diagnose a failure in under 5 minutes.
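The three checks above can be sketched as a triage function in Python. `triage` is a hypothetical name for illustration; it just answers the three questions for one log entry:

```python
def triage(entry: dict) -> dict:
    """Answer the three debugging questions for one log entry."""
    return {
        # Did it understand the rules? (a citation should appear in reasoning)
        "understood_rules": bool(entry.get("reasoning")),
        # Did it consider the right options?
        "considered_options": len(entry.get("alternatives_considered", [])) > 0,
        # Did it reject good options for bad reasons? (at least: did it say why?)
        "explained_rejections": bool(entry.get("why_rejected")),
    }

result = triage({
    "action": "sell",
    "reasoning": "stop-loss triggered at -4.2% from entry",
    "alternatives_considered": ["hold", "partial_sell_0.75"],
    "why_rejected": "hold violates risk rule",
})
```

A bare-logs entry fails all three checks immediately, which is the "I have no idea" case in code form.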
The compound benefit
Observability logs are also training data. When your agent makes a great call, you know exactly why. When it makes a bad one, you know exactly where the reasoning broke down.
That's how you improve agent judgment over time — not by tweaking prompts blindly, but by reading the reasoning and fixing the specific failure mode.
If you want battle-tested agent config patterns including observability logging templates, the full Library is at askpatrick.co — updated nightly.