When your AI agent produces a wrong answer, where do you look?
Most people check the prompt. Maybe the tools. Maybe the model version.
But the actual culprit is usually invisible: you have no observability layer. You don't know which turn caused the drift. You don't know which tool call cost $0.40. You don't know whether the agent read the right file version.
You only know the output was wrong.
This is the observability gap, and it's where most AI agent projects die slowly.
## What Observability Means for Agents
For traditional software, observability means logs, metrics, and traces. For AI agents, it means three things:
- What did the agent know at each turn? (context state)
- What did it decide to do? (action log)
- What did each decision cost? (token/API cost per action)
Without these three, you're flying blind. You can't improve what you can't measure.
## The Minimal Observability Stack
You don't need a commercial APM tool. You need three files and discipline.
### 1. `current-task.json` — State Snapshot

Every agent turn, write current state before acting:

```json
{
  "task": "draft weekly newsletter",
  "step": "gathering_sources",
  "started": "2026-03-08T09:00:00Z",
  "last_updated": "2026-03-08T09:04:12Z",
  "sources_found": 3,
  "target_sources": 5
}
```
Now you know exactly where the agent was when something went wrong.
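A crash-safe way to write the snapshot is an atomic replace: write to a temp file, then rename it over the target, so a crash mid-write never leaves a half-written JSON file. A minimal sketch (the `write_state` helper is illustrative, not part of any framework; the fields mirror the snapshot above):

```python
import json
import os
import tempfile
from datetime import datetime, timezone

def write_state(state, path="current-task.json"):
    """Atomically write the state snapshot: temp file + rename."""
    state["last_updated"] = datetime.now(timezone.utc).isoformat()
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f, indent=2)
    os.replace(tmp, path)  # atomic on POSIX and Windows

write_state({"task": "draft weekly newsletter",
             "step": "gathering_sources",
             "sources_found": 3,
             "target_sources": 5})
```

Even if the process dies between the dump and the rename, the previous snapshot stays intact.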
### 2. `action-log.jsonl` — Decision Trace

Append one line per action:

```json
{"ts":"2026-03-08T09:04:13Z","action":"web_search","query":"AI agent patterns 2026","result_count":8,"tokens":420,"cost_usd":0.003}
{"ts":"2026-03-08T09:04:28Z","action":"read_file","path":"memory/2026-03-07.md","tokens":1200,"cost_usd":0.008}
```
Now you can see the exact decision sequence. You can replay it. You can spot where cost exploded.
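Replaying the trace is just reading the file back in order. A minimal sketch, with a `replay` helper (my name, not a standard API) and two sample entries written first so the snippet runs standalone:

```python
import json

def replay(log_path="action-log.jsonl"):
    """Yield logged actions in order, reconstructing the decision sequence."""
    with open(log_path) as f:
        for line in f:
            if line.strip():
                yield json.loads(line)

# Sample log (the two entries from the article) so the replay below runs
with open("action-log.jsonl", "w") as f:
    f.write(json.dumps({"ts": "2026-03-08T09:04:13Z", "action": "web_search",
                        "cost_usd": 0.003}) + "\n")
    f.write(json.dumps({"ts": "2026-03-08T09:04:28Z", "action": "read_file",
                        "cost_usd": 0.008}) + "\n")

for entry in replay():
    print(f"{entry['ts']}  {entry['action']:<12} ${entry['cost_usd']:.3f}")
```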
### 3. `memory/YYYY-MM-DD.md` — Session Log
A human-readable narrative of what happened each session. Not structured data — prose. Useful for pattern recognition across days.
## The Debugging Workflow

When something goes wrong:

1. Read `current-task.json` — What state was the agent in?
2. Grep `action-log.jsonl` for the timestamp window — What actions did it take?
3. Read `memory/YYYY-MM-DD.md` — What did the agent think was happening?
Three reads. You now know more than most teams learn in hours of debugging.
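Step 2, pulling the timestamp window out of the log, can be a plain grep or a few lines of Python. A sketch with a hypothetical `actions_between` helper; plain string comparison works because ISO-8601 timestamps in the same timezone sort lexicographically:

```python
import json

def actions_between(start, end, log_path="action-log.jsonl"):
    """Return log entries whose timestamp falls inside [start, end]."""
    with open(log_path) as f:
        return [e for e in map(json.loads, f) if start <= e["ts"] <= end]

# Sample log so the call below runs
with open("action-log.jsonl", "w") as f:
    for e in [{"ts": "2026-03-08T09:04:13Z", "action": "web_search"},
              {"ts": "2026-03-08T09:04:28Z", "action": "read_file"},
              {"ts": "2026-03-08T09:15:02Z", "action": "write_draft"}]:
        f.write(json.dumps(e) + "\n")

window = actions_between("2026-03-08T09:04:00Z", "2026-03-08T09:05:00Z")
print([e["action"] for e in window])  # → ['web_search', 'read_file']
```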
## Cost Observability: The Hidden Win
The side effect of action logging is cost transparency.
Once you see per-action costs, patterns emerge fast:
- That web search you thought was cheap? It's running 12 times per loop.
- The file read you added for safety? It's loading a 4,000-token document every turn when you need 40 tokens.
- That reasoning model you used for a simple categorization? $0.15 per call, 200 calls per day.
One team I know cut API costs from $180/month to $47/month after adding action logging. Not by changing the agent logic — just by seeing what it was actually doing.
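Surfacing these patterns takes one pass over the log. A sketch of a per-action cost rollup (the `cost_report` helper is illustrative; the sample data mirrors the repeated-search case above):

```python
import json
from collections import Counter, defaultdict

def cost_report(log_path="action-log.jsonl"):
    """Tally call counts and dollars per action type from the JSONL log."""
    counts, dollars = Counter(), defaultdict(float)
    with open(log_path) as f:
        for line in f:
            entry = json.loads(line)
            counts[entry["action"]] += 1
            dollars[entry["action"]] += entry.get("cost_usd", 0.0)
    return counts, dollars

# Sample log: the "cheap" search running 12 times per loop
with open("action-log.jsonl", "w") as f:
    for _ in range(12):
        f.write(json.dumps({"ts": "2026-03-08T09:00:00Z",
                            "action": "web_search", "cost_usd": 0.003}) + "\n")
    f.write(json.dumps({"ts": "2026-03-08T09:01:00Z",
                        "action": "read_file", "cost_usd": 0.008}) + "\n")

counts, dollars = cost_report()
for action, n in counts.most_common():
    print(f"{action:<12} {n:>3} calls  ${dollars[action]:.3f}")
```

Sorting by call count (or by dollars) makes the surprise jump straight off the screen.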
## The Principle: Write Before You Act
The simple rule that makes all of this work:
Write state before every action. Read state at the start of every turn.
Not after. Before. If the agent crashes mid-action, you still have a record of what it intended.
This single habit gives you:
- Crash recovery (resume from last known state)
- Drift detection (compare intended vs actual state over time)
- Cost attribution (tie costs to specific tasks)
- Auditability (prove what happened and why)
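Crash recovery, for instance, falls out almost for free: on startup, load the last snapshot instead of starting the task over. A minimal sketch, assuming the `current-task.json` layout shown earlier (the `resume_or_start` helper is illustrative):

```python
import json
import os

def resume_or_start(default_state, path="current-task.json"):
    """Resume from the last snapshot if one exists; otherwise start fresh.

    Because state is written BEFORE every action, the snapshot always
    reflects what the agent intended to do when it crashed.
    """
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return dict(default_state)

state = resume_or_start({"task": "draft weekly newsletter",
                         "step": "gathering_sources"})
```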
## What This Looks Like in Practice
Here's a minimal agent loop with observability baked in:
```python
import json
from datetime import datetime, timezone

def agent_turn(task_state, action):
    # 1. Write state BEFORE acting
    task_state['last_updated'] = datetime.now(timezone.utc).isoformat()
    task_state['current_action'] = action['name']
    with open('current-task.json', 'w') as f:
        json.dump(task_state, f)

    # 2. Execute the action
    result = execute(action)

    # 3. Log the action
    log_entry = {
        'ts': datetime.now(timezone.utc).isoformat(),
        'action': action['name'],
        'tokens': result.get('tokens_used', 0),
        'cost_usd': result.get('cost', 0)
    }
    with open('action-log.jsonl', 'a') as f:
        f.write(json.dumps(log_entry) + '\n')

    return result
```

About twenty lines. Full observability.
## The Audit You Should Run Today

If you're running agents in production without observability, do this:

1. Add `current-task.json` writes to your agent loop (30 minutes)
2. Add JSONL action logging (1 hour)
3. Run for 24 hours
4. Read the log
I guarantee you'll find at least one thing that surprises you — an action running more than expected, a cost spike you didn't know about, or a pattern that explains a bug you've been chasing.
You can't improve what you can't see. Start seeing.
The full observability pattern — including file templates, log analysis scripts, and cost dashboards — is in the Ask Patrick Library at askpatrick.co. Updated weekly with new agent operation patterns.