Full Observability in AI Agents: What We Added to the pydantic-deepagents TUI
This is a cross-post. Canonical version: oss.vstorm.co/blog/ai-agent-tui-observability-pydantic-deep/
Earlier this week I covered how pydantic-deepagents handles stuck loops, context-window blindness, and frictionless installation. Today: what you actually see when all of that runs.
The problem with invisible agents
When an AI agent runs, a lot happens between your prompt and the response. The model reasons. It calls tools. Each action burns tokens and costs money. Without observability, you're flying blind — you can't debug, optimize, or trust what's happening.
pydantic-deepagents v0.3.5 — the modular agent runtime for Python — reworks the TUI to surface everything.
What changed
Per-turn token usage
Below every assistant response:
in:2.1K · out:412 · total:2.5K · reqs:3
in = input tokens, out = output tokens, total = turn total, reqs = API calls in this turn.
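The abbreviated counts in that footer are straightforward to reproduce. A minimal sketch — `abbrev` and `turn_usage_line` are hypothetical helper names for illustration, not the project's actual API:

```python
def abbrev(n: int) -> str:
    """Abbreviate a token count the way the footer displays it (2100 -> '2.1K')."""
    if n < 1000:
        return str(n)
    return f"{n / 1000:.1f}K".replace(".0K", "K")

def turn_usage_line(input_tokens: int, output_tokens: int, requests: int) -> str:
    """Render the per-turn usage footer shown below each assistant response."""
    total = input_tokens + output_tokens
    return (
        f"in:{abbrev(input_tokens)} · out:{abbrev(output_tokens)}"
        f" · total:{abbrev(total)} · reqs:{requests}"
    )

print(turn_usage_line(2100, 412, 3))
# → in:2.1K · out:412 · total:2.5K · reqs:3
```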
Cumulative cost in the header
pydantic-deepagents in:45K out:3K · $0.12
Updates after each response. You always know the running total.
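A running total like this only needs per-turn deltas plus a price table. A sketch, assuming illustrative per-million-token prices — the real TUI's pricing source and class names are not documented here:

```python
from dataclasses import dataclass

@dataclass
class UsageTotals:
    """Running totals for the header line; prices are illustrative, not the TUI's."""
    input_tokens: int = 0
    output_tokens: int = 0
    cost_usd: float = 0.0

    def record_turn(self, inp: int, out: int,
                    in_price: float = 3.0, out_price: float = 15.0) -> None:
        """Accumulate one turn's usage; prices are USD per 1M tokens (assumed)."""
        self.input_tokens += inp
        self.output_tokens += out
        self.cost_usd += inp / 1e6 * in_price + out / 1e6 * out_price

    def header(self) -> str:
        """Render the running-total header shown at the top of the TUI."""
        return (f"pydantic-deepagents in:{self.input_tokens // 1000}K"
                f" out:{self.output_tokens // 1000}K · ${self.cost_usd:.2f}")
```

With the assumed prices, 45K input and 3K output tokens render as `pydantic-deepagents in:45K out:3K · $0.18`; the actual dollar figure depends on the model's real rates.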
Thinking streamed live → collapsed
Model reasoning appears as dimmed text while it streams, then collapses to a one-line summary when the turn finishes. You can watch the agent reason without drowning in it.
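The collapse step can be sketched as a pure function over the finished reasoning text — `collapse_thinking` is a hypothetical helper, not the TUI's actual implementation:

```python
def collapse_thinking(reasoning: str, width: int = 60) -> str:
    """Collapse a finished reasoning block to a one-line summary:
    the first line (truncated to `width`) plus the total size."""
    stripped = reasoning.strip()
    first = stripped.splitlines()[0] if stripped else ""
    if len(first) > width:
        first = first[: width - 3] + "..."
    return f"[thinking · {len(reasoning)} chars] {first}"
```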
Side panel on startup
Opens automatically when the terminal is ≥100 characters wide, showing subagents before any task starts:
Subagents:
• planner (idle)
• research (idle)
Status updates as agents are delegated work.
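The width gate and the panel body are both simple. A sketch using the stdlib's terminal-size query — the function and constant names are assumptions, and the real panel is presumably a TUI widget rather than a string:

```python
import shutil

PANEL_MIN_WIDTH = 100  # columns required before the side panel auto-opens

def should_open_panel() -> bool:
    """Open the subagent panel only when the terminal is wide enough."""
    return shutil.get_terminal_size().columns >= PANEL_MIN_WIDTH

def render_panel(subagents: dict[str, str]) -> str:
    """Render the panel body from a name -> status mapping."""
    lines = ["Subagents:"]
    lines += [f"• {name} ({status})" for name, status in subagents.items()]
    return "\n".join(lines)
```

`shutil.get_terminal_size()` falls back to the `COLUMNS` environment variable and then to 80×24 when no terminal is attached, which is why a width gate like this degrades gracefully in pipes and CI.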
All tool calls visible
Todo tools (read_todos, write_todos, add_todo, update_todo_status, remove_todo) were previously hidden; now they're surfaced, so every agent action is visible.
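One way to picture the change is a hidden-tools filter that is now empty — this is a guess at the mechanism, not the project's actual code:

```python
# Before, a set like {"read_todos", "write_todos", "add_todo",
# "update_todo_status", "remove_todo"} was filtered out of the transcript.
# Now nothing is hidden, so every call is rendered (hypothetical sketch).
HIDDEN_TOOLS: set[str] = set()

def visible_tool_calls(calls: list[dict]) -> list[dict]:
    """Keep only the tool calls that should appear in the transcript."""
    return [c for c in calls if c["tool"] not in HIDDEN_TOOLS]
```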
Session saved on crash
_save_session() now runs in a finally block. Whether the run crashes, raises an exception, or is killed by a keyboard interrupt, messages.json is always written. No more lost sessions.
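The pattern is plain try/finally: the finally clause executes on normal exit, on any exception, and on KeyboardInterrupt alike. A sketch — `agent.run` and the event shape are hypothetical stand-ins for the real streaming API:

```python
import json
from pathlib import Path

def run_session(agent, prompt: str, session_dir: Path) -> None:
    """Crash-safe session loop: whatever happens inside the try,
    messages.json is written before control leaves this function."""
    messages: list[dict] = []
    try:
        for event in agent.run(prompt):  # hypothetical streaming API
            messages.append(event)
    finally:
        (session_dir / "messages.json").write_text(json.dumps(messages))
```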
Subagent logs: 20K chars (was 2K)
tool_log.jsonl now stores full subagent output. Critical for /improve — the pipeline that extracts learnings from sessions (more on that tomorrow).
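Raising a truncation limit is a one-constant change; the interesting part is keeping a marker for whatever was dropped. A sketch with assumed names — the real project's limit constant and log writer are not shown here:

```python
SUBAGENT_LOG_LIMIT = 20_000  # characters kept per log entry (was 2_000)

def truncate_for_log(output: str, limit: int = SUBAGENT_LOG_LIMIT) -> str:
    """Trim subagent output before appending it to tool_log.jsonl,
    recording how many characters were dropped (hypothetical helper)."""
    if len(output) <= limit:
        return output
    return output[:limit] + f"… [{len(output) - limit} chars truncated]"
```

At 2K characters, most multi-step subagent transcripts were cut mid-thought; 20K keeps enough context for a downstream pipeline to extract learnings from the full run.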
The full layout
┌──────────────────────────────────────────────────────────┐
│ pydantic-deepagents  in:45K out:3K · $0.12               │
├──────────────────────────────────┬───────────────────────┤
│ [thinking... dimmed text]        │ Subagents:            │
│ [collapsed to summary]           │ • planner (idle)      │
│                                  │ • research (idle)     │
│ Agent response here...           │                       │
│ in:2.1K · out:412 · $0.04        │                       │
└──────────────────────────────────┴───────────────────────┘
Try it
curl -fsSL https://oss.vstorm.co/install.sh | bash
GitHub: github.com/vstorm-co/pydantic-deep
Observability is how you debug, optimize, and trust your agent. A black box is a liability.
What's the first metric you check when debugging an agent run?