I've been running Claude Code heavily for a few weeks — multi-agent orchestration, parallel worktrees, plan execution across 5-10 batches per session. It's genuinely great for this. But I had no idea what it was actually costing me until I dug into the hook system.
The problem is that Claude Code doesn't surface cost data to the user in any structured way. There's a token counter somewhere in the UI, but it resets per session, doesn't break down by agent, and isn't queryable. If you're running an orchestrator that dispatches 10 subagents in parallel, you want to know which one is burning the most tokens — not just the session total.
So I built cast-observe: a lightweight hook-based observability layer that writes session cost, token counts, and agent activity to a local SQLite database, with a small CLI to query it.
brew tap ek33450505/cast-observe
brew install cast-observe
cast-observe install
The hook architecture
Claude Code exposes lifecycle hooks via settings.json. The one that matters for cost tracking is PostToolUse — it fires after every tool call and receives a JSON payload on stdin.
For Agent tool calls specifically, that payload looks like this:
{
"session_id": "abc123",
"hook_event_name": "PostToolUse",
"tool_name": "Agent",
"tool_response": {
"type": "tool_result",
"total_cost_usd": 0.047,
"usage": {
"input_tokens": 3200,
"output_tokens": 8400
}
}
}
tool_response.total_cost_usd is the cost of the entire subagent run. tool_response.usage has the token breakdown. This is all you need to build a cost tracker.
The catch — and this cost me a few hours — is that CLAUDE_SESSION_ID, CLAUDE_INPUT_TOKENS, and CLAUDE_OUTPUT_TOKENS are not injected into hook process environments. I assumed they would be (the docs are sparse on this). They're not. Everything comes through stdin. Once I figured that out, the hook script was straightforward:
import json
import sys

# The entire payload arrives on stdin -- no env vars to read.
data = json.loads(sys.stdin.read())
tool_resp = data.get('tool_response', {})
session_id = data['session_id']
input_tokens = tool_resp.get('usage', {}).get('input_tokens', 0)
output_tokens = tool_resp.get('usage', {}).get('output_tokens', 0)
direct_cost = tool_resp.get('total_cost_usd')  # absent for non-Agent tool calls
For regular tool calls (Bash, Read, Write), there's no usage data — those don't cost tokens directly. The hook just exits early.
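Putting the parsing and the early exit together, the hook's entrypoint can be sketched roughly like this (a minimal sketch, not the actual cast-observe script -- the function name and return shape are mine):

```python
import json


def handle_post_tool_use(raw):
    """Parse a PostToolUse payload; return cost fields for Agent calls, None otherwise."""
    data = json.loads(raw)
    if data.get("tool_name") != "Agent":
        return None  # Bash/Read/Write etc. carry no usage data -- exit early
    resp = data.get("tool_response", {})
    usage = resp.get("usage", {})
    return {
        "session_id": data["session_id"],
        "input_tokens": usage.get("input_tokens", 0),
        "output_tokens": usage.get("output_tokens", 0),
        "cost_usd": resp.get("total_cost_usd"),
    }
```

In the real hook, a `None` return maps to a plain `sys.exit(0)` so the tool call proceeds untouched.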
The DB schema
cast-observe uses four tables:
sessions -- one row per Claude Code session
agent_runs -- one row per subagent dispatch
budgets -- user-defined daily/weekly limits
hook_health -- last-fired timestamp per hook
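A sketch of how those four tables could be initialized with nothing but the stdlib -- the column names here are illustrative guesses, not the exact cast-observe schema:

```python
import sqlite3

# Illustrative schema -- the real cast-observe columns may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sessions (
    id TEXT PRIMARY KEY,
    started_at TEXT,
    total_input_tokens INTEGER DEFAULT 0,
    total_output_tokens INTEGER DEFAULT 0,
    total_cost_usd REAL DEFAULT 0
);
CREATE TABLE IF NOT EXISTS agent_runs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT REFERENCES sessions(id),
    agent_type TEXT,
    input_tokens INTEGER,
    output_tokens INTEGER,
    cost_usd REAL,
    logged_at TEXT DEFAULT (datetime('now'))
);
CREATE TABLE IF NOT EXISTS budgets (
    period TEXT PRIMARY KEY,  -- e.g. 'daily' or 'weekly'
    limit_usd REAL
);
CREATE TABLE IF NOT EXISTS hook_health (
    hook_name TEXT PRIMARY KEY,
    last_fired TEXT
);
"""


def init_db(path):
    """Open (or create) the database and apply the schema idempotently."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

`executescript` plus `IF NOT EXISTS` makes initialization safe to run on every hook invocation.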
The sessions table accumulates token and cost totals via upsert:
INSERT INTO sessions (id, started_at, total_input_tokens, total_output_tokens, total_cost_usd)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT(id) DO UPDATE SET
total_input_tokens = total_input_tokens + excluded.total_input_tokens,
total_output_tokens = total_output_tokens + excluded.total_output_tokens,
total_cost_usd = total_cost_usd + excluded.total_cost_usd;
Every Agent PostToolUse fires the hook, which appends a new agent_runs row and increments the parent session totals. By the end of a session you have per-agent cost breakdowns and a session-level aggregate — without any polling or daemon.
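The per-call write path can be sketched as one function -- the upsert is the one from above; the surrounding function and column names are my assumptions:

```python
import sqlite3


def record_agent_run(conn, session_id, agent_type, in_tok, out_tok, cost):
    """Append one agent_runs row and roll its totals into the parent session."""
    conn.execute(
        "INSERT INTO agent_runs (session_id, agent_type, input_tokens, output_tokens, cost_usd) "
        "VALUES (?, ?, ?, ?, ?)",
        (session_id, agent_type, in_tok, out_tok, cost),
    )
    # First write for a session inserts the row; later writes accumulate via the upsert.
    conn.execute(
        """INSERT INTO sessions (id, started_at, total_input_tokens, total_output_tokens, total_cost_usd)
           VALUES (?, datetime('now'), ?, ?, ?)
           ON CONFLICT(id) DO UPDATE SET
             total_input_tokens = total_input_tokens + excluded.total_input_tokens,
             total_output_tokens = total_output_tokens + excluded.total_output_tokens,
             total_cost_usd = total_cost_usd + excluded.total_cost_usd""",
        (session_id, in_tok, out_tok, cost),
    )
    conn.commit()
```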
What you can actually see
$ cast-observe budget --week
cast-observe — Budget Summary
════════════════════════════════════
Today (2026-04-02):
Input tokens: 14,203
Output tokens: 89,441
Cost: $1.34
This week:
Input tokens: 41,996
Output tokens: 177,689
Cost: $4.04
Top agents by cost (all time):
orchestrator 74 runs $6.86
general-purpose 77 runs $5.00
Explore 45 runs $3.67
researcher 50 runs $3.47
code-writer 38 runs $2.91
The "top agents by cost" view is the one I actually use. When orchestrator is at the top, I know a plan was heavy on parallel agent dispatches. When researcher is high, I've been doing a lot of open-ended investigation. It gives you a feedback loop: is the way I'm structuring work actually efficient?
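Under the schema above, that view is a single GROUP BY over agent_runs -- a sketch, assuming the hypothetical column names from earlier:

```python
import sqlite3


def top_agents_by_cost(conn, limit=5):
    """Aggregate agent_runs into the 'top agents by cost' table."""
    return conn.execute(
        """SELECT agent_type, COUNT(*) AS runs, ROUND(SUM(cost_usd), 2) AS total
           FROM agent_runs
           GROUP BY agent_type
           ORDER BY total DESC
           LIMIT ?""",
        (limit,),
    ).fetchall()
```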
The non-obvious lessons
You can't trust env vars in hooks. The Claude Code hook environment is essentially {...process.env} with a couple of additions (CLAUDE_PROJECT_DIR). The session ID, model name, token counts — none of that is injected. Read from stdin.
agent_type is what SubagentStop sends, not agent_name. The SubagentStop hook sends agent_type for the subagent identifier. I had this wrong for a while and was logging everything as unknown. If you're building on top of the hook system, data['agent_type'] is the field you want.
total_cost_usd is more accurate than computing from token counts. When I first wrote the tracker, I was computing cost from (input_tokens / 1e6 * price_per_m) using a local pricing file. That's fine as a fallback, but if tool_response.total_cost_usd is present, use it directly — it reflects actual billing including cache read/write costs that a simple per-token calc misses.
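The prefer-direct-then-fall-back logic is small enough to show in full -- the prices here are placeholders, not real rates; a real tracker would load them from a pricing file:

```python
# Placeholder per-million-token prices (USD) -- NOT real rates.
PRICING = {"input": 3.00, "output": 15.00}


def resolve_cost(tool_response):
    """Prefer the billed total_cost_usd; fall back to a per-token estimate."""
    direct = tool_response.get("total_cost_usd")
    if direct is not None:
        return direct  # reflects actual billing, including cache reads/writes
    usage = tool_response.get("usage", {})
    return (usage.get("input_tokens", 0) / 1e6 * PRICING["input"]
            + usage.get("output_tokens", 0) / 1e6 * PRICING["output"])
```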
Async hooks or your latency suffers. The cost tracker runs on every PostToolUse. If it's synchronous, every tool call waits for the SQLite write to complete. Mark telemetry hooks async: true in settings.json. The hook still fires; it just doesn't block the tool call from completing.
{
"type": "command",
"command": "bash ~/.claude/scripts/observe-cost-tracker.sh",
"timeout": 10,
"async": true
}
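For context, that hook entry lives under a matcher inside the hooks section of settings.json — roughly like this (the exact nesting may differ from what cast-observe writes; check your own settings.json):

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Agent",
        "hooks": [
          {
            "type": "command",
            "command": "bash ~/.claude/scripts/observe-cost-tracker.sh",
            "timeout": 10,
            "async": true
          }
        ]
      }
    ]
  }
}
```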
Installation
cast-observe ships as a Homebrew formula and a standalone installer:
# Homebrew
brew tap ek33450505/cast-observe
brew install cast-observe
cast-observe install
# Manual
git clone https://github.com/ek33450505/cast-observe
cd cast-observe && bash install.sh
cast-observe install wires the hooks into ~/.claude/settings.json (non-destructively — it merges, doesn't replace) and initializes the SQLite schema.
The repo has a 29-test BATS suite, CI on Ubuntu and macOS, issue templates, and a CONTRIBUTING guide if you want to add subcommands or hook integrations.
If you're running Claude Code for anything beyond one-off questions — especially if you're using the Agent tool to dispatch subagents — you're probably spending more than you think. cast-observe makes that visible.
Ed is a full-stack engineer in Ohio ed-tech and the author of CAST and cast-observe.