I've been running Claude Code heavily for a few weeks — multi-agent orchestration, parallel worktrees, plan execution across 5-10 batches per session. It's genuinely great for this. But I had no idea what it was actually costing me until I dug into the hook system.
The problem is that Claude Code doesn't surface cost data to the user in any structured way. There's a token counter somewhere in the UI, but it resets per session, doesn't break down by agent, and isn't queryable. If you're running an orchestrator that dispatches 10 subagents in parallel, you want to know which one is burning the most tokens — not just the session total.
So I built cast-observe: a lightweight hook-based observability layer that writes session cost, token counts, and agent activity to a local SQLite database, with a small CLI to query it.
brew tap ek33450505/cast-observe
brew install cast-observe
cast-observe install
The hook architecture
Claude Code exposes lifecycle hooks via settings.json. The one that matters for cost tracking is PostToolUse — it fires after every tool call and receives a JSON payload on stdin.
For Agent tool calls specifically, that payload looks like this:
{
"session_id": "abc123",
"hook_event_name": "PostToolUse",
"tool_name": "Agent",
"tool_response": {
"type": "tool_result",
"total_cost_usd": 0.047,
"usage": {
"input_tokens": 3200,
"output_tokens": 8400
}
}
}
tool_response.total_cost_usd is the cost of the entire subagent run. tool_response.usage has the token breakdown. This is all you need to build a cost tracker.
The catch — and this cost me a few hours — is that CLAUDE_SESSION_ID, CLAUDE_INPUT_TOKENS, and CLAUDE_OUTPUT_TOKENS are not injected into hook process environments. I assumed they would be (the docs are sparse on this). They're not. Everything comes through stdin. Once I figured that out, the hook script was straightforward:
import json
import sys

# The entire payload arrives on stdin -- no env vars to read.
data = json.loads(sys.stdin.read())
tool_resp = data.get('tool_response', {})
session_id = data['session_id']
input_tokens = tool_resp.get('usage', {}).get('input_tokens', 0)
output_tokens = tool_resp.get('usage', {}).get('output_tokens', 0)
direct_cost = tool_resp.get('total_cost_usd')  # absent for non-Agent tool calls
For regular tool calls (Bash, Read, Write), there's no usage data — those don't cost tokens directly. The hook just exits early.
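Putting the parsing and the early exit together, the hook's entrypoint can be sketched roughly like this (a minimal sketch, not the actual cast-observe script -- the function name and return shape are mine):

```python
import json


def handle_post_tool_use(raw):
    """Parse a PostToolUse payload; return cost fields for Agent calls, None otherwise."""
    data = json.loads(raw)
    if data.get("tool_name") != "Agent":
        return None  # Bash/Read/Write etc. carry no usage data -- exit early
    resp = data.get("tool_response", {})
    usage = resp.get("usage", {})
    return {
        "session_id": data["session_id"],
        "input_tokens": usage.get("input_tokens", 0),
        "output_tokens": usage.get("output_tokens", 0),
        "cost_usd": resp.get("total_cost_usd"),
    }
```

In the real hook, a `None` return maps to a plain `sys.exit(0)` so the tool call proceeds untouched.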
The DB schema
cast-observe uses four tables:
sessions -- one row per Claude Code session
agent_runs -- one row per subagent dispatch
budgets -- user-defined daily/weekly limits
hook_health -- last-fired timestamp per hook
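A sketch of how those four tables could be initialized with nothing but the stdlib -- the column names here are illustrative guesses, not the exact cast-observe schema:

```python
import sqlite3

# Illustrative schema -- the real cast-observe columns may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sessions (
    id TEXT PRIMARY KEY,
    started_at TEXT,
    total_input_tokens INTEGER DEFAULT 0,
    total_output_tokens INTEGER DEFAULT 0,
    total_cost_usd REAL DEFAULT 0
);
CREATE TABLE IF NOT EXISTS agent_runs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT REFERENCES sessions(id),
    agent_type TEXT,
    input_tokens INTEGER,
    output_tokens INTEGER,
    cost_usd REAL,
    logged_at TEXT DEFAULT (datetime('now'))
);
CREATE TABLE IF NOT EXISTS budgets (
    period TEXT PRIMARY KEY,  -- e.g. 'daily' or 'weekly'
    limit_usd REAL
);
CREATE TABLE IF NOT EXISTS hook_health (
    hook_name TEXT PRIMARY KEY,
    last_fired TEXT
);
"""


def init_db(path):
    """Open (or create) the database and apply the schema idempotently."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

`executescript` plus `IF NOT EXISTS` makes initialization safe to run on every hook invocation.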
The sessions table accumulates token and cost totals via upsert:
INSERT INTO sessions (id, started_at, total_input_tokens, total_output_tokens, total_cost_usd)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT(id) DO UPDATE SET
total_input_tokens = total_input_tokens + excluded.total_input_tokens,
total_output_tokens = total_output_tokens + excluded.total_output_tokens,
total_cost_usd = total_cost_usd + excluded.total_cost_usd;
Every Agent PostToolUse fires the hook, which appends a new agent_runs row and increments the parent session totals. By the end of a session you have per-agent cost breakdowns and a session-level aggregate — without any polling or daemon.
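The per-call write path can be sketched as one function -- the upsert is the one from above; the surrounding function and column names are my assumptions:

```python
import sqlite3


def record_agent_run(conn, session_id, agent_type, in_tok, out_tok, cost):
    """Append one agent_runs row and roll its totals into the parent session."""
    conn.execute(
        "INSERT INTO agent_runs (session_id, agent_type, input_tokens, output_tokens, cost_usd) "
        "VALUES (?, ?, ?, ?, ?)",
        (session_id, agent_type, in_tok, out_tok, cost),
    )
    # First write for a session inserts the row; later writes accumulate via the upsert.
    conn.execute(
        """INSERT INTO sessions (id, started_at, total_input_tokens, total_output_tokens, total_cost_usd)
           VALUES (?, datetime('now'), ?, ?, ?)
           ON CONFLICT(id) DO UPDATE SET
             total_input_tokens = total_input_tokens + excluded.total_input_tokens,
             total_output_tokens = total_output_tokens + excluded.total_output_tokens,
             total_cost_usd = total_cost_usd + excluded.total_cost_usd""",
        (session_id, in_tok, out_tok, cost),
    )
    conn.commit()
```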
What you can actually see
$ cast-observe budget --week
cast-observe — Budget Summary
════════════════════════════════════
Today (2026-04-02):
Input tokens: 14,203
Output tokens: 89,441
Cost: $1.34
This week:
Input tokens: 41,996
Output tokens: 177,689
Cost: $4.04
Top agents by cost (all time):
orchestrator 74 runs $6.86
general-purpose 77 runs $5.00
Explore 45 runs $3.67
researcher 50 runs $3.47
code-writer 38 runs $2.91
The "top agents by cost" view is the one I actually use. When orchestrator is at the top, I know a plan was heavy on parallel agent dispatches. When researcher is high, I've been doing a lot of open-ended investigation. It gives you a feedback loop: is the way I'm structuring work actually efficient?
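Under the schema above, that view is a single GROUP BY over agent_runs -- a sketch, assuming the hypothetical column names from earlier:

```python
import sqlite3


def top_agents_by_cost(conn, limit=5):
    """Aggregate agent_runs into the 'top agents by cost' table."""
    return conn.execute(
        """SELECT agent_type, COUNT(*) AS runs, ROUND(SUM(cost_usd), 2) AS total
           FROM agent_runs
           GROUP BY agent_type
           ORDER BY total DESC
           LIMIT ?""",
        (limit,),
    ).fetchall()
```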
The non-obvious lessons
You can't trust env vars in hooks. The Claude Code hook environment is essentially {...process.env} with a couple of additions (CLAUDE_PROJECT_DIR). The session ID, model name, token counts — none of that is injected. Read from stdin.
agent_type is what SubagentStop sends, not agent_name. The SubagentStop hook sends agent_type for the subagent identifier. I had this wrong for a while and was logging everything as unknown. If you're building on top of the hook system, data['agent_type'] is the field you want.
total_cost_usd is more accurate than computing from token counts. When I first wrote the tracker, I was computing cost from (input_tokens / 1e6 * price_per_m) using a local pricing file. That's fine as a fallback, but if tool_response.total_cost_usd is present, use it directly — it reflects actual billing including cache read/write costs that a simple per-token calc misses.
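The prefer-direct-then-fall-back logic is small enough to show in full -- the prices here are placeholders, not real rates; a real tracker would load them from a pricing file:

```python
# Placeholder per-million-token prices (USD) -- NOT real rates.
PRICING = {"input": 3.00, "output": 15.00}


def resolve_cost(tool_response):
    """Prefer the billed total_cost_usd; fall back to a per-token estimate."""
    direct = tool_response.get("total_cost_usd")
    if direct is not None:
        return direct  # reflects actual billing, including cache reads/writes
    usage = tool_response.get("usage", {})
    return (usage.get("input_tokens", 0) / 1e6 * PRICING["input"]
            + usage.get("output_tokens", 0) / 1e6 * PRICING["output"])
```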
Async hooks or your latency suffers. The cost tracker runs on every PostToolUse. If it's synchronous, every tool call waits for the SQLite write to complete. Mark telemetry hooks async: true in settings.json. The hook still fires; it just doesn't block the tool call from completing.
{
"type": "command",
"command": "bash ~/.claude/scripts/observe-cost-tracker.sh",
"timeout": 10,
"async": true
}
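For context, that hook entry lives under a matcher inside the hooks section of settings.json — roughly like this (the exact nesting may differ from what cast-observe writes; check your own settings.json):

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Agent",
        "hooks": [
          {
            "type": "command",
            "command": "bash ~/.claude/scripts/observe-cost-tracker.sh",
            "timeout": 10,
            "async": true
          }
        ]
      }
    ]
  }
}
```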
Installation
cast-observe ships as a Homebrew formula and a standalone installer:
# Homebrew
brew tap ek33450505/cast-observe
brew install cast-observe
cast-observe install
# Manual
git clone https://github.com/ek33450505/cast-observe
cd cast-observe && bash install.sh
cast-observe install wires the hooks into ~/.claude/settings.json (non-destructively — it merges, doesn't replace) and initializes the SQLite schema.
The repo has a 29-test BATS suite, CI on Ubuntu and macOS, issue templates, and a CONTRIBUTING guide if you want to add subcommands or hook integrations.
If you're running Claude Code for anything beyond one-off questions — especially if you're using the Agent tool to dispatch subagents — you're probably spending more than you think. cast-observe makes that visible.
Ed is a full-stack engineer in Ohio ed-tech and the author of CAST and cast-observe.