Last month's Anthropic invoice: $312. Sixty percent of it traced back to a single retry pattern I couldn't see anywhere in my normal logs.
The agent was failing on tool calls, then re-entering the loop with the full context intact — 18K input tokens per invocation on a task that needs 3-4K. Claude Code's UI looked fine. Workers logs showed 200s. D1 writes were clean. The billing dashboard just said "tokens used" with no breakdown by worker or call chain.
I found the culprit only after shipping Workers logs to R2 via Logpush and querying with DuckDB:
SELECT
    worker_name,
    COUNT(*) AS call_count,
    AVG(input_tokens) AS avg_input,
    SUM(input_tokens) AS total_input
FROM read_parquet('s3://my-logs/workers/2026-05/*.parquet')
GROUP BY worker_name
ORDER BY total_input DESC;
One worker — ad-report-summarizer — was eating 58% of total input tokens. That query cost me maybe 20 minutes to set up. The Logpush + R2 + DuckDB stack runs under $5/month.
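To turn the per-worker totals into the kind of percentage quoted above, the same aggregation can be sketched in plain Python. This is a minimal sketch, not the query I actually ran: it assumes each log record is a dict with `worker_name` and `input_tokens` keys (matching the column names in the SQL), and the sample rows, including the `router` worker, are made up for illustration.

```python
from collections import defaultdict


def input_token_share(records):
    """Return [(worker_name, total_input_tokens, share_of_total)], largest first."""
    totals = defaultdict(int)
    for rec in records:
        totals[rec["worker_name"]] += rec["input_tokens"]

    grand_total = sum(totals.values())
    if grand_total == 0:
        return []  # nothing logged, so there is no share to compute

    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(name, tokens, tokens / grand_total) for name, tokens in ranked]


# Made-up rows, only to show the shape of the calculation.
sample = [
    {"worker_name": "ad-report-summarizer", "input_tokens": 18000},
    {"worker_name": "ad-report-summarizer", "input_tokens": 18000},
    {"worker_name": "router", "input_tokens": 4000},
]

for name, tokens, share in input_token_share(sample):
    print(f"{name}: {tokens} input tokens ({share:.0%})")
```

Sorting by total rather than by average matters here: a worker with a modest per-call average can still dominate the bill if it retries often.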
Once I had a suspect, I used Claude Code's --verbose flag to reconstruct the tool call chain. Most people treat --verbose as a log-level toggle. It's not — it dumps the full tool input/output JSON for every call in the session. Pipe it to a file, run jq on it, and you can replay the exact sequence that blew up your context.
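If jq isn't handy, the same replay can be roughed out in Python. A loud caveat: the record shape below is an assumption, not Claude Code's documented output format. The sketch assumes you have already normalized the dump into one JSON object per line with hypothetical `tool`, `input`, and `output` keys, and it reports how large each call's payloads were so the call that bloated the context stands out.

```python
import json


def summarize_tool_calls(lines):
    """Return [(position, tool, input_chars, output_chars)] for each record.

    Assumes one JSON object per line with hypothetical keys
    "tool", "input", and "output". Blank lines are skipped.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        # Serialized length is a rough proxy for how much context a call adds.
        input_chars = len(json.dumps(rec.get("input")))
        output_chars = len(json.dumps(rec.get("output")))
        rows.append((len(rows), rec.get("tool", "?"), input_chars, output_chars))
    return rows


def largest_output(rows):
    """Return the row with the biggest output payload, or None if empty."""
    return max(rows, key=lambda row: row[3], default=None)
```

Character counts are not token counts, but they are usually enough to spot the one call whose output is an order of magnitude bigger than the rest.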
For multi-agent loops specifically (I run 6 Slack bots coordinated through Workers), KV counters have been the single most reliable safeguard. Each conversation thread gets a counter that is checked on every bot invocation and stored alongside a last_actor field. When the counter approaches the limit, last_actor tells you immediately which bot is driving the chain. Six months in, it's almost always summarizer-bot triggering router-bot triggering summarizer-bot again.
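The counter logic is small enough to sketch. This is an illustration of the idea, not my production Worker: a plain dict stands in for the KV namespace, and the `LoopGuard` class, the `MAX_HOPS` value, and the thread ID format are all hypothetical names chosen for the example.

```python
MAX_HOPS = 8  # assumed limit; pick whatever a legitimate chain never exceeds


class LoopGuard:
    """Per-thread invocation counter with a last_actor field."""

    def __init__(self, store, max_hops=MAX_HOPS):
        self.store = store  # dict standing in for a KV namespace
        self.max_hops = max_hops

    def check(self, thread_id, actor):
        """Record one bot invocation; return (allowed, state)."""
        previous = self.store.get(thread_id, {"count": 0, "last_actor": None})
        state = {"count": previous["count"] + 1, "last_actor": actor}
        self.store[thread_id] = state
        # Refuse the invocation once the thread has used up its budget.
        return state["count"] <= self.max_hops, state


guard = LoopGuard(store={}, max_hops=3)
for bot in ["summarizer-bot", "router-bot", "summarizer-bot", "router-bot"]:
    allowed, state = guard.check("thread-123", bot)
    print(bot, "allowed" if allowed else "blocked", state)
```

One thing to keep in mind when porting this to a real Worker: Workers KV is eventually consistent and has no atomic increment, so the count is a safeguard against runaway chains rather than an exact tally.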
The harder unsolved problem: I'm still seeing intermittent schema drift in tool call responses. Same prompt, same model, valid JSON, but a different structure. It's non-deterministic, doesn't reproduce on demand, and when it triggers a retry, that call costs double. I haven't confirmed whether it's a Sonnet serialization quirk or something in my Workers pipeline.
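One way to at least catch the drift when it happens is to log a structural fingerprint of every tool call response and compare fingerprints across calls. The sketch below is a possible diagnostic, not a fix and not something confirmed to explain the drift: `shape` and `fingerprint` are hypothetical helpers that strip the data out of a JSON value and keep only keys and type names.

```python
import json


def shape(value):
    """Reduce a JSON value to its structure: keys and type names, no data."""
    if isinstance(value, dict):
        return {key: shape(value[key]) for key in sorted(value)}
    if isinstance(value, list):
        # Collapse a list to the distinct shapes of its elements,
        # so list length alone never counts as drift.
        distinct = []
        for item in value:
            item_shape = shape(item)
            if item_shape not in distinct:
                distinct.append(item_shape)
        return distinct
    return type(value).__name__


def fingerprint(value):
    """Stable string form of a value's shape, suitable for logging."""
    return json.dumps(shape(value), sort_keys=True)


expected = fingerprint({"items": [{"id": 1, "label": "a"}], "total": 1})
drifted = fingerprint({"items": {"id": 1, "label": "a"}, "total": 1})
print(expected == drifted)  # False: "items" changed from a list to an object
```

Logging the fingerprint next to the request ID on every call would show whether the structure changes before or after the response passes through the Workers pipeline.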
I wrote up the full breakdown — including the PostToolUse hook setup for snapshotting tool call sequences, the cf-ray correlation trick for tracing multi-worker chains, and the per-tool production evaluation table — over on riversealab.com.