<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: vk0dev</title>
    <description>The latest articles on DEV Community by vk0dev (@vk0dev).</description>
    <link>https://dev.to/vk0dev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3900770%2F4834d8ef-d366-4b8e-bcf6-89a2ca1e5d6f.png</url>
      <title>DEV Community: vk0dev</title>
      <link>https://dev.to/vk0dev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/vk0dev"/>
    <language>en</language>
    <item>
      <title>I read 30 of my Claude Code session logs and built a small tool</title>
      <dc:creator>vk0dev</dc:creator>
      <pubDate>Mon, 27 Apr 2026 15:37:55 +0000</pubDate>
      <link>https://dev.to/vk0dev/i-read-30-of-my-claude-code-session-logs-and-built-a-small-tool-2kf1</link>
      <guid>https://dev.to/vk0dev/i-read-30-of-my-claude-code-session-logs-and-built-a-small-tool-2kf1</guid>
      <description>&lt;p&gt;Three weeks ago I had an overnight agent run that cleaned out my Claude Max quota in about 40 minutes. I went to bed thinking the refactor would be done by morning, and woke up to a "rate limit reached" alert before any of it had committed.&lt;/p&gt;

&lt;p&gt;There's an active GitHub thread on this, anthropics/claude-code#41930. It got locked a couple of days ago with the lock reason "resolved", which I take to mean Anthropic shipped something server-side. Good. But a separate thing kept bugging me: while it was happening, I had no idea what specifically was eating the quota. The Anthropic console tells you you're over the limit. It doesn't tell you why.&lt;/p&gt;

&lt;p&gt;So I pulled the JSONL session logs from &lt;code&gt;~/.claude/projects/&lt;/code&gt; and started reading. They're surprisingly readable: every assistant turn, every tool call, the token counts, all there. After a few hours I had three observations I haven't seen written up anywhere.&lt;/p&gt;
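
&lt;p&gt;If you want to poke at your own logs, the loop I started with was roughly the following. A minimal sketch in TypeScript, assuming the assistant entries carry an API-shaped &lt;code&gt;message.usage&lt;/code&gt; object (&lt;code&gt;input_tokens&lt;/code&gt;, &lt;code&gt;output_tokens&lt;/code&gt;, &lt;code&gt;cache_read_input_tokens&lt;/code&gt;); verify the field names against your own files before trusting any numbers.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// sum-usage.ts: print per-turn token usage for one session log.
// Assumption: assistant entries carry message.usage with API-shaped
// token fields. Verify the names against your own JSONL.
import { readFileSync } from "node:fs";

// e.g. ~/.claude/projects/&lt;project&gt;/&lt;session&gt;.jsonl
const lines = readFileSync(process.argv[2], "utf8").split("\n").filter(Boolean);

let turn = 0;
for (const line of lines) {
  let entry: any;
  try { entry = JSON.parse(line); } catch { continue; } // skip partial lines
  if (entry.type !== "assistant") continue;
  turn++;
  const u = entry.message?.usage ?? {};
  console.log(
    `turn ${turn}: in=${u.input_tokens ?? 0} out=${u.output_tokens ?? 0} ` +
    `cache_read=${u.cache_read_input_tokens ?? 0}`,
  );
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;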

&lt;h3&gt;1. Cache-read tokens are not a flat multiplier&lt;/h3&gt;

&lt;p&gt;I had assumed cache reads behaved like a steady tax on input tokens. They don't. They spike on certain tool sequences — specifically, when the agent does &lt;code&gt;Edit → Read(same file) → Edit → Read&lt;/code&gt;. Each &lt;code&gt;Read&lt;/code&gt; after the &lt;code&gt;Edit&lt;/code&gt; reloads the whole file context, and the cache-read count for those turns is something like 3-5× the count for turns that just used &lt;code&gt;Grep&lt;/code&gt; or &lt;code&gt;Bash&lt;/code&gt;.&lt;/p&gt;
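
&lt;p&gt;Spotting the pattern mechanically is a short walk over the tool-call blocks. Same caveats as the sketch above, plus one more assumption: that &lt;code&gt;Edit&lt;/code&gt; and &lt;code&gt;Read&lt;/code&gt; both take a &lt;code&gt;file_path&lt;/code&gt; input. Treat that as something to check, not a given.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// flag-read-after-edit.ts: flag Read calls that re-load a just-edited file.
// Assumptions: assistant entries embed API-shaped tool_use blocks, and
// Edit/Read take a file_path input. Check against your own logs.
import { readFileSync } from "node:fs";

const lines = readFileSync(process.argv[2], "utf8").split("\n").filter(Boolean);

let lastEdited: string | null = null;
let turn = 0;
for (const line of lines) {
  let entry: any;
  try { entry = JSON.parse(line); } catch { continue; }
  if (entry.type !== "assistant") continue;
  turn++;
  for (const block of entry.message?.content ?? []) {
    if (block.type !== "tool_use") continue;
    if (block.name === "Edit") lastEdited = block.input?.file_path ?? null;
    if (block.name === "Read" &amp;&amp; block.input?.file_path === lastEdited) {
      const cr = entry.message?.usage?.cache_read_input_tokens ?? 0;
      console.log(`turn ${turn}: re-read of ${lastEdited} (cache_read=${cr})`);
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;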

&lt;p&gt;If you've ever had a long agent run that felt fine per turn but blew the budget at the end, this is probably why. The agent verifies its own work after every change, and each verification is priced at full context.&lt;/p&gt;

&lt;h3&gt;2. Post-compact turns cost a lot more than you'd think&lt;/h3&gt;

&lt;p&gt;When the conversation hits the context limit, Claude Code compacts: summarizes the prior context into a single message, then keeps going. The first 2-3 turns after a compact are around 3-5× the cost of the same kind of turn at session start. The agent has to re-establish its mental model, often re-reads files it already had loaded, and pays for it.&lt;/p&gt;

&lt;p&gt;This is a known trade-off (compaction lets you go past the limit at all), but I think people are seriously under-counting how much of the "I burned my quota" delta comes from this. If your workflow involves many long compacted sessions, that's where a chunk of the spend goes.&lt;/p&gt;
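
&lt;p&gt;To put a number on this for your own sessions, compare the average cost of the first few turns after each compact against everything else. The sketch below is deliberately log-format-agnostic: it takes per-turn costs and compact boundaries as plain arrays, however you extract them.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// postCompactMultiplier: how much more expensive are the first `window`
// turns after each compact, relative to all other turns in the session?
function postCompactMultiplier(
  turnCosts: number[],      // per-turn cost (dollars or tokens)
  compactIndices: number[], // index of the first turn after each compact
  window = 3,
): number {
  const post = new Set&lt;number&gt;();
  for (const i of compactIndices) {
    for (let k = 0; k &lt; window; k++) post.add(i + k);
  }
  const avg = (xs: number[]) =&gt;
    xs.reduce((a, b) =&gt; a + b, 0) / Math.max(xs.length, 1);
  const postCosts = turnCosts.filter((_, i) =&gt; post.has(i));
  const baseCosts = turnCosts.filter((_, i) =&gt; !post.has(i));
  return avg(postCosts) / Math.max(avg(baseCosts), 1e-9);
}

// Turns 5-7 follow a compact and cost 4x the baseline turns:
console.log(postCompactMultiplier([1, 1, 1, 1, 1, 4, 4, 4, 1, 1], [5])); // 4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;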

&lt;h3&gt;3. Subagent loops multiply down the tree&lt;/h3&gt;

&lt;p&gt;If your main agent dispatches a subagent, both processes log their own session JSONL. Both load the conversation context. Both pay for it. If the subagent dispatches further subagents, that compounds.&lt;/p&gt;

&lt;p&gt;The worst session I have is a 4-deep subagent chain where the leaf got into a Read-loop on the same file. It cost $47 for what should have been under a dollar. The data is sitting right there in the JSONL — there's a &lt;code&gt;subagentPaths&lt;/code&gt; field that ties it all together — but nothing I'd seen surfaces it.&lt;/p&gt;
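
&lt;p&gt;The tree walk itself is short once you trust that field. A sketch, treating &lt;code&gt;subagentPaths&lt;/code&gt; as an array of paths to the child sessions' JSONL files; the exact shape is an assumption to verify:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// subagent-tree.ts: sum token usage down a subagent chain.
// Assumption: entries can carry a subagentPaths array pointing at the
// child sessions' JSONL files. Treat the exact shape as unverified.
import { readFileSync } from "node:fs";

interface TreeNode {
  path: string;
  ownTokens: number;
  children: TreeNode[];
  total: number;
}

function walk(path: string): TreeNode {
  const lines = readFileSync(path, "utf8").split("\n").filter(Boolean);
  let ownTokens = 0;
  const childPaths: string[] = [];
  for (const line of lines) {
    let e: any;
    try { e = JSON.parse(line); } catch { continue; }
    const u = e.message?.usage;
    if (u) ownTokens += (u.input_tokens ?? 0) + (u.output_tokens ?? 0);
    if (Array.isArray(e.subagentPaths)) childPaths.push(...e.subagentPaths);
  }
  const children = childPaths.map(walk);
  const total = ownTokens + children.reduce((s, c) =&gt; s + c.total, 0);
  return { path, ownTokens, children, total };
}

console.log(JSON.stringify(walk(process.argv[2]), null, 2));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;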

&lt;h2&gt;What I built&lt;/h2&gt;

&lt;p&gt;A small MCP server, &lt;code&gt;@vk0/agent-cost-mcp&lt;/code&gt;. v2.0-beta.4 just shipped on npm. It reads the JSONL logs locally and exposes a few tools the agent can call:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;get_subagent_tree&lt;/code&gt; — returns the subagent tree as a JSON object with cost summed per branch. This is the one I cared most about. You can see exactly which child agent burned what.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;get_tool_roi&lt;/code&gt; — for each tool the agent used, looks at whether the next turn moved the conversation forward (linked tool result, no immediate retry/rollback). Tools that fire repeatedly without progress get tagged &lt;code&gt;efficiency=low&lt;/code&gt;. That's the runaway-loop signature; there's a sketch of the idea after this list.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;detect_cost_anomalies&lt;/code&gt; — keeps a rolling daily baseline and flags days that deviate by more than ~50% from the recent average.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;estimate_run_cost&lt;/code&gt; — give it a prompt and a model, get back a &lt;code&gt;{low, expected, high}&lt;/code&gt; for what the run will cost.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;configure_budget&lt;/code&gt; — set a daily cap and three thresholds (80/100/150%). When you cross one, the next cost-query tool returns the alert in its response, and your agent can read that and stop.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;set_monitor_webhook&lt;/code&gt; — pipe alerts to Telegram/Slack/whatever. HMAC-signed so the receiver can verify.&lt;/li&gt;
&lt;/ul&gt;
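
&lt;p&gt;The &lt;code&gt;efficiency=low&lt;/code&gt; tag boils down to one question: did the same tool fire again with the same input before anything changed? A stripped-down reconstruction of that check (the shipped version also looks at tool results and rollbacks):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Runaway-loop signature: the same tool called with identical input
// several times in a row. A reconstruction of the idea, not the
// package's actual implementation.
interface ToolCall {
  name: string;
  input: unknown;
}

function flagLowEfficiency(calls: ToolCall[], threshold = 3): boolean {
  let streak = 1;
  for (let i = 1; i &lt; calls.length; i++) {
    const same =
      calls[i].name === calls[i - 1].name &amp;&amp;
      JSON.stringify(calls[i].input) === JSON.stringify(calls[i - 1].input);
    streak = same ? streak + 1 : 1;
    if (streak &gt;= threshold) return true; // repeating without progress
  }
  return false;
}

console.log(flagLowEfficiency([
  { name: "Read", input: { file_path: "a.ts" } },
  { name: "Read", input: { file_path: "a.ts" } },
  { name: "Read", input: { file_path: "a.ts" } },
])); // true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;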

&lt;p&gt;It's all local. No API key, no auth, no cloud. Reads files you already have. There's an opt-in telemetry stub for v2.1 so I can do anonymized benchmarks ("your session cost X, median for similar sessions is Y"), but it's a no-op in v2.0 by design: I want the privacy commitments worked out before any network code goes in.&lt;/p&gt;

&lt;h2&gt;What it doesn't do&lt;/h2&gt;

&lt;p&gt;It doesn't fix anything Anthropic-side. If their server-side resolution holds, the drain pattern from #41930 should get rarer, but the per-tool waste, subagent loops, and post-compact reprocessing are independent problems that are still on you to manage.&lt;/p&gt;

&lt;p&gt;The forecasting tool is dumb right now. It's basically linear extrapolation. I'll do something with seasonal smoothing in v2.1 if it turns out to matter.&lt;/p&gt;
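
&lt;p&gt;For the curious, "basically linear extrapolation" means roughly this; the bands are fixed fudge factors, and this is an illustration rather than the shipped code:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Day-level forecast by linear extrapolation: spend so far, scaled to a
// full day, with fixed low/high bands. Illustration, not the shipped code.
function forecastDay(spentSoFar: number, hoursElapsed: number) {
  const expected = (spentSoFar / hoursElapsed) * 24;
  return { low: expected * 0.7, expected, high: expected * 1.5 };
}

console.log(forecastDay(12, 6)); // { low: 33.6, expected: 48, high: 72 }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;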

&lt;p&gt;The runaway detection has about an 8% false positive rate on my fixture sessions. I'd like to get that under 3% before going to GA, but I'd need adversarial samples I don't have. If anyone reading this has a JSONL file from a session that went really sideways (and is comfortable with anonymizing it), I'd love to run it.&lt;/p&gt;

&lt;h2&gt;Try it&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx &lt;span class="nt"&gt;-y&lt;/span&gt; @vk0/agent-cost-mcp@beta
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
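
&lt;p&gt;Then register it with your MCP client. For Claude Desktop that's an entry in &lt;code&gt;claude_desktop_config.json&lt;/code&gt; along these lines (the server name is arbitrary):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "mcpServers": {
    "agent-cost": {
      "command": "npx",
      "args": ["-y", "@vk0/agent-cost-mcp@beta"]
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;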



&lt;p&gt;Repo: &lt;a href="https://github.com/vk0dev/agent-cost-mcp" rel="noopener noreferrer"&gt;https://github.com/vk0dev/agent-cost-mcp&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Tested against five MCP clients: Claude Desktop, Claude Code, Cursor, Cline, and Windsurf.&lt;/p&gt;

</description>
      <category>claude</category>
      <category>mcp</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
