<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Slim</title>
    <description>The latest articles on DEV Community by Slim (@slima4).</description>
    <link>https://dev.to/slima4</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1192518%2F65a9f756-9f19-4409-8172-e6c6a681cfdc.png</url>
      <title>DEV Community: Slim</title>
      <link>https://dev.to/slima4</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/slima4"/>
    <language>en</language>
    <item>
      <title>Sniffing Claude Code's API Calls: What Your IDE Is Really Sending</title>
      <dc:creator>Slim</dc:creator>
      <pubDate>Mon, 16 Mar 2026 05:05:11 +0000</pubDate>
      <link>https://dev.to/slima4/sniffing-claude-codes-api-calls-what-your-ide-is-really-sending-5fnl</link>
      <guid>https://dev.to/slima4/sniffing-claude-codes-api-calls-what-your-ide-is-really-sending-5fnl</guid>
      <description>&lt;p&gt;Every time you press Enter in Claude Code, something interesting happens behind the scenes. Your full conversation — system prompt, message history, tool definitions, everything — gets packaged into an API call and sent to Anthropic's servers.&lt;/p&gt;

&lt;p&gt;But you never get to see those calls. Claude Code logs a JSONL transcript of what it &lt;em&gt;did&lt;/em&gt; (tool calls, responses, thinking blocks), but not the raw API traffic that made it happen. The system prompt, HTTP headers, request parameters, latency per call, and one entirely hidden API call — all invisible.&lt;/p&gt;

&lt;p&gt;So we built a way to see everything.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Trick: One Environment Variable
&lt;/h2&gt;

&lt;p&gt;Claude Code officially supports &lt;code&gt;ANTHROPIC_BASE_URL&lt;/code&gt; — an environment variable that redirects API traffic to a custom endpoint. It's meant for enterprise proxies, but it works perfectly for local interception:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Claude Code  ──plain HTTP──▶  Sniffer (localhost:7735)  ──HTTPS──▶  api.anthropic.com
                                    │
                                    ▼
                          ~/.claude/api-sniffer/*.jsonl
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Start the sniffer in one terminal, launch Claude Code in another:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Terminal 1&lt;/span&gt;
claudetui sniffer

&lt;span class="c"&gt;# Terminal 2&lt;/span&gt;
claudetui sniff
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;claudetui sniff&lt;/code&gt; auto-detects the sniffer port and launches Claude Code through the proxy. If the sniffer isn't running, it falls back to launching Claude Code directly — so you never get stuck with a &lt;code&gt;ConnectionRefused&lt;/code&gt; retry loop.&lt;/p&gt;

&lt;p&gt;Every API call now flows through the proxy and gets logged. Claude Code works identically — it doesn't know (or care) that traffic is being captured.&lt;/p&gt;

&lt;p&gt;No TLS interception. No certificates. No patching binaries. Just a localhost HTTP server that forwards to the real API over HTTPS.&lt;/p&gt;
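&lt;p&gt;To make the pattern concrete, here is a minimal stdlib-Python sketch of that idea. This is not the claudetui source: the handler class, the &lt;code&gt;log_record&lt;/code&gt; helper, and the hard-coded upstream host are illustrative assumptions, and it prints log lines instead of writing the JSONL file.&lt;/p&gt;

```python
# Sketch only: a plain-HTTP listener on 127.0.0.1 that replays each request
# to the real API over HTTPS and emits one JSON record per round-trip.
# (Hypothetical names; the actual claudetui implementation may differ.)
import http.client
import http.server
import json
import time

UPSTREAM = "api.anthropic.com"   # assumed upstream host

def log_record(path, status, sent_bytes, recv_bytes, elapsed_ms):
    """One JSON line per API call, mirroring the sniffer's per-call output."""
    return json.dumps({
        "path": path, "status": status,
        "sent_bytes": sent_bytes, "recv_bytes": recv_bytes,
        "elapsed_ms": round(elapsed_ms, 1),
    })

class SnifferHandler(http.server.BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        started = time.time()
        upstream = http.client.HTTPSConnection(UPSTREAM)  # re-encrypts upstream
        headers = {k: v for k, v in self.headers.items() if k.lower() != "host"}
        upstream.request("POST", self.path, body=body, headers=headers)
        resp = upstream.getresponse()
        payload = resp.read()
        print(log_record(self.path, resp.status, length, len(payload),
                         (time.time() - started) * 1000.0))
        self.send_response(resp.status)
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

# To run: http.server.HTTPServer(("127.0.0.1", 7735), SnifferHandler).serve_forever()
```

&lt;p&gt;The real sniffer also handles streaming and header redaction, but the core really is this small: receive on loopback, replay over HTTPS, log the round-trip.&lt;/p&gt;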

&lt;h2&gt;
  
  
  What You See
&lt;/h2&gt;

&lt;p&gt;The sniffer prints one line per API call as it happens:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;  ClaudeTUI API Sniffer — listening on http://127.0.0.1:7735

  Use:  ANTHROPIC_BASE_URL=http://localhost:7735 claude
  Log:  ~/.claude/api-sniffer/sniffer-20260314-103000.jsonl

&lt;/span&gt;&lt;span class="gp"&gt;  #&lt;/span&gt;1   POST /v1/messages  opus-4-6  45.2k-&amp;gt;1.5k  &lt;span class="nv"&gt;$0&lt;/span&gt;.120  2312ms  740KB/4.2KB  98%c  &lt;span class="o"&gt;[&lt;/span&gt;Tt]
&lt;span class="gp"&gt;  #&lt;/span&gt;2   POST /v1/messages  opus-4-6  48.1k-&amp;gt;0.8k  &lt;span class="nv"&gt;$0&lt;/span&gt;.094  1134ms  741KB/2.1KB  99%c  &lt;span class="o"&gt;[&lt;/span&gt;TU]  Edit
&lt;span class="gp"&gt;  #&lt;/span&gt;3   POST /v1/messages  opus-4-6  50.3k-&amp;gt;52    &lt;span class="nv"&gt;$0&lt;/span&gt;.081  1823ms  742KB/0.3KB  100%c  &lt;span class="o"&gt;[&lt;/span&gt;U]  Glob,Grep
&lt;span class="gp"&gt;  #&lt;/span&gt;4   POST /v1/messages  opus-4-6  12.3k-&amp;gt;2.1k  &lt;span class="nv"&gt;$0&lt;/span&gt;.041  3412ms  42KB/6.8KB   95%c  &lt;span class="o"&gt;[&lt;/span&gt;Tt]  compaction
&lt;span class="gp"&gt;  #&lt;/span&gt;5   POST /v1/messages  sonnet-4-6  14.3k-&amp;gt;2.1k  &lt;span class="nv"&gt;$0&lt;/span&gt;.008  2341ms  42KB/6.8KB  &lt;span class="o"&gt;[&lt;/span&gt;Tt]  +agent.1
&lt;span class="go"&gt;
&lt;/span&gt;&lt;span class="gp"&gt;  Summary: 5 requests | $&lt;/span&gt;0.344 | 170k &lt;span class="k"&gt;in&lt;/span&gt; | 5.6k out | 2.3MB sent | 18KB recv | 1 sub-agent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each line shows the model, input-&amp;gt;output tokens, estimated cost, latency, traffic size, cache hit ratio, content block types, tool names, and sub-agent tracking. Compaction events get flagged automatically.&lt;/p&gt;

&lt;p&gt;The content blocks tell you what Claude is doing: &lt;code&gt;T&lt;/code&gt; = thinking, &lt;code&gt;t&lt;/code&gt; = text, &lt;code&gt;U&lt;/code&gt; = tool use, &lt;code&gt;S&lt;/code&gt; = server-side tool (like WebSearch). The cache ratio (&lt;code&gt;98%c&lt;/code&gt;) shows how much you're saving — a &lt;code&gt;0%c&lt;/code&gt; (shown in red) means a cache miss, which makes those input tokens 12.5x more expensive.&lt;/p&gt;
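&lt;p&gt;That flags column is easy to reproduce. A sketch, assuming the standard content-block type names from the API (&lt;code&gt;thinking&lt;/code&gt;, &lt;code&gt;text&lt;/code&gt;, &lt;code&gt;tool_use&lt;/code&gt;, &lt;code&gt;server_tool_use&lt;/code&gt;); the function name is ours, not the tool's:&lt;/p&gt;

```python
# Map the content-block types of a response to one-letter flags.
FLAGS = {"thinking": "T", "text": "t", "tool_use": "U", "server_tool_use": "S"}

def block_flags(block_types):
    """Render e.g. ["thinking", "text"] as the "[Tt]" column."""
    return "[" + "".join(FLAGS.get(b, "?") for b in block_types) + "]"
```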

&lt;p&gt;Meanwhile, every request and response is logged as structured JSONL for later analysis.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Transcripts Don't Tell You
&lt;/h2&gt;

&lt;p&gt;Claude Code's JSONL transcripts are useful, but they omit a lot. Here's what the sniffer captures that transcripts don't:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Data&lt;/th&gt;
&lt;th&gt;In Transcript?&lt;/th&gt;
&lt;th&gt;In Sniffer?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Token usage (input/output/cache)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Raw system prompt&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Full conversation history per request&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Request parameters (max_tokens, temperature)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HTTP headers (anthropic-beta, version)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Request/response latency&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hidden compaction API call&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Error response bodies&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Streaming SSE events&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool definitions (full JSON schema)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The most interesting items on this list are the system prompt and the hidden compaction call.&lt;/p&gt;

&lt;h2&gt;
  
  
  The System Prompt
&lt;/h2&gt;

&lt;p&gt;Claude Code's system prompt is sent on every single API call. It contains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude Code's internal instructions and behavioral guidelines&lt;/li&gt;
&lt;li&gt;Tool definitions (Read, Write, Edit, Bash, Glob, Grep, etc.) with full JSON schemas&lt;/li&gt;
&lt;li&gt;Your CLAUDE.md project instructions&lt;/li&gt;
&lt;li&gt;Memory files, hooks output, and other injected context&lt;/li&gt;
&lt;li&gt;Safety and permission guidelines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With &lt;code&gt;--full&lt;/code&gt; mode, the sniffer captures the complete system prompt text. In our sessions, it consistently measures &lt;strong&gt;~14k tokens&lt;/strong&gt; — a fixed tax on every API call.&lt;/p&gt;

&lt;p&gt;This is useful for understanding exactly what Claude Code "knows" about your project. Your CLAUDE.md, your hooks output, your memory files — it's all there in the system prompt, and now you can read it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Hidden Compaction Call
&lt;/h2&gt;

&lt;p&gt;This is the one we were most curious about.&lt;/p&gt;

&lt;p&gt;When Claude Code's context window fills up (~167k of 200k tokens), it triggers compaction. The entire conversation gets compressed into a summary, and the next turn starts fresh with just the system prompt + summary.&lt;/p&gt;

&lt;p&gt;But here's the thing: &lt;strong&gt;the API call that generates the compaction summary doesn't appear in the transcript.&lt;/strong&gt; Claude Code makes it, receives the summary, and continues — but the JSONL transcript shows nothing. You see a &lt;code&gt;compact_boundary&lt;/code&gt; marker, but not the actual summarization call.&lt;/p&gt;

&lt;p&gt;The sniffer catches it because it's just another &lt;code&gt;POST /v1/messages&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;  #&lt;/span&gt;12  POST /v1/messages  opus-4-6  12.3k-&amp;gt;2.1k  &lt;span class="nv"&gt;$0&lt;/span&gt;.041  3412ms  compaction
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The sniffer detects compaction by comparing consecutive requests. When the message count drops by more than 50% or the total content size drops by more than 70% compared to the previous request, it flags it as post-compaction. The dramatic shrinkage — from 167k tokens of conversation down to a ~15k summary — is unmistakable.&lt;/p&gt;
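&lt;p&gt;That heuristic fits in a few lines. A sketch using the thresholds quoted above (the function name is ours, not the tool's):&lt;/p&gt;

```python
def looks_like_compaction(prev_msgs, prev_bytes, cur_msgs, cur_bytes):
    """Flag a request as post-compaction when it shrinks sharply versus the
    previous one: message count down by over 50%, or total content size
    down by over 70%."""
    if prev_msgs == 0 or prev_bytes == 0:
        return False            # nothing to compare against yet
    msg_drop = 1.0 - cur_msgs / prev_msgs
    size_drop = 1.0 - cur_bytes / prev_bytes
    return msg_drop > 0.5 or size_drop > 0.7
```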

&lt;p&gt;This is the only way to observe the compaction call's actual cost, latency, and output tokens. In our sessions, compaction summary generation takes 2-4 seconds and produces 11-19k tokens of compressed context.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Tool Use Loop
&lt;/h2&gt;

&lt;p&gt;When you see a line like this in the sniffer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;  #&lt;/span&gt;17  POST /v1/messages  opus-4-6  114.3k-&amp;gt;531  &lt;span class="nv"&gt;$0&lt;/span&gt;.217  16047ms  &lt;span class="o"&gt;[&lt;/span&gt;U]  tool
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's 114k tokens in but only 531 out. Why so few output tokens? Because Claude isn't writing prose — it's calling a tool. The response is just a small JSON block:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tool_use"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Read"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"input"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"file_path"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/src/app.py"&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's the full cycle for a single tool call:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code sends the full conversation&lt;/strong&gt; to the API (114.3k input tokens — system prompt, message history, tool definitions, everything)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API responds with a &lt;code&gt;tool_use&lt;/code&gt; block&lt;/strong&gt; — just the tool name and parameters (531 output tokens)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code executes the tool locally&lt;/strong&gt; — reads the file, runs the command, whatever the tool does&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code sends another request&lt;/strong&gt; with the tool result appended as a &lt;code&gt;tool_result&lt;/code&gt; message — now input tokens are higher because the file contents (or command output) are part of the conversation&lt;/li&gt;
&lt;/ol&gt;
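&lt;p&gt;The four steps above reduce to a simple loop. A schematic sketch with toy stand-ins for the API call and the local tool runner (all names here are ours, not Claude Code internals):&lt;/p&gt;

```python
def agent_loop(conversation, call_api, run_tool):
    """Repeat: send everything, then either execute the requested tool and
    append its result, or stop when the model answers with text."""
    while True:
        reply = call_api(conversation)        # step 1: full conversation out
        conversation.append(reply)
        if reply["type"] != "tool_use":       # step 2: tool call or prose?
            return reply                      # final text answer
        result = run_tool(reply["name"], reply["input"])     # step 3: run locally
        conversation.append({"type": "tool_result", "content": result})  # step 4
```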

&lt;p&gt;That's why you see rapid back-to-back requests in the sniffer. A single "read this file and edit it" from the user might generate 5+ API calls:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;  #&lt;/span&gt;17  POST /v1/messages  opus-4-6  114.3k-&amp;gt;531   &lt;span class="o"&gt;[&lt;/span&gt;U]  tool     ← decide to &lt;span class="nb"&gt;read &lt;/span&gt;file
&lt;span class="gp"&gt;  #&lt;/span&gt;18  POST /v1/messages  opus-4-6  116.1k-&amp;gt;204   &lt;span class="o"&gt;[&lt;/span&gt;U]  tool     ← decide to edit file
&lt;span class="gp"&gt;  #&lt;/span&gt;19  POST /v1/messages  opus-4-6  117.8k-&amp;gt;1.2k  &lt;span class="o"&gt;[&lt;/span&gt;Tt]          ← respond to user
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each round-trip adds the tool result to the conversation, growing the input tokens. This is why context fills up faster than you'd expect — tool results (file contents, command output, search results) are often much larger than the tool call itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  SSE Streaming Under the Hood
&lt;/h2&gt;

&lt;p&gt;Claude Code uses Server-Sent Events (SSE) for streaming responses. The API returns &lt;code&gt;text/event-stream&lt;/code&gt; and sends data in chunks as the model generates tokens.&lt;/p&gt;

&lt;p&gt;The sniffer handles this transparently — it forwards each chunk to Claude Code as it arrives (so you don't notice any delay), while capturing the entire stream for logging.&lt;/p&gt;

&lt;p&gt;After streaming completes, it reassembles the SSE events to extract structured data: model, usage, stop reason, and content block types (text, thinking, tool_use). This is what makes the one-line terminal output possible — you get clean &lt;code&gt;45.2k-&amp;gt;1.5k $0.120 2312ms&lt;/code&gt; instead of raw SSE data.&lt;/p&gt;

&lt;p&gt;The key technical detail: we use &lt;code&gt;response.read1(8192)&lt;/code&gt; instead of &lt;code&gt;response.read(8192)&lt;/code&gt;. The &lt;code&gt;read1()&lt;/code&gt; method reads whatever data is currently available without waiting for the full buffer to fill — critical for streaming, where you need to forward partial data immediately.&lt;/p&gt;
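&lt;p&gt;The forwarding loop itself is tiny. A sketch of the idea (the function name is ours), which works on any buffered byte stream exposing &lt;code&gt;read1()&lt;/code&gt;:&lt;/p&gt;

```python
import io

def pump(upstream, client_out, capture):
    """Forward SSE bytes as they arrive: read1() returns whatever is
    buffered right now instead of blocking until 8192 bytes accumulate."""
    while True:
        chunk = upstream.read1(8192)   # partial read, no waiting
        if not chunk:
            break                      # stream closed
        client_out.write(chunk)        # forward to Claude Code immediately
        capture.write(chunk)           # keep a copy for the JSONL log
```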

&lt;h2&gt;
  
  
  Sub-Agent Tracking
&lt;/h2&gt;

&lt;p&gt;When Claude Code spawns a sub-agent (via the Agent tool), the sub-agent makes its own API calls — often using a different model. The sniffer tracks these by session ID:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;  #&lt;/span&gt;8   POST /v1/messages  opus-4-6    89.1k-&amp;gt;3.2k  &lt;span class="nv"&gt;$0&lt;/span&gt;.182  8234ms  99%c  &lt;span class="o"&gt;[&lt;/span&gt;TU]  Agent
&lt;span class="gp"&gt;  #&lt;/span&gt;9   POST /v1/messages  sonnet-4-6  14.3k-&amp;gt;2.1k  &lt;span class="nv"&gt;$0&lt;/span&gt;.008  2341ms         &lt;span class="o"&gt;[&lt;/span&gt;Tt]  +agent.1
&lt;span class="gp"&gt;  #&lt;/span&gt;10  POST /v1/messages  sonnet-4-6  16.5k-&amp;gt;1.2k  &lt;span class="nv"&gt;$0&lt;/span&gt;.006  1823ms         &lt;span class="o"&gt;[&lt;/span&gt;TU]  Read  agent.1
&lt;span class="gp"&gt;  #&lt;/span&gt;11  POST /v1/messages  sonnet-4-6  22.8k-&amp;gt;0.5k  &lt;span class="nv"&gt;$0&lt;/span&gt;.009  1243ms         &lt;span class="o"&gt;[&lt;/span&gt;t]   agent.1
&lt;span class="gp"&gt;  #&lt;/span&gt;12  POST /v1/messages  opus-4-6    92.3k-&amp;gt;1.5k  &lt;span class="nv"&gt;$0&lt;/span&gt;.152  4312ms  99%c  &lt;span class="o"&gt;[&lt;/span&gt;Tt]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;+agent.1&lt;/code&gt; marks the first request from a new sub-agent. Subsequent requests from the same agent show &lt;code&gt;agent.1&lt;/code&gt;. The main session has no label.&lt;/p&gt;
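&lt;p&gt;A sketch of that labeling scheme (the factory name is ours): the first request from an unseen session ID gets the &lt;code&gt;+&lt;/code&gt; prefix, later ones drop it, and the main session stays blank.&lt;/p&gt;

```python
def make_labeler(main_session_id):
    """Return a function that labels each request by its session ID."""
    seen = {}
    def label(session_id):
        if session_id == main_session_id:
            return ""                          # main session: no label
        if session_id not in seen:
            seen[session_id] = len(seen) + 1   # assign agent.N on first sight
            return "+agent.%d" % seen[session_id]
        return "agent.%d" % seen[session_id]
    return label
```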

&lt;p&gt;This reveals things you can't see from the transcript: sub-agents often use Sonnet (cheaper, faster) for research tasks while the main session runs on Opus. You can see exactly how many API calls each sub-agent makes, their cost, and how they overlap with the main session.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cache Misses — The Silent Cost Spike
&lt;/h2&gt;

&lt;p&gt;The cache ratio (&lt;code&gt;98%c&lt;/code&gt;, &lt;code&gt;100%c&lt;/code&gt;) shows what percentage of input tokens were cache reads. Most of the time it's near 100% — great, you're paying the cheap rate.&lt;/p&gt;

&lt;p&gt;But leave your session idle for ~5 minutes and watch what happens:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;  #&lt;/span&gt;6   POST /v1/messages  opus-4-6  129.4k-&amp;gt;15   &lt;span class="nv"&gt;$0&lt;/span&gt;.199   3336ms  100%c  &lt;span class="o"&gt;[&lt;/span&gt;t]
&lt;span class="gp"&gt;  #&lt;/span&gt;7   POST /v1/messages  opus-4-6  129.5k-&amp;gt;428  &lt;span class="nv"&gt;$2&lt;/span&gt;.460  16108ms  0%c    &lt;span class="o"&gt;[&lt;/span&gt;Tt]
&lt;span class="gp"&gt;  #&lt;/span&gt;8   POST /v1/messages  opus-4-6  130.0k-&amp;gt;600  &lt;span class="nv"&gt;$0&lt;/span&gt;.248  18310ms  100%c  &lt;span class="o"&gt;[&lt;/span&gt;Tt]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Request #7 cost &lt;strong&gt;$2.46&lt;/strong&gt; — 12.5x more than usual — because the cache expired. All 129.5k tokens went through &lt;code&gt;cache_creation&lt;/code&gt; at $18.75/M instead of &lt;code&gt;cache_read&lt;/code&gt; at $1.50/M. Same data, same tokens, wildly different price.&lt;/p&gt;
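&lt;p&gt;The arithmetic checks out against the prices quoted above (output tokens are ignored here, which is why the total lands slightly under the logged $2.46):&lt;/p&gt;

```python
# Back-of-envelope check of the cache-miss spike, using the rates in the text.
tokens = 129_500
cold = tokens * 18.75 / 1_000_000   # cache expired: everything is cache_creation
warm = tokens * 1.50 / 1_000_000    # cache hit: everything is cache_read
ratio = cold / warm                 # 18.75 / 1.50
print(cold, warm, ratio)
```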

&lt;p&gt;The sniffer shows &lt;code&gt;0%c&lt;/code&gt; in red to make these cache misses impossible to miss.&lt;/p&gt;

&lt;h2&gt;
  
  
  Per-Request Cost Tracking
&lt;/h2&gt;

&lt;p&gt;The sniffer calculates cost per API call using the token breakdown from the response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"usage"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"input_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"cache_read_input_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;45000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"cache_creation_input_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;800&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"output_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1500&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With model-specific pricing (Opus: $15/$1.50/$18.75/$75 per 1M tokens for input/cache-read/cache-write/output), each line shows the exact cost of that call. No estimation, no averaging — real cost per request.&lt;/p&gt;
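&lt;p&gt;Applied to the usage block above, the math is one multiply per bucket. A sketch using the Opus prices from the text (the function name is ours, not the tool's):&lt;/p&gt;

```python
# $/1M tokens for input, cache read, cache write, output (Opus, per the text).
OPUS = {"input": 15.00, "cache_read": 1.50, "cache_write": 18.75, "output": 75.00}

def call_cost(usage, prices=OPUS):
    """Dollar cost of one API call from its usage breakdown."""
    return (usage.get("input_tokens", 0) * prices["input"]
            + usage.get("cache_read_input_tokens", 0) * prices["cache_read"]
            + usage.get("cache_creation_input_tokens", 0) * prices["cache_write"]
            + usage.get("output_tokens", 0) * prices["output"]) / 1_000_000

usage = {"input_tokens": 3, "cache_read_input_tokens": 45_000,
         "cache_creation_input_tokens": 800, "output_tokens": 1_500}
```

&lt;p&gt;For the usage shown, this works out to roughly $0.20, with the 1,500 output tokens as the single biggest line item despite being a fraction of the token count.&lt;/p&gt;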

&lt;p&gt;This revealed something we didn't expect: the variance between calls is huge. A simple response might cost $0.03, while a long code generation can cost $0.50+ — in the same session, same model.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We Learned
&lt;/h2&gt;

&lt;p&gt;After running the sniffer on real sessions, a few things stood out:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. The system prompt is remarkably stable.&lt;/strong&gt; It barely changes between calls within a session. The ~14k tokens are almost entirely cached after the first call, making them cheap ($1.50/M vs $15/M for Opus). But they still consume context window space.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Compaction is expensive in latency, not just tokens.&lt;/strong&gt; The summary generation call takes 2-4 seconds — during which Claude Code is unresponsive. On a long session with 3 compactions, that's 6-12 seconds of dead time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Cache hit rates are extraordinary.&lt;/strong&gt; In typical sessions, 95-98% of input tokens are cache reads. The stateless-API design sounds expensive, but caching makes it practical.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Error responses are more informative than you'd think.&lt;/strong&gt; When Claude Code encounters a 429 (rate limit) or 529 (overloaded), the response usually carries a &lt;code&gt;retry-after&lt;/code&gt; header and a detailed error message in the body. Claude Code's retry logic swallows both, so you never see them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Beta headers reveal feature flags.&lt;/strong&gt; The &lt;code&gt;anthropic-beta&lt;/code&gt; header shows which experimental features are active. Watching this change across Claude Code versions is interesting.&lt;/p&gt;

&lt;h2&gt;
  
  
  Security Notes
&lt;/h2&gt;

&lt;p&gt;The sniffer is designed to be safe by default:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Localhost only&lt;/strong&gt; — binds to &lt;code&gt;127.0.0.1&lt;/code&gt;, never &lt;code&gt;0.0.0.0&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API keys redacted&lt;/strong&gt; — &lt;code&gt;x-api-key&lt;/code&gt; and &lt;code&gt;authorization&lt;/code&gt; headers stripped from logs by default (use &lt;code&gt;--no-redact&lt;/code&gt; to override)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Restricted permissions&lt;/strong&gt; — log files created with &lt;code&gt;0o600&lt;/code&gt; (owner read/write only)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local plaintext&lt;/strong&gt; — the API key transits in plain text only over the loopback interface, which is standard for local proxy patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;The sniffer is part of &lt;a href="https://github.com/slima4/claude-tui" rel="noopener noreferrer"&gt;ClaudeTUI&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install&lt;/span&gt;
brew tap slima4/claude-tui &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; brew &lt;span class="nb"&gt;install &lt;/span&gt;claude-tui &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; claudetui setup

&lt;span class="c"&gt;# Or&lt;/span&gt;
curl &lt;span class="nt"&gt;-sSL&lt;/span&gt; https://raw.githubusercontent.com/slima4/claude-tui/main/install.sh | bash

&lt;span class="c"&gt;# Run&lt;/span&gt;
claudetui sniffer              &lt;span class="c"&gt;# Terminal 1: start proxy&lt;/span&gt;
claudetui sniff                &lt;span class="c"&gt;# Terminal 2: launch claude through proxy&lt;/span&gt;
claudetui sniff &lt;span class="nt"&gt;--resume&lt;/span&gt; abc   &lt;span class="c"&gt;# or resume a session through proxy&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Options:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;--port PORT     Custom port (default: 7735)
--full          Log complete request/response bodies (warning: large files)
--no-redact     Include API keys in logs (use with caution)
--quiet         Suppress terminal output, log only
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Python 3.8+, stdlib only — no external dependencies.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;The sniffer captures data that was previously invisible. Combined with ClaudeTUI's existing &lt;a href="https://github.com/slima4/claude-tui" rel="noopener noreferrer"&gt;context efficiency analysis&lt;/a&gt;, this gives a complete picture of what Claude Code is doing under the hood — from high-level token waste tracking down to raw HTTP traffic.&lt;/p&gt;

&lt;p&gt;Some things we're exploring:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Replaying captured sessions&lt;/strong&gt; for cost modeling ("what would this session cost on Sonnet vs Opus?")&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Diffing system prompts&lt;/strong&gt; across Claude Code versions to track changes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Correlating latency with context size&lt;/strong&gt; — does response time scale linearly with input tokens?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Analyzing compaction summaries&lt;/strong&gt; — what gets preserved and what gets lost?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're curious about what your Claude Code sessions actually look like at the API level, point the sniffer at a session and watch the data flow.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;ClaudeTUI is open source and MIT licensed. Stdlib-only Python, zero external dependencies.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;GitHub: &lt;a href="https://github.com/slima4/claude-tui" rel="noopener noreferrer"&gt;github.com/slima4/claude-tui&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>claude</category>
      <category>ai</category>
      <category>sniffer</category>
      <category>monitoring</category>
    </item>
    <item>
      <title>Where Do Your Claude Code Tokens Actually Go? We Traced Every Single One</title>
      <dc:creator>Slim</dc:creator>
      <pubDate>Sat, 14 Mar 2026 06:25:43 +0000</pubDate>
      <link>https://dev.to/slima4/where-do-your-claude-code-tokens-actually-go-we-traced-every-single-one-423e</link>
      <guid>https://dev.to/slima4/where-do-your-claude-code-tokens-actually-go-we-traced-every-single-one-423e</guid>
      <description>&lt;p&gt;You're paying for 200,000 tokens of context. But how many of those tokens are actually doing useful work?&lt;/p&gt;

&lt;p&gt;We built &lt;a href="https://github.com/slima4/claude-tui" rel="noopener noreferrer"&gt;ClaudeTUI&lt;/a&gt; — a set of monitoring tools for Claude Code — and dug into the raw JSONL transcript data to trace every token. What we found surprised us: there are four distinct categories of token usage, and only one of them is your actual work.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Happens When You Press Enter
&lt;/h2&gt;

&lt;p&gt;Here's something most Claude Code users don't realize: &lt;strong&gt;every time you press Enter, the entire conversation is sent from scratch.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Claude API is stateless. It doesn't remember your previous messages. So every message you send triggers an API call that includes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;System prompt&lt;/strong&gt; (~14k tokens) — Claude Code's instructions, tool definitions, your CLAUDE.md&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full conversation history&lt;/strong&gt; — every message, every tool call, every tool result since the last compaction&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Your new message&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;On turn 1, that's maybe 15k tokens. By turn 15, it's 100k. By turn 30, it's 167k — and then compaction fires.&lt;/p&gt;

&lt;p&gt;This is why Claude gets slower and more expensive as your session goes on. Each Enter keystroke processes more tokens than the last. And it's why compaction exists: without it, you'd hit the 200k wall and the session would simply stop.&lt;/p&gt;

&lt;p&gt;The good news: Anthropic's &lt;strong&gt;prompt caching&lt;/strong&gt; makes this less painful than it sounds. But it's worth understanding how.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Cache Lives on Anthropic's Servers
&lt;/h3&gt;

&lt;p&gt;Your machine sends the full conversation on every request — the same bytes go over the network every time. The optimization happens server-side: Anthropic checks "have I seen this exact prefix of tokens recently?" If yes, it skips re-processing them and charges the cheaper &lt;strong&gt;cache read&lt;/strong&gt; rate ($1.50/M instead of $15/M for Opus — a 10x discount).&lt;/p&gt;

&lt;p&gt;In a 157-turn session, we measured &lt;strong&gt;98% of all tokens as cache reads&lt;/strong&gt;. That makes sense: by turn 100, you're re-sending 99 turns of history that are already cached. Only the newest content goes through the expensive &lt;code&gt;cache_creation&lt;/code&gt; path.&lt;/p&gt;

&lt;p&gt;The cache has a TTL — likely ~5 minutes for conversation content. If you pause too long between turns, the cache expires and the next call pays full input price for everything. This is also why compaction is expensive: it blows away the entire cached conversation and replaces it with a brand new summary that goes through &lt;code&gt;cache_creation&lt;/code&gt; from scratch.&lt;/p&gt;

&lt;p&gt;One more thing: &lt;strong&gt;the tokens still count toward your 200k context window&lt;/strong&gt;, even when cached. Caching saves money, not space.&lt;/p&gt;
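&lt;p&gt;A toy model makes the billing pattern visible. Suppose each turn appends some new tokens and everything before them is still cached; with the Opus cache rates quoted above (the function name and example numbers are illustrative, not measured):&lt;/p&gt;

```python
def turn_costs(new_tokens_per_turn, read_rate=1.50, write_rate=18.75):
    """Per-turn input cost in dollars: the cached prefix bills at the cheap
    read rate, only the new suffix pays the cache-write rate."""
    cached = 0
    costs = []
    for new in new_tokens_per_turn:
        costs.append(round((cached * read_rate + new * write_rate) / 1_000_000, 4))
        cached += new
    return costs
```

&lt;p&gt;Even though every turn re-sends the whole conversation, the marginal cost is dominated by the new suffix — until the cache expires and the entire prefix bills at the write rate again.&lt;/p&gt;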

&lt;p&gt;Now let's look at what those tokens actually are.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Four Types of Tokens
&lt;/h2&gt;

&lt;p&gt;Every API call Claude Code makes has a token usage breakdown in its transcript. By parsing thousands of these calls across real sessions, we identified four categories:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. System Prompt (~14k tokens) — The Constant Tax
&lt;/h3&gt;

&lt;p&gt;Every single API call includes a system prompt: Claude Code's internal instructions, tool definitions, safety guidelines, and your CLAUDE.md file. In our sessions, this was consistently &lt;strong&gt;~14,328 tokens&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This isn't something you can avoid. It's infrastructure. But it means that out of your 200k window, only ~186k is ever available for actual conversation.&lt;/p&gt;

&lt;p&gt;We discovered this by looking at &lt;code&gt;cache_read_input_tokens&lt;/code&gt; after compaction events. The value &lt;strong&gt;resets to exactly 14,328&lt;/strong&gt; every time — that's the system prompt floor. During normal operation, &lt;code&gt;cache_read&lt;/code&gt; grows from 14k to 167k as your conversation accumulates in the cache.&lt;/p&gt;
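&lt;p&gt;Spotting that floor from a parsed transcript is straightforward. A sketch (the function name is ours): take the &lt;code&gt;cache_read_input_tokens&lt;/code&gt; of each call that directly follows a compact boundary and look at the minimum they reset to.&lt;/p&gt;

```python
def system_prompt_floor(cache_reads, post_compaction_indices):
    """cache_reads: cache_read_input_tokens per API call, in order.
    post_compaction_indices: positions of the first call after each boundary.
    The value these calls reset to is the system-prompt size."""
    return min(cache_reads[i] for i in post_compaction_indices)
```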

&lt;h3&gt;
  
  
  2. Compaction Summary (~11-19k tokens) — The Rebuild Cost
&lt;/h3&gt;

&lt;p&gt;When compaction fires, Claude Code compresses your entire conversation into a summary. The next API call has to read that summary to reconstruct context. This is the &lt;strong&gt;real overhead of compaction&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;From a real 3-compaction session:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Compaction&lt;/th&gt;
&lt;th&gt;Summary Size&lt;/th&gt;
&lt;th&gt;What It Costs&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;#1&lt;/td&gt;
&lt;td&gt;18.8k tokens&lt;/td&gt;
&lt;td&gt;$0.47 (Opus)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;#2&lt;/td&gt;
&lt;td&gt;10.6k tokens&lt;/td&gt;
&lt;td&gt;$0.22 (Opus)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;#3&lt;/td&gt;
&lt;td&gt;17.8k tokens&lt;/td&gt;
&lt;td&gt;$0.37 (Opus)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These summaries are lossy. Your 167k of rich context — exact error messages, file contents, code snippets — gets compressed into 11-19k tokens. Details are lost.&lt;/p&gt;
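&lt;p&gt;The table also lets you back out an effective per-token rate for compaction. Taking each row's cost over its summary size (a rough calculation on the numbers above, not official pricing):&lt;/p&gt;

```python
# (summary tokens, cost in $) from the compaction table above
rows = [(18_800, 0.47), (10_600, 0.22), (17_800, 0.37)]

for tokens, dollars in rows:
    per_million = dollars / tokens * 1_000_000
    print(f"{tokens:,} tokens at ${dollars:.2f} is ${per_million:.0f}/M effective")
```

&lt;p&gt;That works out to roughly $21-25 per million summary tokens, well above the cached-read rate, presumably because the summary is freshly written (and generated) rather than read from cache.&lt;/p&gt;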

&lt;h3&gt;
  
  
  3. Useful Work — What You Actually Paid For
&lt;/h3&gt;

&lt;p&gt;This is everything else: your prompts, Claude's responses, tool calls, file reads, code edits, test output. The actual productive content of your session.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Headroom (~33k tokens) — The Unused Buffer
&lt;/h3&gt;

&lt;p&gt;Claude Code doesn't wait until 200k to compact. It triggers at roughly &lt;strong&gt;83% capacity (~167k tokens)&lt;/strong&gt;, reserving ~33k tokens as a buffer for the compaction process itself.&lt;/p&gt;

&lt;p&gt;That means ~16.5% of your context window is never available for useful work: the window is nominally 200k, but only ~167k of it is ever usable for your session.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Real Session, Dissected
&lt;/h2&gt;

&lt;p&gt;Here's an actual 4-segment session from our monitoring data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  Seg 1  ▒▒▓▓████████████████████████████████████████████████░░░░░  200.0k
         14.3k system │ 152.7k useful │ 33.0k headroom │ → compacted

  Seg 2  ▒▒▓▓▓████████████████████████████████████░░░░░░░░░░░░░░░  200.0k
         14.3k system │ 18.8k summary │ 114.4k useful │ 52.5k headroom │ → compacted

  Seg 3  ▒▒▓▓▓████████████████████████████████████████████████░░░  200.0k
         14.3k system │ 17.8k summary │ 141.2k useful │ 33.9k headroom │ → compacted

→ Seg 4  ▒▒▓▓██████                                                44.8k
         14.3k system │ 10.6k summary │ 12.7k useful

  Efficiency: 76%  │  Wasted: 166.5k/644.8k
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;76% efficiency&lt;/strong&gt; means 76% of the total tokens went to useful work. The other 24% went to compaction summaries and headroom.&lt;/p&gt;

&lt;p&gt;Notice how Seg 1 has no summary — it's the first segment, nothing to rebuild from. But starting from Seg 2, every segment pays the summary tax.&lt;/p&gt;
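&lt;p&gt;The efficiency number is straightforward to reproduce from the per-segment breakdown. A minimal sketch using the first two segments above (ClaudeTUI's exact formula may differ, for instance in how it treats the system prompt):&lt;/p&gt;

```python
# Per-segment token breakdown, taken from Seg 1 and Seg 2 above.
segments = [
    dict(system=14_300, summary=0,      useful=152_700, headroom=33_000),
    dict(system=14_300, summary=18_800, useful=114_400, headroom=52_500),
]

total  = sum(sum(seg.values()) for seg in segments)
useful = sum(seg["useful"] for seg in segments)
wasted = sum(seg["summary"] + seg["headroom"] for seg in segments)
print(f"efficiency {useful / total:.0%}, wasted {wasted / 1000:.1f}k of {total / 1000:.1f}k")
```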

&lt;h2&gt;
  
  
  The Hidden API Call
&lt;/h2&gt;

&lt;p&gt;One thing we couldn't find in the transcript: &lt;strong&gt;the compaction summary generation itself&lt;/strong&gt;. Claude Code makes a hidden API call that reads your ~167k context and produces the summary, but this call is not logged in the JSONL transcript.&lt;/p&gt;

&lt;p&gt;Based on the &lt;code&gt;preTokens&lt;/code&gt; metadata we found in compaction events, this hidden call reads the full pre-compaction context (~167k tokens). At Opus pricing ($1.50/M for cached reads), that's roughly $0.25 per compaction just for the summary generation — on top of the rebuild cost.&lt;/p&gt;
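&lt;p&gt;The arithmetic behind that estimate, assuming the cached-read rate applies to the hidden call's input:&lt;/p&gt;

```python
PRE_TOKENS  = 167_000  # full pre-compaction context read by the hidden call
CACHED_READ = 1.50     # $/M tokens, the Opus cached-read rate cited above

cost = PRE_TOKENS / 1_000_000 * CACHED_READ
print(f"~${cost:.2f} per compaction")  # ~$0.25
```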

&lt;h2&gt;
  
  
  What This Means for Your Wallet
&lt;/h2&gt;

&lt;p&gt;Let's do the math for a long Opus session with 3 compactions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Token budget: 644.8k total&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Tokens&lt;/th&gt;
&lt;th&gt;Cost (Opus)&lt;/th&gt;
&lt;th&gt;% of Total&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Useful work&lt;/td&gt;
&lt;td&gt;490k&lt;/td&gt;
&lt;td&gt;~$8.50&lt;/td&gt;
&lt;td&gt;76%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compaction summaries&lt;/td&gt;
&lt;td&gt;47k&lt;/td&gt;
&lt;td&gt;~$0.85&lt;/td&gt;
&lt;td&gt;7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Headroom (unused)&lt;/td&gt;
&lt;td&gt;108k&lt;/td&gt;
&lt;td&gt;$0 (not billed)&lt;/td&gt;
&lt;td&gt;17%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;System prompt (constant)&lt;/td&gt;
&lt;td&gt;~43k&lt;/td&gt;
&lt;td&gt;~$0.06 (cached)&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hidden summary generation&lt;/td&gt;
&lt;td&gt;~500k cached reads (3 × ~167k)&lt;/td&gt;
&lt;td&gt;~$0.75&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The headroom tokens aren't billed directly — they represent capacity you couldn't use. But the summaries and hidden calls are real costs.&lt;/p&gt;
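&lt;p&gt;Summing the billed rows gives the rough total for the session:&lt;/p&gt;

```python
# Billed categories from the table above, in dollars.
billed = {
    "useful work":               8.50,
    "compaction summaries":      0.85,
    "system prompt (cached)":    0.06,
    "hidden summary generation": 0.75,
}

total = sum(billed.values())
overhead = total - billed["useful work"]
print(f"total ~${total:.2f}, of which ~${overhead:.2f} is pure overhead")
```

&lt;p&gt;About $1.66 of a ~$10 session, on top of the ~33k per segment of window capacity you never get to use.&lt;/p&gt;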

&lt;p&gt;&lt;strong&gt;With Sonnet 4.6&lt;/strong&gt; the same session would be dramatically cheaper. Sonnet supports up to 1M context, so with 644k tokens you'd hit &lt;strong&gt;zero compactions&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;All tokens are useful work&lt;/li&gt;
&lt;li&gt;Efficiency: 100%&lt;/li&gt;
&lt;li&gt;Cost: ~$5.50 (vs ~$10+ on Opus)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The System Prompt Discovery
&lt;/h2&gt;

&lt;p&gt;Perhaps the most interesting finding: the system prompt is a &lt;strong&gt;constant ~14k tax&lt;/strong&gt; on every segment.&lt;/p&gt;

&lt;p&gt;Before our investigation, we were counting the full post-compaction context as "rebuild waste." A segment showing &lt;code&gt;33.1k rebuild&lt;/code&gt; looked like 33.1k of compaction overhead. But 14.3k of that is system prompt — you'd pay it regardless.&lt;/p&gt;

&lt;p&gt;The actual compaction overhead (the summary) is only &lt;code&gt;33.1k - 14.3k = 18.8k&lt;/code&gt;. That's a 43% difference in how you measure waste.&lt;/p&gt;

&lt;p&gt;How we detected it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;After compaction #1: cache_read = 14,328  ← system prompt
After compaction #2: cache_read = 14,328  ← same
After compaction #3: cache_read = 14,328  ← same

During normal operation: cache_read grows from 14k → 167k
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;cache_read&lt;/code&gt; value tells you exactly what's already cached. After compaction, only the system prompt survives in cache — everything else (the compaction summary) goes through &lt;code&gt;cache_creation&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Compaction Cache Structure
&lt;/h2&gt;

&lt;p&gt;Here's how token caching works across a compaction boundary:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before compaction&lt;/strong&gt; (normal operation):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cache_read: 166,575    ← almost everything is cached
cache_creation: 312    ← tiny new content
input_tokens: 3        ← negligible
output_tokens: 126
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;First call after compaction&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cache_read: 14,328     ← only system prompt survives
cache_creation: 18,793 ← compaction summary, written fresh
input_tokens: 3
output_tokens: 1,249
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The cache gets blown away by compaction. Everything that was cached (your conversation, tool results, file contents) is gone. Only the system prompt persists because it's on a separate, longer-lived cache (likely a 1-hour TTL vs the 5-minute conversation cache).&lt;/p&gt;
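&lt;p&gt;This gives you a simple compaction detector. If you're parsing transcripts yourself, a call whose &lt;code&gt;cache_read&lt;/code&gt; equals the system-prompt floor is the first call of a fresh segment (a sketch; the 14,328 constant is what we observed and will vary with your setup):&lt;/p&gt;

```python
SYSTEM_PROMPT_FLOOR = 14_328  # observed constant in our transcripts

def is_post_compaction(usage):
    """True for the first call after a compaction: cache_read collapses
    to exactly the system-prompt floor, and everything else (the new
    summary) re-enters through cache_creation."""
    return usage.get("cache_read_input_tokens") == SYSTEM_PROMPT_FLOOR

before = dict(cache_read_input_tokens=166_575, cache_creation_input_tokens=312)
after  = dict(cache_read_input_tokens=14_328,  cache_creation_input_tokens=18_793)
print(is_post_compaction(before), is_post_compaction(after))  # False True
```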

&lt;h2&gt;
  
  
  7 Things You Can Do Right Now
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Use &lt;code&gt;/compact&lt;/code&gt; manually at logical breakpoints&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Don't wait for auto-compaction at 167k. After finishing a feature or fixing a bug, compact yourself. You can guide what gets preserved:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/compact Preserve all file paths, error messages, and the list of modified files
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Use &lt;code&gt;/clear&lt;/code&gt; between distinct tasks&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Switching from implementation to debugging? Starting a new feature? A fresh 186k of clean context beats 80k of stale context with irrelevant history.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Delegate verbose work to subagents&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each subagent gets its own isolated 200k context window. Running tests, searching large codebases, or fetching documentation in subagents keeps verbose output from bloating your main session.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Read files with line ranges&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead of reading entire files, specify what you need: "Read lines 40-90 of handler.ts." Especially critical in debugging loops where you might read the same file repeatedly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Disable unused MCP servers&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each MCP server loads its full tool schema into context on every request. A 20-tool server can consume 5-10k tokens just by existing. That's on top of the 14k system prompt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. Keep CLAUDE.md under 200 lines&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;CLAUDE.md is part of that ~14k system prompt. It loads on every single API call and survives all compaction cycles. If it's bloated, you're paying on every call.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;7. Monitor your efficiency&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Install ClaudeTUI and watch the numbers in real-time. Seeing "Efficiency: 76%" drop to "Efficiency: 68%" after a compaction changes how you think about context management.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to See This Yourself
&lt;/h2&gt;

&lt;p&gt;Install ClaudeTUI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Via Homebrew&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;slima4/claude-tui/claude-tui &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; claudetui setup

&lt;span class="c"&gt;# Or directly&lt;/span&gt;
curl &lt;span class="nt"&gt;-sSL&lt;/span&gt; https://raw.githubusercontent.com/slima4/claude-tui/main/install.sh | bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open a second terminal and run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claudetui monitor    &lt;span class="c"&gt;# live dashboard&lt;/span&gt;
claudetui chart      &lt;span class="c"&gt;# efficiency chart (standalone)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The efficiency chart shows the 4-component breakdown for every segment in your session — updated live as you work. Press &lt;code&gt;w&lt;/code&gt; in the monitor to open it, or &lt;code&gt;v&lt;/code&gt; to toggle between horizontal and vertical views.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;Every Claude Code session has four types of token usage:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;System prompt&lt;/strong&gt; (~14k) — constant tax, can't avoid it, but it's cheap (cached)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compaction summaries&lt;/strong&gt; (~11-19k each) — the real cost of compaction, lossy compression of your work&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Useful work&lt;/strong&gt; — what you actually paid for&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Headroom&lt;/strong&gt; (~33k) — buffer that's never available for work&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In a typical 3-compaction Opus session, about &lt;strong&gt;76% of tokens are useful work&lt;/strong&gt;. The rest is overhead. Making this visible — and understanding what each component actually is — is the first step to spending tokens more intentionally.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;ClaudeTUI is open source and MIT licensed. Stdlib-only Python, zero external dependencies.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;GitHub: &lt;a href="https://github.com/slima4/claude-tui" rel="noopener noreferrer"&gt;github.com/slima4/claude-tui&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>claude</category>
      <category>ai</category>
      <category>statusline</category>
      <category>productivity</category>
    </item>
    <item>
      <title>ClaudeTUI v0.3: Claude Code statusline gets a unified CLI, interactive configurator, and a proper splash screen</title>
      <dc:creator>Slim</dc:creator>
      <pubDate>Wed, 11 Mar 2026 07:27:08 +0000</pubDate>
      <link>https://dev.to/slima4/claudetui-v03-claude-code-statusline-gets-a-unified-cli-interactive-configurator-and-a-proper-4h7c</link>
      <guid>https://dev.to/slima4/claudetui-v03-claude-code-statusline-gets-a-unified-cli-interactive-configurator-and-a-proper-4h7c</guid>
      <description>&lt;p&gt;A couple of days ago I shared &lt;a href="https://github.com/slima4/claude-tui" rel="noopener noreferrer"&gt;ClaudeTUI&lt;/a&gt; — a real-time statusline, live monitor, and session analytics for Claude Code. Since then, a lot has changed. Here's what's new in v0.3.&lt;/p&gt;

&lt;h2&gt;
  
  
  One command to rule them all
&lt;/h2&gt;

&lt;p&gt;The biggest change: instead of six separate CLI commands for managing the Claude Code statusline, monitor, and analytics, there's now a single &lt;code&gt;claudetui&lt;/code&gt; dispatcher.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# before&lt;/span&gt;
claude-ui-monitor
claude-stats &lt;span class="nt"&gt;--days&lt;/span&gt; 7
claude-sessions list
claude-ui-mode compact

&lt;span class="c"&gt;# after&lt;/span&gt;
claudetui monitor
claudetui stats &lt;span class="nt"&gt;--days&lt;/span&gt; 7
claudetui sessions list
claudetui mode compact
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every subcommand passes arguments straight through — &lt;code&gt;claudetui&lt;/code&gt; is just a 60-line Python script that resolves the right tool and &lt;code&gt;exec&lt;/code&gt;s it. No overhead, no framework. If you type &lt;code&gt;claudetui --help&lt;/code&gt;, you get the full menu:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;claudetui — CLI for ClaudeTUI (Claude Code utilities)

Commands:
  monitor     Live session dashboard (separate terminal)
  stats       Post-session analytics
  sessions    Browse, compare, resume, export sessions
  mode        Switch statusline mode (full/compact/custom)
  setup       Configure statusline, hooks, and commands
  uninstall   Remove ClaudeTUI configuration
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The old command names (&lt;code&gt;claude-ui-monitor&lt;/code&gt;, &lt;code&gt;claude-stats&lt;/code&gt;, etc.) are gone. Clean break, clean namespace.&lt;/p&gt;
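&lt;p&gt;For the curious, an exec-style dispatcher fits in a dozen lines. This is a hypothetical sketch (the tool names are invented), not the actual &lt;code&gt;claudetui&lt;/code&gt; script:&lt;/p&gt;

```python
import os
import sys

# Hypothetical subcommand-to-binary map; the real script resolves more tools.
COMMANDS = {
    "monitor":  "claudetui-monitor",
    "stats":    "claudetui-stats",
    "sessions": "claudetui-sessions",
}

def resolve(argv):
    """Map e.g. ['stats', '--days', '7'] to ('claudetui-stats', ['--days', '7'])."""
    if not argv or argv[0] not in COMMANDS:
        sys.exit(f"usage: claudetui {{{'|'.join(COMMANDS)}}} [args...]")
    return COMMANDS[argv[0]], argv[1:]

# Entry point: replace this process with the target tool, so arguments,
# stdio, and exit codes pass straight through:
#   tool, args = resolve(sys.argv[1:])
#   os.execvp(tool, [tool, *args])
```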

&lt;h2&gt;
  
  
  Interactive statusline configurator
&lt;/h2&gt;

&lt;p&gt;The Claude Code statusline has 20+ components across three lines — context usage, cost, model, git stats, tool trace, and more. Previously you could only choose between "everything" (full mode) and "almost nothing" (compact mode). Now there's a third option:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claudetui mode custom
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This opens a curses TUI where you can toggle individual components with arrow keys and spacebar:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F14bnm0k416xcwcvt7ng0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F14bnm0k416xcwcvt7ng0.png" alt="Statusline Configurator" width="800" height="863"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Each component shows a &lt;strong&gt;live preview&lt;/strong&gt; of what it looks like — colored progress bars, sparklines, git stats — right in the menu. You can pick from five widget styles for the matrix rain area, or apply presets (&lt;code&gt;all&lt;/code&gt;, &lt;code&gt;minimal&lt;/code&gt;, &lt;code&gt;focused&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;Don't like interactive menus? Use flags:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claudetui mode custom &lt;span class="nt"&gt;--hide&lt;/span&gt; model,cost,session_id
claudetui mode custom &lt;span class="nt"&gt;-w&lt;/span&gt; hex &lt;span class="nt"&gt;-p&lt;/span&gt; focused
claudetui mode custom &lt;span class="nt"&gt;-l&lt;/span&gt;   &lt;span class="c"&gt;# show what's currently hidden&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Config saves to &lt;code&gt;~/.claude/claudeui.json&lt;/code&gt; and hot-reloads — no restart needed.&lt;/p&gt;
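&lt;p&gt;Hot-reload here is simpler than it sounds. One common pattern (a sketch, not necessarily what ClaudeTUI does internally) is to re-read the file whenever its mtime changes:&lt;/p&gt;

```python
import json
from pathlib import Path

class HotConfig:
    """Re-read a JSON config file whenever its mtime changes."""

    def __init__(self, path):
        self.path = Path(path)
        self._mtime = None
        self._data = {}

    def get(self):
        mtime = self.path.stat().st_mtime
        if mtime != self._mtime:  # file changed, or first read
            self._data = json.loads(self.path.read_text())
            self._mtime = mtime
        return self._data
```

&lt;p&gt;Since the statusline script runs fresh on every refresh anyway, even this much caching is optional: re-reading a small JSON file per invocation is effectively free.&lt;/p&gt;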

&lt;h2&gt;
  
  
  Monitor got a facelift
&lt;/h2&gt;

&lt;p&gt;The live monitor (&lt;code&gt;claudetui monitor&lt;/code&gt;) picked up several improvements:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Splash screen&lt;/strong&gt; — the monitor now shows an ASCII art logo while it loads the session in the background. It looks cool and masks the 1-2 second discovery time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pinned layout&lt;/strong&gt; — the header (context, cost, stats) and footer (hotkey bar) are now fixed. Only the log section scrolls. No more hunting for the help bar when the log fills up.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Configurable log size&lt;/strong&gt; — add &lt;code&gt;"monitor": {"log_lines": 12}&lt;/code&gt; to your &lt;code&gt;claudeui.json&lt;/code&gt;, or set it to &lt;code&gt;0&lt;/code&gt; to hide the log entirely. Default is 8. Adjustable from the settings panel too (press &lt;code&gt;c&lt;/code&gt; in the monitor).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Responsive footer&lt;/strong&gt; — the hotkey bar adapts to terminal width. Full labels at 60+ columns, abbreviated at 40+, minimal below that.&lt;/p&gt;

&lt;h2&gt;
  
  
  The rebrand
&lt;/h2&gt;

&lt;p&gt;You might have noticed: it's ClaudeTUI now, not ClaudeUI. The old name was too generic and clashed with other projects. Everything got renamed — repo, Homebrew tap, slash commands (&lt;code&gt;/ui:*&lt;/code&gt; → &lt;code&gt;/tui:*&lt;/code&gt;), CLI tools, docs, landing page, and all the screenshots.&lt;/p&gt;

&lt;p&gt;The installer handles migration automatically — if you have the old &lt;code&gt;claudeui&lt;/code&gt; Homebrew tap, it untaps it and points you to the new one.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew tap slima4/claude-tui
brew &lt;span class="nb"&gt;install &lt;/span&gt;claude-tui
claudetui setup
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or the one-liner:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-sSL&lt;/span&gt; https://raw.githubusercontent.com/slima4/claude-tui/main/install.sh | bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;The transcript parsing approach works surprisingly well, but it's still reverse-engineering an undocumented format. Every Claude Code update is a small gamble. I'd love to see an official API for session metadata — even just a stable JSON schema for the transcript would help.&lt;/p&gt;

&lt;p&gt;In the meantime, if you use Claude Code and want a statusline with real-time context tracking, cost monitoring, and session analytics: &lt;a href="https://github.com/slima4/claude-tui" rel="noopener noreferrer"&gt;github.com/slima4/claude-tui&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Stars and issues welcome.&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>ai</category>
      <category>python</category>
      <category>cli</category>
    </item>
    <item>
      <title>I built a real-time dashboard for Claude Code because I kept losing track of my sessions</title>
      <dc:creator>Slim</dc:creator>
      <pubDate>Mon, 09 Mar 2026 08:04:57 +0000</pubDate>
      <link>https://dev.to/slima4/i-built-a-real-time-dashboard-for-claude-code-because-i-kept-losing-track-of-my-sessions-2m54</link>
      <guid>https://dev.to/slima4/i-built-a-real-time-dashboard-for-claude-code-because-i-kept-losing-track-of-my-sessions-2m54</guid>
      <description>&lt;p&gt;Claude Code has a 200k token context window but gives you zero visibility into how much of it you've used — until auto-compaction kicks in and wipes half your context. I got tired of that surprise, so I built ClaudeTUI.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;If you use Claude Code daily, you've probably hit these:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Auto-compaction fires mid-task and you lose context&lt;/li&gt;
&lt;li&gt;No idea how much a session is costing you&lt;/li&gt;
&lt;li&gt;Can't tell which files Claude has been touching&lt;/li&gt;
&lt;li&gt;No way to compare sessions or track patterns over time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Claude Code is a powerful tool, but it's a black box. You type, it works, and you hope for the best.&lt;/p&gt;

&lt;h2&gt;
  
  
  What ClaudeTUI does
&lt;/h2&gt;

&lt;p&gt;It's a collection of tools that plug into Claude Code and give you full visibility:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Statusline&lt;/strong&gt; — a real-time status bar that sits at the bottom of Claude Code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; 0110100 Opus 4.6 │ ████████░░░░░░░░░░░░ 42% 65.5k/200.0k │ ~24 turns left │ $2.34 │ 12m │ 0x compact
 1001011 my-project │ main +42 -17 │ 18 turns │ 5 files │ 0 err │ 82% cache │ 4x think │ ~$0.13/turn
 0110010 read config.ts → edit config.ts → bash npm test → edit README.md │ config.ts×2 README.md×1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Context usage, cost, cache ratio, git diff, tool trace, compaction prediction — all live. There's also a compact 1-line mode if you prefer a minimal look.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Live Monitor&lt;/strong&gt; — open a second terminal and get a full dashboard:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ claude-ui-monitor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6mp0v25w11e0dog6oozt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6mp0v25w11e0dog6oozt.png" alt="ClaudeTUI Monitor — live session dashboard" width="800" height="697"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Context sparkline with compaction history, cost breakdown with cache savings, per-turn activity (tools, files, errors), session-wide stats, and a scrollable log viewer with filters. It even tracks agent spawns and their results. The matrix rain header pauses when Claude is idle.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hooks&lt;/strong&gt; — automatic context injected into your sessions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Session start: shows which files you've been editing recently across sessions&lt;/li&gt;
&lt;li&gt;After edit: warns you about reverse dependencies ("4 files import this module")&lt;/li&gt;
&lt;li&gt;Before edit: flags high-churn files ("config.ts edited 43 times in 5 sessions — maybe refactor?")&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Session Stats&lt;/strong&gt; — post-session analytics:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ claude-stats --days 7 -s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cost breakdown, token sparklines, tool usage charts, file activity heatmaps. See which sessions burned the most tokens and why.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Session Manager&lt;/strong&gt; — browse and compare sessions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ claude-sessions list
$ claude-sessions diff abc123 def456
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Side-by-side comparison of cost, duration, tools used, and file activity between any two sessions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Slash Commands&lt;/strong&gt; — deep reports without leaving Claude Code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/ui:session    # full session report
/ui:cost       # cost deep dive
/ui:perf       # tool efficiency analysis
/ui:context    # context window predictions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  How it works
&lt;/h2&gt;

&lt;p&gt;Everything runs by parsing Claude Code's transcript JSONL files from &lt;code&gt;~/.claude/projects/&lt;/code&gt;. No API keys, no external services, no dependencies — just Python 3.8+ and the standard library.&lt;/p&gt;

&lt;p&gt;The statusline uses Claude Code's &lt;code&gt;statusLine&lt;/code&gt; config to run a Python script that reads session metadata from stdin and the transcript file from disk. It does two passes: a reverse pass to find current context size, and a forward pass to accumulate costs and activity.&lt;/p&gt;
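&lt;p&gt;The reverse pass can be sketched in a few lines. Field names follow the undocumented transcript format as I observed it, so treat them as assumptions:&lt;/p&gt;

```python
import json
from pathlib import Path

def current_context_tokens(transcript):
    """Reverse pass: walk the JSONL transcript from the end and return the
    context size of the most recent call that reported token usage."""
    for line in reversed(Path(transcript).read_text().splitlines()):
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # partial last line while Claude is mid-write
        if not isinstance(entry, dict):
            continue
        usage = entry.get("message", {}).get("usage", {})
        if usage:
            # Context = everything that went in as input on the latest call.
            return (usage.get("input_tokens", 0)
                    + usage.get("cache_read_input_tokens", 0)
                    + usage.get("cache_creation_input_tokens", 0))
    return 0
```

&lt;p&gt;A production version would seek backwards in chunks instead of loading the whole file, but the idea is the same: the newest usage record alone tells you the current context size.&lt;/p&gt;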

&lt;p&gt;The hooks use Claude Code's hooks system to run scripts on events like &lt;code&gt;SessionStart&lt;/code&gt;, &lt;code&gt;PreToolUse&lt;/code&gt;, and &lt;code&gt;PostToolUse&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The monitor watches the transcript file for changes and redraws when it detects new content.&lt;/p&gt;

&lt;h2&gt;
  
  
  Install
&lt;/h2&gt;

&lt;p&gt;One command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-sSL&lt;/span&gt; https://raw.githubusercontent.com/slima4/claude-tui/main/install.sh | bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It sets up everything — statusline, hooks, slash commands, and CLI tools. The installer asks you to pick full or compact mode for the statusline. You can switch anytime with &lt;code&gt;claude-ui-mode compact&lt;/code&gt; or &lt;code&gt;claude-ui-mode full&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;To uninstall: &lt;code&gt;claude-ui-uninstall&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I learned
&lt;/h2&gt;

&lt;p&gt;Building tools that parse another tool's internal format is fragile by nature. Claude Code's transcript format isn't documented, so I had to reverse-engineer it by reading JSONL files and figuring out the structure. It works well today, but could break with any Claude Code update.&lt;/p&gt;

&lt;p&gt;The other challenge was performance. The statusline runs on every refresh, so it needs to parse the transcript fast. For long sessions with thousands of entries, the reverse-pass-first approach helps — you find the current context size quickly without reading the entire file sequentially.&lt;/p&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Website&lt;/strong&gt;: &lt;a href="https://slima4.github.io/claude-tui/" rel="noopener noreferrer"&gt;slima4.github.io/claude-tui&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/slima4/claude-tui" rel="noopener noreferrer"&gt;github.com/slima4/claude-tui&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you use Claude Code and want more visibility into your sessions, give it a try. Issues and PRs welcome.&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>ai</category>
      <category>python</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
