<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Horilla</title>
    <description>The latest articles on DEV Community by Horilla (@horilla_support_8e7ce9908).</description>
    <link>https://dev.to/horilla_support_8e7ce9908</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3603663%2Fb5696a6f-16c7-44ea-acdb-c3fd44f426c5.png</url>
      <title>DEV Community: Horilla</title>
      <link>https://dev.to/horilla_support_8e7ce9908</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/horilla_support_8e7ce9908"/>
    <language>en</language>
    <item>
      <title>How we cut our Claude Code token costs by 80% — and the open-source tool we built to do it (v1.1.4, 13 commands)</title>
      <dc:creator>Horilla</dc:creator>
      <pubDate>Mon, 13 Apr 2026 05:00:02 +0000</pubDate>
      <link>https://dev.to/horilla_support_8e7ce9908/how-we-cut-our-claude-code-token-costs-by-80-and-the-open-source-tool-we-built-to-do-it-v114-1ko2</link>
      <guid>https://dev.to/horilla_support_8e7ce9908/how-we-cut-our-claude-code-token-costs-by-80-and-the-open-source-tool-we-built-to-do-it-v114-1ko2</guid>
      <description>&lt;p&gt;Our Claude Code bill was three times what it should have been. For a 9-developer team, that difference was significant enough to make us actually debug it. What we found was embarrassingly simple — and almost certainly affecting your setup too.&lt;/p&gt;

&lt;h3&gt;
  
  
  The problem: CLAUDE.md loads on every request
&lt;/h3&gt;

&lt;p&gt;Claude Code reads your &lt;code&gt;CLAUDE.md&lt;/code&gt; file on every single request. That's by design — it's how the tool loads your project instructions, coding conventions, and context. But here's the part that sneaks up on you: the file is read in full, every time, regardless of what task you're working on.&lt;/p&gt;

&lt;p&gt;Our &lt;code&gt;CLAUDE.md&lt;/code&gt; had grown to about 10,000 tokens over six months. It contained:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Architecture documentation for 40+ Django apps&lt;/li&gt;
&lt;li&gt;Coding standards and patterns for two separate codebases&lt;/li&gt;
&lt;li&gt;API references and import paths&lt;/li&gt;
&lt;li&gt;Session notes and debugging tips we'd accumulated&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every time a developer asked Claude to fix a typo in a README, the full 10,000-token file was injected into the context. At roughly 60 requests per developer per day, across 9 developers, that's 5.4 million tokens of CLAUDE.md context per day, before writing a single line of code.&lt;/p&gt;
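
&lt;p&gt;As a rough check on that figure, here is a back-of-envelope sketch in Python. The three inputs are the numbers quoted above; the 22 working days per month is our own assumption for illustration, not a measured value.&lt;/p&gt;

```python
# Back-of-envelope estimate of CLAUDE.md overhead, using the figures quoted above.
CLAUDE_MD_TOKENS = 10_000       # approximate size of the file
REQUESTS_PER_DEV_PER_DAY = 60   # rough average per developer
DEVELOPERS = 9
WORKING_DAYS_PER_MONTH = 22     # assumption for illustration only


def overhead_tokens(days=1):
    """Tokens spent re-sending CLAUDE.md across the whole team over `days` days."""
    return CLAUDE_MD_TOKENS * REQUESTS_PER_DEV_PER_DAY * DEVELOPERS * days


daily = overhead_tokens()
monthly = overhead_tokens(WORKING_DAYS_PER_MONTH)
print(f"{daily:,} tokens per day")
print(f"{monthly:,} tokens per month")
```

&lt;p&gt;Multiplying the three quoted figures gives the per-day total; the monthly number scales with however many working days you plug in.&lt;/p&gt;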

&lt;h3&gt;
  
  
  Why prompt caching wasn't saving us
&lt;/h3&gt;

&lt;p&gt;Anthropic's prompt caching applies to prompts above a minimum length (1,024 tokens on many models), and it works well &lt;em&gt;when the cached prefix is identical between requests&lt;/em&gt;. Change one character and everything from that point onward is billed at full price.&lt;/p&gt;

&lt;p&gt;Our CLAUDE.md had dynamic content: session notes with timestamps, environment-specific paths, and other content that changed between requests. Every request was a cache miss. Every request billed at full input token price.&lt;/p&gt;
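
&lt;p&gt;To see why a single timestamp is enough to cause a miss, here is a toy Python sketch. It uses a SHA-256 fingerprint as a stand-in for an exact-match cache lookup; the real cache lives on Anthropic's side and this is only an illustration. The names &lt;code&gt;prefix_fingerprint&lt;/code&gt; and &lt;code&gt;build_prefix&lt;/code&gt; are hypothetical.&lt;/p&gt;

```python
import hashlib
from datetime import datetime, timezone

STATIC_RULES = "Use Django ORM querysets; never raw SQL in views.\n"


def prefix_fingerprint(prompt_prefix):
    """Stand-in for an exact-match cache lookup: same bytes, same key."""
    return hashlib.sha256(prompt_prefix.encode("utf-8")).hexdigest()[:12]


def build_prefix(include_timestamp, now):
    """Assemble a toy CLAUDE.md, optionally starting with a session timestamp."""
    header = f"Session started: {now.isoformat()}\n" if include_timestamp else ""
    return header + STATIC_RULES


# Two requests made five seconds apart.
t1 = datetime(2026, 4, 13, 9, 0, 0, tzinfo=timezone.utc)
t2 = datetime(2026, 4, 13, 9, 0, 5, tzinfo=timezone.utc)

dynamic_hit = prefix_fingerprint(build_prefix(True, t1)) == prefix_fingerprint(build_prefix(True, t2))
static_hit = prefix_fingerprint(build_prefix(False, t1)) == prefix_fingerprint(build_prefix(False, t2))
print("with timestamp, fingerprints match:", dynamic_hit)
print("without timestamp, fingerprints match:", static_hit)
```

&lt;p&gt;The static version produces the same fingerprint on both requests; the timestamped version never does, which is the behaviour we were paying for.&lt;/p&gt;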

&lt;p&gt;This is the second thing &lt;code&gt;claudectx analyze&lt;/code&gt; catches: dynamic content that breaks caching.&lt;/p&gt;

&lt;h3&gt;
  
  
  What we built: claudectx v1.1.4
&lt;/h3&gt;

&lt;p&gt;We built &lt;strong&gt;claudectx&lt;/strong&gt; — a CLI that audits and optimizes what Claude Code loads per session. It's now at v1.1.4 with 13 commands across four categories.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx claudectx analyze
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run this in your project directory right now. No install needed. It outputs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;claudectx analyze — token breakdown
════════════════════════════════════════
CLAUDE.md             7,841 tokens  ████████████████████ 68.1%
Open files            2,840 tokens  ██████               20.6%
Conversation history  1,630 tokens  ███                  11.2%
MCP tool results         14 tokens  ░                     0.1%
────────────────────────────────────────
Total                12,325 tokens

Waste patterns detected (3):
  ⚠  CLAUDE.md: 7,841 tokens — 292% over the 2,000 token recommendation
  ⚠  No .claudeignore file found
  ⚠  CLAUDE.md contains dynamic timestamp — breaks prompt caching
  → Run `claudectx optimize --apply` to fix all 3 issues
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The core fix: optimize
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claudectx optimize &lt;span class="nt"&gt;--apply&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This does three things automatically:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Splits CLAUDE.md&lt;/strong&gt; into a lean core + demand-loaded &lt;code&gt;@file&lt;/code&gt; sections. Core stays inline (under 2K tokens); large reference sections load only when relevant files are open.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generates .claudeignore&lt;/strong&gt; with Node.js, Python, and common binary patterns to stop loading lock files, build artifacts, and assets.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strips cache-busting content&lt;/strong&gt; — removes dynamic timestamps and session notes that prevent Anthropic's prompt cache from activating.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Safety:&lt;/strong&gt; Every file claudectx touches is automatically backed up to &lt;code&gt;~/.claudectx/backups/&lt;/code&gt;. If anything looks wrong, &lt;code&gt;claudectx revert --list&lt;/code&gt; shows all backups and &lt;code&gt;claudectx revert --id &amp;lt;id&amp;gt;&lt;/code&gt; restores any of them.&lt;/p&gt;

&lt;p&gt;On our setup this cut tokens from 18,432 to 3,740 per request (79.7% reduction).&lt;/p&gt;

&lt;h3&gt;
  
  
  The MCP proxy: symbol-level reads
&lt;/h3&gt;

&lt;p&gt;The less obvious win is &lt;code&gt;claudectx mcp&lt;/code&gt; — a local MCP server proxy that intercepts file-read requests and returns symbol-level slices instead of whole files.&lt;/p&gt;

&lt;p&gt;When Claude reads a file to find a class definition, it gets the entire file. On a large Django app, that's often 12,000+ tokens for one model. The same information as a symbol-level read (just the class) is typically 800 tokens.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claudectx mcp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Configure Claude Code to use the local MCP server and you get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;smart_read&lt;/code&gt;&lt;/strong&gt; — reads a symbol by name (class, function, method)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;search_symbols&lt;/code&gt;&lt;/strong&gt; — finds symbols across the codebase&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;index_project&lt;/code&gt;&lt;/strong&gt; — builds a local symbol index&lt;/li&gt;
&lt;/ul&gt;
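
&lt;p&gt;To make the idea of a symbol-level read concrete, here is a minimal Python sketch built on the standard-library &lt;code&gt;ast&lt;/code&gt; module. It is not how claudectx implements &lt;code&gt;smart_read&lt;/code&gt;; &lt;code&gt;read_symbol&lt;/code&gt; and &lt;code&gt;SAMPLE_MODULE&lt;/code&gt; are hypothetical names, and the sketch only handles top-level Python classes and functions.&lt;/p&gt;

```python
import ast

SAMPLE_MODULE = '''\
import datetime


class Employee:
    """A toy model standing in for a large Django model."""

    def __init__(self, name):
        self.name = name

    def badge(self):
        return f"EMP-{self.name.upper()}"


class Department:
    def __init__(self, title):
        self.title = title
'''


def read_symbol(source, name):
    """Return the source of the top-level class or function called `name`, or None."""
    tree = ast.parse(source)
    wanted = (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    for node in tree.body:
        if isinstance(node, wanted) and node.name == name:
            # get_source_segment slices the original text using the node's position info
            return ast.get_source_segment(source, node)
    return None


snippet = read_symbol(SAMPLE_MODULE, "Department")
print(snippet)
print(len(snippet), "of", len(SAMPLE_MODULE), "characters returned")
```

&lt;p&gt;Only the requested class comes back, so the caller pays for one definition rather than the whole module.&lt;/p&gt;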

&lt;h3&gt;
  
  
  Analytics commands
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;claudectx watch&lt;/code&gt;&lt;/strong&gt; — a live terminal dashboard (Ink/React) showing token burn, cache hit rate, and most-read files as you work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;claudectx compress&lt;/code&gt;&lt;/strong&gt; — distills your session JSONL into a MEMORY.md entry. An 8,000-token session typically compresses to 150–200 tokens. Next session starts lean without losing context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;claudectx report&lt;/code&gt;&lt;/strong&gt; — 7/30-day analytics:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Sessions:          23    Requests:       847
Input tokens:   2,341,200    Cache hits:    51%
Total cost (est.): $4.87   Avg/session:  $0.21
Top waste file:    CLAUDE.md  (12,400 tokens, 847 reads)
→ Run `claudectx drift` to clean up stale sections
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;code&gt;claudectx budget "**/*.py"&lt;/code&gt;&lt;/strong&gt; estimates token cost before running a task. It shows per-file token counts, cache hit likelihood, total cost, and .claudeignore recommendations. Like git status for your context window.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;claudectx drift&lt;/code&gt;&lt;/strong&gt; — scans CLAUDE.md for dead &lt;code&gt;@file&lt;/code&gt; references, git-deleted file mentions, and sections with zero reads in the last 30 days. Real cost: you're loading documentation for files that no longer exist.&lt;/p&gt;
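
&lt;p&gt;The dead-reference part of that check can be sketched in a few lines of Python. This is an illustration only, not the claudectx implementation: it assumes references are written as &lt;code&gt;@path/to/file&lt;/code&gt; at the start of a whitespace-separated token, and &lt;code&gt;dead_references&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```python
import re
import tempfile
from pathlib import Path

# Assumption: a reference is "@" followed by a relative path, preceded by
# whitespace or the start of the text (so email addresses are skipped).
REFERENCE_PATTERN = re.compile(r"(?:^|\s)@([\w./-]+)")


def dead_references(claude_md_text, project_root):
    """Return referenced paths that do not exist under project_root."""
    referenced = REFERENCE_PATTERN.findall(claude_md_text)
    return [path for path in referenced if not (Path(project_root) / path).exists()]


# Demo against a throwaway project containing only docs/api.md.
with tempfile.TemporaryDirectory() as root:
    (Path(root) / "docs").mkdir()
    (Path(root) / "docs" / "api.md").write_text("API notes\n")
    sample = "Core rules here.\n@docs/api.md\n@docs/payroll.md\nContact: dev@example.com\n"
    missing = dead_references(sample, root)

print(missing)
```

&lt;p&gt;In the demo, the reference to the file that exists is kept and the one pointing at a missing file is reported.&lt;/p&gt;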

&lt;h3&gt;
  
  
  Teams and multi-assistant support
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;claudectx teams&lt;/code&gt;&lt;/strong&gt; — per-developer cost attribution for multi-dev teams:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claudectx teams &lt;span class="nb"&gt;export&lt;/span&gt;          &lt;span class="c"&gt;# → ~/.claudectx/team-export-{date}.json&lt;/span&gt;
claudectx teams aggregate &lt;span class="nt"&gt;--dir&lt;/span&gt; ./reports/  &lt;span class="c"&gt;# merge all exports&lt;/span&gt;
claudectx teams aggregate &lt;span class="nt"&gt;--anonymize&lt;/span&gt;       &lt;span class="c"&gt;# Dev 1, Dev 2...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each developer exports an anonymized summary of their session data. The lead aggregates them without seeing session content. Know where the budget is going across the team.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;claudectx convert --to cursor|copilot|windsurf&lt;/code&gt;&lt;/strong&gt; — exports your CLAUDE.md to other AI assistant formats. Splits sections into &lt;code&gt;.cursor/rules/*.mdc&lt;/code&gt; files for Cursor, or &lt;code&gt;.github/copilot-instructions.md&lt;/code&gt; for Copilot. One source, every assistant.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;claudectx warmup&lt;/code&gt;&lt;/strong&gt; sends a priming request to Anthropic so your first working request gets a cache hit instead of a full miss. Adding &lt;code&gt;--cron "0 9 * * 1-5"&lt;/code&gt; installs it as a cron job that runs at 09:00 on weekdays.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;claudectx hooks list|add|remove&lt;/code&gt;&lt;/strong&gt; — named hook marketplace. Four built-ins: auto-compress (triggers on file reads), daily-budget (budget check before tool use), slack-digest (session summary to Slack webhook), session-warmup (cache pre-warm on read events).&lt;/p&gt;

&lt;h3&gt;
  
  
  Real results
&lt;/h3&gt;

&lt;p&gt;After running &lt;code&gt;optimize --apply&lt;/code&gt; on our setup:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tokens per request&lt;/td&gt;
&lt;td&gt;18,432&lt;/td&gt;
&lt;td&gt;3,740&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cache hit rate&lt;/td&gt;
&lt;td&gt;12%&lt;/td&gt;
&lt;td&gt;74%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Monthly cost estimate&lt;/td&gt;
&lt;td&gt;$87&lt;/td&gt;
&lt;td&gt;$17&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The 80% figure is real but came from our specific setup — an unusually large CLAUDE.md and a completely unconfigured .claudeignore. If your config is already lean, expect 20–40%. The &lt;code&gt;analyze&lt;/code&gt; command will tell you in 30 seconds what your actual baseline is.&lt;/p&gt;
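
&lt;p&gt;If you want to compute the same reduction for your own before/after numbers, the arithmetic is short. The sketch below uses the values from the table above; &lt;code&gt;reduction_percent&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```python
# Before/after pairs copied from the results table.
RESULTS = {
    "tokens per request": (18_432, 3_740),
    "monthly cost (USD)": (87, 17),
}


def reduction_percent(before, after):
    """Percentage drop from `before` to `after`, rounded to one decimal place."""
    return round((before - after) / before * 100, 1)


reductions = {metric: reduction_percent(b, a) for metric, (b, a) in RESULTS.items()}
for metric, pct in reductions.items():
    print(f"{metric}: {pct}% lower")
```

&lt;p&gt;Swap in the totals from your own &lt;code&gt;analyze&lt;/code&gt; runs to see where you land.&lt;/p&gt;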

&lt;h3&gt;
  
  
  Try it
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# No install — try immediately&lt;/span&gt;
npx claudectx analyze

&lt;span class="c"&gt;# Install globally&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; claudectx
&lt;span class="c"&gt;# or via Homebrew:&lt;/span&gt;
brew tap Horilla/claudectx &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; brew &lt;span class="nb"&gt;install &lt;/span&gt;claudectx
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Website: &lt;a href="https://claudectx.horilla.com" rel="noopener noreferrer"&gt;claudectx.horilla.com&lt;/a&gt;&lt;br&gt;
Source (MIT): &lt;a href="https://github.com/Horilla/claudectx" rel="noopener noreferrer"&gt;github.com/Horilla/claudectx&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We're the team behind &lt;a href="https://github.com/horilla-opensource/horilla" rel="noopener noreferrer"&gt;Horilla&lt;/a&gt; — an open-source Django HRMS with 40+ apps. This tool came from real pain running Claude Code across a multi-repo, multi-app codebase on a 9-developer team. If you're in a similar situation, we'd love your feedback.&lt;/p&gt;

&lt;p&gt;Issues and PRs welcome. If &lt;code&gt;analyze&lt;/code&gt; shows something surprising, share it in the comments.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claudecode</category>
      <category>productivity</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
