<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Raj</title>
    <description>The latest articles on DEV Community by Raj (@rajkumarsakthi).</description>
    <link>https://dev.to/rajkumarsakthi</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3918631%2F65d2c10f-868a-4556-bd7f-0851354f3c05.png</url>
      <title>DEV Community: Raj</title>
      <link>https://dev.to/rajkumarsakthi</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/rajkumarsakthi"/>
    <language>en</language>
    <item>
      <title>I Cut My Claude Code Token Usage by 94% With This Open Source Tool</title>
      <dc:creator>Raj</dc:creator>
      <pubDate>Thu, 07 May 2026 19:33:17 +0000</pubDate>
      <link>https://dev.to/rajkumarsakthi/i-cut-my-claude-code-token-usage-by-94-with-this-open-source-tool-h9l</link>
      <guid>https://dev.to/rajkumarsakthi/i-cut-my-claude-code-token-usage-by-94-with-this-open-source-tool-h9l</guid>
      <description>&lt;p&gt;If you use Claude Code, Cursor, or any AI coding tool, you're probably burning tokens on the same files over and over. Every session, the AI re-reads your codebase from scratch.&lt;/p&gt;

&lt;p&gt;I built &lt;a href="https://github.com/elara-labs/code-context-engine" rel="noopener noreferrer"&gt;Code Context Engine (CCE)&lt;/a&gt; to fix this. It indexes your code locally and lets the AI search instead of reading entire files. The result: &lt;strong&gt;94% fewer input tokens&lt;/strong&gt;, benchmarked on FastAPI with 20 real coding queries.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Input tokens are 85-95% of your Claude Code bill. Every time you ask Claude about your payment flow, it reads &lt;code&gt;payments.py&lt;/code&gt;, &lt;code&gt;shipping.py&lt;/code&gt;, and whatever else it thinks might be relevant. That's 45,000 tokens for a question that needs 800 tokens of context.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Without CCE:    Claude reads payments.py + shipping.py   = 45,000 tokens
With CCE:       context_search "payment flow"            =    800 tokens
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
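&lt;p&gt;The arithmetic in that comparison is worth spelling out, because input-token savings translate directly into dollars. A quick back-of-envelope check (the per-token price below is an assumption for illustration; check Anthropic's current pricing):&lt;/p&gt;

```python
# Back-of-envelope cost of the example above. The price is an assumed
# $3 per million input tokens; the token counts come from the example.
PRICE_PER_INPUT_TOKEN = 3.00 / 1_000_000

full_read = 45_000   # Claude reads payments.py + shipping.py
cce_search = 800     # context_search "payment flow"

reduction = 1 - cce_search / full_read
print(f"reduction: {reduction:.1%}")   # reduction: 98.2%
print(f"cost: ${full_read * PRICE_PER_INPUT_TOKEN:.4f} vs "
      f"${cce_search * PRICE_PER_INPUT_TOKEN:.4f}")
```

Note this single worst-case example is steeper than the 94% benchmark average reported below.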



&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;p&gt;CCE runs as a local MCP server. Three lines to set up:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv tool &lt;span class="nb"&gt;install &lt;/span&gt;code-context-engine
&lt;span class="nb"&gt;cd&lt;/span&gt; /path/to/your/project
cce init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. No cloud. No config. &lt;code&gt;cce init&lt;/code&gt; auto-detects your editor (Claude Code, VS Code, Cursor, Gemini CLI, Codex, OpenCode) and writes the right config.&lt;/p&gt;

&lt;p&gt;Under the hood:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Tree-sitter&lt;/strong&gt; parses your code into semantic chunks (functions, classes, modules)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hybrid retrieval&lt;/strong&gt; combines vector similarity with BM25 keyword matching&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Graph expansion&lt;/strong&gt; walks CALLS/IMPORTS edges to pull in related code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compression&lt;/strong&gt; reduces chunks to signatures and docstrings&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory&lt;/strong&gt; persists decisions and code areas across sessions&lt;/li&gt;
&lt;/ol&gt;
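&lt;p&gt;To make step 2 concrete, here is a minimal, self-contained sketch of hybrid scoring: a crude keyword-overlap score standing in for BM25, a bag-of-words cosine standing in for embedding similarity, and a weighted sum fusing the two. The chunks, weights, and scoring functions are illustrative only, not CCE's actual implementation.&lt;/p&gt;

```python
import math
from collections import Counter

# Toy corpus of code chunks (illustrative, not CCE's real data structures).
chunks = {
    "payments.charge": "def charge(card, amount): process payment via gateway",
    "shipping.quote":  "def quote(address, weight): estimate shipping cost",
    "payments.refund": "def refund(txn_id): reverse a payment transaction",
}

def keyword_score(query, text):
    """Crude keyword overlap, standing in for BM25."""
    q, t = set(query.lower().split()), Counter(text.lower().split())
    return sum(t[w] for w in q)

def cosine_score(query, text):
    """Bag-of-words cosine, standing in for embedding similarity."""
    qv, tv = Counter(query.lower().split()), Counter(text.lower().split())
    dot = sum(qv[w] * tv[w] for w in qv)
    norm = math.sqrt(sum(v * v for v in qv.values())) * \
           math.sqrt(sum(v * v for v in tv.values()))
    return dot / norm if norm else 0.0

def hybrid_search(query, alpha=0.5):
    """Fuse both signals with a weighted sum (alpha is an assumed knob)."""
    scored = {name: alpha * cosine_score(query, text)
                    + (1 - alpha) * keyword_score(query, text)
              for name, text in chunks.items()}
    return sorted(scored, key=scored.get, reverse=True)

print(hybrid_search("payment flow"))  # payments chunks rank above shipping
```

The fusion weight and the two scorers are where a real system spends its tuning effort; the point here is only the shape of the pipeline.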

&lt;p&gt;Re-indexing after edits takes under 1 second (96% embedding cache hit rate). Git hooks keep the index current automatically.&lt;/p&gt;
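&lt;p&gt;The sub-second re-index rests on a simple idea: if a chunk's text hasn't changed, its embedding never needs recomputing. A minimal sketch of a content-hash embedding cache (the fake embedder and cache layout are illustrative, not CCE's internals):&lt;/p&gt;

```python
import hashlib

calls = 0

def embed(text):
    """Stand-in for a real embedding model; counts invocations."""
    global calls
    calls += 1
    return [float(b) for b in hashlib.sha256(text.encode()).digest()[:4]]

cache = {}

def embed_cached(text):
    """Key the cache by a content hash so unchanged chunks cost nothing."""
    key = hashlib.sha256(text.encode()).hexdigest()
    if key not in cache:
        cache[key] = embed(text)
    return cache[key]

# First index pass embeds everything; a re-index after editing one chunk
# only re-embeds that chunk.
for chunk in ["def charge(): ...", "def refund(): ...", "def quote(): ..."]:
    embed_cached(chunk)
first_pass = calls

for chunk in ["def charge(amount): ...", "def refund(): ...", "def quote(): ..."]:
    embed_cached(chunk)

print(first_pass, calls)  # 3 4 -> only the edited chunk was re-embedded
```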

&lt;h2&gt;
  
  
  The Benchmark
&lt;/h2&gt;

&lt;p&gt;We benchmarked against &lt;a href="https://github.com/fastapi/fastapi" rel="noopener noreferrer"&gt;FastAPI&lt;/a&gt; (53 source files, 180K tokens) with 20 real coding questions. No cherry-picking.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Retrieval savings&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;94%&lt;/strong&gt; (83,681 → 4,927 tokens/query)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compression (additional, applied to retrieved chunks)&lt;/td&gt;
&lt;td&gt;89%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Recall@10&lt;/td&gt;
&lt;td&gt;0.90&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Latency p50&lt;/td&gt;
&lt;td&gt;0.4ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
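&lt;p&gt;The headline figure follows directly from the per-query averages in the table:&lt;/p&gt;

```python
# Per-query averages from the benchmark table above.
baseline_tokens = 83_681   # full-file reads
cce_tokens = 4_927         # context_search retrieval

savings = 1 - cce_tokens / baseline_tokens
print(f"{savings:.0%}")  # 94%
```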

&lt;p&gt;&lt;strong&gt;Important:&lt;/strong&gt; The 94% is measured against full-file reads, not against Claude Code's built-in exploration. We use full-file as the baseline because it's reproducible and deterministic. &lt;a href="https://elara-labs.github.io/code-context-engine/blog/benchmark-fastapi.html" rel="noopener noreferrer"&gt;Full methodology here.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can reproduce it yourself:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;code-context-engine
python benchmarks/run_benchmark.py &lt;span class="nt"&gt;--repo&lt;/span&gt; https://github.com/fastapi/fastapi.git &lt;span class="nt"&gt;--source-dir&lt;/span&gt; fastapi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What You Get
&lt;/h2&gt;

&lt;p&gt;9 MCP tools that Claude uses automatically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;context_search&lt;/code&gt; for hybrid vector + BM25 search&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;session_recall&lt;/code&gt; and &lt;code&gt;record_decision&lt;/code&gt; for cross-session memory&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;related_context&lt;/code&gt; for code graph traversal&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;set_output_compression&lt;/code&gt; for controlling response verbosity&lt;/li&gt;
&lt;li&gt;Plus &lt;code&gt;expand_chunk&lt;/code&gt;, &lt;code&gt;record_code_area&lt;/code&gt;, &lt;code&gt;index_status&lt;/code&gt;, &lt;code&gt;reindex&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
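&lt;p&gt;MCP tools are invoked over JSON-RPC, so under the hood a &lt;code&gt;context_search&lt;/code&gt; call is just a &lt;code&gt;tools/call&lt;/code&gt; message. The sketch below shows the standard MCP request shape; the &lt;code&gt;arguments&lt;/code&gt; keys are an assumption for illustration, so check the CCE docs for the real parameter names.&lt;/p&gt;

```python
import json

# Shape of an MCP tools/call request. The "arguments" keys are assumed
# for illustration; the envelope itself follows the MCP JSON-RPC spec.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "context_search",
        "arguments": {"query": "payment flow"},
    },
}
print(json.dumps(request, indent=2))
```

Your editor builds and sends these messages for you; this is only what travels over the wire.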

&lt;p&gt;A live dashboard with token savings, donut charts, and session history:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cce dashboard
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And dollar savings computed from live Anthropic pricing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cce savings &lt;span class="nt"&gt;--all&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why Not Just Use Cursor's Built-in Indexing?
&lt;/h2&gt;

&lt;p&gt;CCE is editor-agnostic. One index works across Claude Code, VS Code, Cursor, Gemini CLI, and Codex. Your code never leaves your machine. And the savings are measured, with dollar figures computed from live Anthropic pricing rather than guessed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Languages Supported
&lt;/h2&gt;

&lt;p&gt;AST-aware chunking for Python, JavaScript, TypeScript, PHP, Go, Rust, and Java. Language-aware fallback for 40+ more (C, C++, Swift, Kotlin, Ruby, Haskell, and others). All text files are indexed.&lt;/p&gt;
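&lt;p&gt;For languages without full AST support, a fallback has to chunk on cheaper textual cues. The sketch below splits on blank-line boundaries with a size cap; it is a hypothetical illustration of the idea, not CCE's actual fallback logic.&lt;/p&gt;

```python
def fallback_chunks(source, max_lines=40):
    """Naive fallback chunker: split at blank-line boundaries, capping
    chunk size. Hypothetical sketch, not CCE's actual fallback."""
    chunks, current = [], []
    for line in source.splitlines():
        current.append(line)
        at_boundary = not line.strip() or len(current) >= max_lines
        if at_boundary and any(l.strip() for l in current):
            chunks.append("\n".join(current).strip())
            current = []
    if any(l.strip() for l in current):
        chunks.append("\n".join(current).strip())
    return chunks

# A Haskell file, i.e. a language handled by the fallback path.
sample = 'module Main where\n\nmain :: IO ()\nmain = putStrLn "hi"\n'
print(fallback_chunks(sample))  # two chunks: header, then main
```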

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv tool &lt;span class="nb"&gt;install &lt;/span&gt;code-context-engine
&lt;span class="nb"&gt;cd &lt;/span&gt;your-project
cce init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three lines. See your savings in 60 seconds.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/elara-labs/code-context-engine" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; | &lt;a href="https://elara-labs.github.io/code-context-engine/" rel="noopener noreferrer"&gt;Docs&lt;/a&gt; | &lt;a href="https://elara-labs.github.io/code-context-engine/blog/benchmark-fastapi.html" rel="noopener noreferrer"&gt;Benchmark&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;CCE is MIT licensed, free, and open source. Built by &lt;a href="https://github.com/elara-labs" rel="noopener noreferrer"&gt;Elara Labs&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>productivity</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
