<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Abhishek</title>
    <description>The latest articles on DEV Community by Abhishek (@abhishek_52e7f656ac8ec0e6).</description>
    <link>https://dev.to/abhishek_52e7f656ac8ec0e6</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3837969%2Ffed9d8aa-4c4f-47bf-85a0-bf76cee457fb.png</url>
      <title>DEV Community: Abhishek</title>
      <link>https://dev.to/abhishek_52e7f656ac8ec0e6</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/abhishek_52e7f656ac8ec0e6"/>
    <language>en</language>
    <item>
      <title>I was paying $200/month in wasted AI tokens. So I built a Rust context optimizer.</title>
      <dc:creator>Abhishek</dc:creator>
      <pubDate>Sun, 22 Mar 2026 05:43:45 +0000</pubDate>
      <link>https://dev.to/abhishek_52e7f656ac8ec0e6/i-was-paying-200month-in-wasted-ai-tokens-so-i-built-a-rust-context-optimizer-5g3e</link>
      <guid>https://dev.to/abhishek_52e7f656ac8ec0e6/i-was-paying-200month-in-wasted-ai-tokens-so-i-built-a-rust-context-optimizer-5g3e</guid>
      <description>&lt;p&gt;My Cursor bill last month: $340.&lt;/p&gt;

&lt;p&gt;I dug into the API logs. Over 60% of the tokens being sent to the LLM were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Boilerplate I'd copied from Stack Overflow three years ago&lt;/li&gt;
&lt;li&gt;The same database helper function, repeated four times with slight variations&lt;/li&gt;
&lt;li&gt;An entire test file that had nothing to do with what I was asking&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;My AI tool was optimizing for &lt;strong&gt;similarity&lt;/strong&gt; -- and similarity is not the same as &lt;strong&gt;information&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem with every AI coding tool
&lt;/h2&gt;

&lt;p&gt;Cursor, Copilot, Claude Code, Cody -- they all select context the same way:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Embed your query&lt;/li&gt;
&lt;li&gt;Find the top-K similar chunks&lt;/li&gt;
&lt;li&gt;Stuff them into the context window until full&lt;/li&gt;
&lt;li&gt;Cut everything else&lt;/li&gt;
&lt;/ol&gt;
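&lt;p&gt;A toy version of those four steps, just to make the failure mode concrete (bag-of-words cosine stands in for real learned embeddings, and the chunks are invented):&lt;/p&gt;

```python
# Toy Top-K retrieval: embed the query, rank chunks by cosine similarity,
# keep the top K. Bag-of-words cosine stands in for learned embeddings;
# the chunks below are invented for illustration.
from collections import Counter
import math

def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na * nb else 0.0

def top_k(query, chunks, k):
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

chunks = [
    "auth module validates user tokens",
    "auth test duplicates auth logic",
    "payments module charges the card",
]
# the near-duplicate auth chunk outranks payments for an auth-flavored query
print(top_k("how does auth work", chunks, 2))
```

&lt;p&gt;Note what happens: the near-duplicate chunk scores &lt;em&gt;higher&lt;/em&gt; than anything genuinely new, because it repeats the query's vocabulary twice.&lt;/p&gt;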

&lt;p&gt;The result?&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Query: "How does payment processing work?"

What your AI actually sees:
  auth.py       (similarity: 0.94)  &amp;lt;- useful
  auth_test.py  (similarity: 0.91)  &amp;lt;- copies auth logic
  auth_utils.py (similarity: 0.89)  &amp;lt;- more auth copies
  auth_v2.py    (similarity: 0.87)  &amp;lt;- even more auth
  ...
  payments.py   (similarity: 0.41)  &amp;lt;- NEVER LOADED, cut by budget
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your AI is answering questions about payment processing without having read the payments file. It's hallucinating from auth code.&lt;/p&gt;

&lt;p&gt;This is not a prompt engineering problem. It's a math problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  What we built
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/juyterman1000/entroly" rel="noopener noreferrer"&gt;Entroly&lt;/a&gt; is an open-source context optimizer that intercepts requests between your IDE and the LLM. It replaces Top-K with three algorithms running in a Rust engine:&lt;/p&gt;

&lt;h3&gt;
  
  
  Algorithm 1: KKT-optimal knapsack bisection
&lt;/h3&gt;

&lt;p&gt;Context selection is a 0/1 knapsack problem. You have N code fragments with information scores and token costs. You want the maximum information within your budget.&lt;/p&gt;

&lt;p&gt;We solve this with KKT dual bisection -- 30 bisection steps over an O(N) scan, so O(30N) total -- combined with submodular diversity selection that gives a (1-1/e) ~ 63% optimality guarantee.&lt;/p&gt;

&lt;p&gt;The diversity constraint is the key insight: instead of 4 versions of your auth module, you get auth + payments + DB schema + API layer -- one fragment from each area of your codebase.&lt;/p&gt;
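&lt;p&gt;A minimal Python sketch of the dual-bisection step (the file names, scores, and costs here are invented, and the submodular diversity layer is omitted -- the real engine is Rust):&lt;/p&gt;

```python
# Hedged sketch: Lagrangian (dual) bisection for the 0/1 knapsack relaxation.
# We search for lam, a "price per token": an item is taken when its info
# score exceeds lam times its token cost. Raising lam shrinks the selection.
def select(items, budget, iters=30):
    """items: list of (name, info_score, token_cost); returns chosen names."""
    lo, hi = 0.0, max(v / c for _, v, c in items)
    for _ in range(iters):  # 30 halvings of the interval, O(N) work each
        lam = (lo + hi) / 2.0
        # take an item when v - lam * c is positive
        # (max(0.0, x) is truthy only for positive x)
        spent = sum(c for _, v, c in items if max(0.0, v - lam * c))
        if max(0.0, spent - budget):
            lo = lam  # over budget: tokens are too cheap, raise the price
        else:
            hi = lam  # within budget: lower the price, admit more items
    return [n for n, v, c in items if max(0.0, v - hi * c)]

items = [
    ("auth.py", 9.0, 400),
    ("auth_test.py", 8.5, 300),
    ("payments.py", 6.0, 500),
    ("schema.sql", 4.0, 200),
]
# picks the best info-per-token items that fit the 700-token budget
print(select(items, budget=700))
```

&lt;p&gt;The bisection converges on the marginal price of a token; everything worth more than that price gets in, everything worth less gets cut.&lt;/p&gt;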

&lt;h3&gt;
  
  
  Algorithm 2: O(1) SimHash deduplication
&lt;/h3&gt;

&lt;p&gt;Every fragment gets a 64-bit SimHash fingerprint. Near-duplicate detection uses Hamming distance &amp;lt;= 3 via LSH buckets. Constant time, regardless of codebase size.&lt;/p&gt;

&lt;p&gt;Copy-pasted code, auto-generated boilerplate, and lightly-edited duplicates are removed &lt;strong&gt;before&lt;/strong&gt; they consume token budget.&lt;/p&gt;
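&lt;p&gt;In sketch form (assuming a whitespace tokenizer and a 4x16 LSH band layout -- the actual Rust implementation differs):&lt;/p&gt;

```python
# Hedged sketch of 64-bit SimHash plus LSH banding. Each token's hash votes
# on every bit; the sign of the vote tally decides the fingerprint bit.
import hashlib

def simhash(text, bits=64):
    counts = [0] * bits
    for token in text.split():
        h = int(hashlib.md5(token.encode()).hexdigest(), 16) % (2 ** bits)
        for i in range(bits):
            bit = (h // (2 ** i)) % 2
            counts[i] += 1 if bit else -1
    # set bit i when the tally is positive (max(0, x) is truthy iff positive)
    return sum((2 ** i) for i in range(bits) if max(0, counts[i]))

def hamming(a, b):
    return bin(a ^ b).count("1")

def bands(h, nbands=4, width=16):
    # 4 bands of 16 bits: two hashes within Hamming distance 3 can differ in
    # at most 3 bands, so by pigeonhole they collide in at least one bucket
    return [(i, (h // (2 ** (i * width))) % (2 ** width)) for i in range(nbands)]

a = simhash("def get_user(db, uid): return db.query(uid)")
b = simhash("def get_user(db, user_id): return db.query(user_id)")
print(hamming(a, b))  # near-duplicates tend to land at a small distance
```

&lt;p&gt;Candidates that share a band bucket get a Hamming check; distance up to 3 counts as a near-duplicate. The bucket lookup is what makes it constant time per fragment.&lt;/p&gt;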

&lt;h3&gt;
  
  
  Algorithm 3: PRISM -- online RL that learns what's actually useful
&lt;/h3&gt;

&lt;p&gt;After each LLM response, we measure how much of the injected context the model actually referenced (trigram + identifier overlap scoring). &lt;/p&gt;

&lt;p&gt;This feeds a REINFORCE loop. The dual variable from the forward knapsack constraint serves as a per-item baseline -- so the RL gradient is guaranteed consistent with the selection math. Weights update via a spectral natural gradient on the 4x4 gradient covariance.&lt;/p&gt;

&lt;p&gt;In plain English: the more you use it, the better it gets at choosing what to include.&lt;/p&gt;
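&lt;p&gt;The usage-scoring step can be sketched like this (trigram overlap only -- the production reward also weighs identifiers, and the exact metric here is an assumption):&lt;/p&gt;

```python
# Hedged sketch of the usage signal: the fraction of a fragment's token
# trigrams that reappear in the model's response. This is the reward fed to
# the REINFORCE loop; the exact production metric is an assumption here.
import re

def trigrams(text):
    toks = re.findall(r"[A-Za-z_]\w+", text.lower())
    return set(zip(toks, toks[1:], toks[2:]))

def usage_score(fragment, response):
    frag = trigrams(fragment)
    if not frag:
        return 0.0
    return len(frag.intersection(trigrams(response))) / len(frag)

frag = "def charge_card(order): total = order.amount * tax_rate"
resp = "The function charge_card computes total as order.amount * tax_rate"
print(usage_score(frag, resp))  # 0.2
```

&lt;p&gt;A fragment the model never touches scores 0 and gets down-weighted next time; a fragment the model quotes heavily gets reinforced.&lt;/p&gt;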

&lt;h2&gt;
  
  
  The numbers
&lt;/h2&gt;

&lt;p&gt;On a 50K LOC Python/TypeScript monorepo, one month of usage:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Naive stuffing&lt;/th&gt;
&lt;th&gt;Top-K (Cody-style)&lt;/th&gt;
&lt;th&gt;Entroly&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tokens per request&lt;/td&gt;
&lt;td&gt;baseline&lt;/td&gt;
&lt;td&gt;-18%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;-78%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Files represented&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;100%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Latency overhead&lt;/td&gt;
&lt;td&gt;0ms&lt;/td&gt;
&lt;td&gt;~40ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;&amp;lt;10ms&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Monthly API cost&lt;/td&gt;
&lt;td&gt;$340&lt;/td&gt;
&lt;td&gt;$280&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$75&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Your AI sees your entire codebase. You spend 78% less. The Rust engine adds under 10ms per request.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to try it (60 seconds)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;entroly

&lt;span class="c"&gt;# For Cursor / Claude Code (MCP server):&lt;/span&gt;
entroly init
&lt;span class="c"&gt;# Generates .cursor/mcp.json automatically&lt;/span&gt;

&lt;span class="c"&gt;# For anything else (transparent HTTP proxy):&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;entroly[proxy]
entroly proxy &lt;span class="nt"&gt;--quality&lt;/span&gt; balanced
&lt;span class="c"&gt;# Point your AI tool to http://localhost:9377/v1&lt;/span&gt;

&lt;span class="c"&gt;# See what it's doing:&lt;/span&gt;
entroly demo       &lt;span class="c"&gt;# before/after comparison on your actual project&lt;/span&gt;
entroly dashboard  &lt;span class="c"&gt;# live metrics at localhost:9378&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It auto-indexes your codebase via &lt;code&gt;git ls-files&lt;/code&gt;, builds dependency graphs, and starts working immediately. No YAML, no config files, no embeddings database to set up.&lt;/p&gt;

&lt;h2&gt;
  
  
  The bonuses you get for free
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Built-in security scanning.&lt;/strong&gt; 55 SAST rules (SQL injection, hardcoded secrets, command injection, 8 CWE categories) run on selected context before your AI sees it. If you're about to ask your AI to modify sensitive code, it flags it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Entropy anomaly detection.&lt;/strong&gt; We run robust MAD-based Z-scores across directory groups to flag code that's statistically unusual compared to its neighbors -- copy-paste errors, dead stubs, and suspicious auth deviations surface without any LLM call.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Codebase health grades.&lt;/strong&gt; Clone detection, dead symbol finder, god file detection. Run &lt;code&gt;entroly health&lt;/code&gt; to get an A-F grade for your project.&lt;/p&gt;

&lt;h2&gt;
  
  
  It's all open source (MIT)
&lt;/h2&gt;

&lt;p&gt;The entire Rust core (19 modules, PyO3 bridge) is on GitHub:&lt;/p&gt;

&lt;p&gt;-&amp;gt; &lt;strong&gt;&lt;a href="https://github.com/juyterman1000/entroly" rel="noopener noreferrer"&gt;https://github.com/juyterman1000/entroly&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you're building AI dev tools, the knapsack selection, SimHash dedup, and dependency graph modules are all designed to be composable -- fork it and strip it for parts.&lt;/p&gt;

&lt;p&gt;PRs, issues, and savage code reviews all welcome. &lt;/p&gt;

&lt;p&gt;What's your current strategy for context management with your AI coding tool? I'd love to hear what you're using in the comments.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>productivity</category>
      <category>rust</category>
    </item>
  </channel>
</rss>
