<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Abhishek</title>
    <description>The latest articles on DEV Community by Abhishek (@abhishek_52e7f656ac8ec0e6).</description>
    <link>https://dev.to/abhishek_52e7f656ac8ec0e6</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3837969%2Ffed9d8aa-4c4f-47bf-85a0-bf76cee457fb.png</url>
      <title>DEV Community: Abhishek</title>
      <link>https://dev.to/abhishek_52e7f656ac8ec0e6</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/abhishek_52e7f656ac8ec0e6"/>
    <language>en</language>
    <item>
      <title>I was paying $200/month in wasted AI tokens. So I built a Rust context optimizer.</title>
      <dc:creator>Abhishek</dc:creator>
      <pubDate>Sun, 22 Mar 2026 05:43:45 +0000</pubDate>
      <link>https://dev.to/abhishek_52e7f656ac8ec0e6/i-was-paying-200month-in-wasted-ai-tokens-so-i-built-a-rust-context-optimizer-5g3e</link>
      <guid>https://dev.to/abhishek_52e7f656ac8ec0e6/i-was-paying-200month-in-wasted-ai-tokens-so-i-built-a-rust-context-optimizer-5g3e</guid>
      <description>&lt;p&gt;My Cursor bill last month: $340.&lt;/p&gt;

&lt;p&gt;I dug into the API logs. Over 60% of the tokens being sent to the LLM were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Boilerplate I'd copied from Stack Overflow three years ago&lt;/li&gt;
&lt;li&gt;The same database helper function, repeated four times with slight variations&lt;/li&gt;
&lt;li&gt;An entire test file that had nothing to do with what I was asking&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;My AI tool was optimizing for &lt;strong&gt;similarity&lt;/strong&gt; -- and similarity is not the same as &lt;strong&gt;information&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem with every AI coding tool
&lt;/h2&gt;

&lt;p&gt;Cursor, Copilot, Claude Code, Cody -- they all select context the same way:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Embed your query&lt;/li&gt;
&lt;li&gt;Find the top-K similar chunks&lt;/li&gt;
&lt;li&gt;Stuff them into the context window until full&lt;/li&gt;
&lt;li&gt;Cut everything else&lt;/li&gt;
&lt;/ol&gt;
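&lt;p&gt;A toy version of those four steps, just to make the failure mode concrete (bag-of-words cosine stands in for real learned embeddings, and the chunks are invented):&lt;/p&gt;

```python
# Toy Top-K retrieval: embed the query, rank chunks by cosine similarity,
# keep the top K. Bag-of-words cosine stands in for learned embeddings;
# the chunks below are invented for illustration.
from collections import Counter
import math

def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na * nb else 0.0

def top_k(query, chunks, k):
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

chunks = [
    "auth module validates user tokens",
    "auth test duplicates auth logic",
    "payments module charges the card",
]
# the near-duplicate auth chunk outranks payments for an auth-flavored query
print(top_k("how does auth work", chunks, 2))
```

&lt;p&gt;Note what happens: the near-duplicate chunk scores &lt;em&gt;higher&lt;/em&gt; than anything genuinely new, because it repeats the query's vocabulary twice.&lt;/p&gt;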

&lt;p&gt;The result?&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Query: "How does payment processing work?"

What your AI actually sees:
  auth.py       (similarity: 0.94)  &amp;lt;- useful
  auth_test.py  (similarity: 0.91)  &amp;lt;- copies auth logic
  auth_utils.py (similarity: 0.89)  &amp;lt;- more auth copies
  auth_v2.py    (similarity: 0.87)  &amp;lt;- even more auth
  ...
  payments.py   (similarity: 0.41)  &amp;lt;- NEVER LOADED, cut by budget
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your AI is answering questions about payment processing without having read the payments file. It's hallucinating from auth code.&lt;/p&gt;

&lt;p&gt;This is not a prompt engineering problem. It's a math problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  What we built
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/juyterman1000/entroly" rel="noopener noreferrer"&gt;Entroly&lt;/a&gt; is an open-source context optimizer that intercepts requests between your IDE and the LLM. It replaces Top-K with three algorithms running in a Rust engine:&lt;/p&gt;

&lt;h3&gt;
  
  
  Algorithm 1: KKT-optimal knapsack bisection
&lt;/h3&gt;

&lt;p&gt;Context selection is a 0/1 knapsack problem. You have N code fragments with information scores and token costs. You want the maximum information within your budget.&lt;/p&gt;

&lt;p&gt;We solve this with KKT dual bisection -- 30 bisection steps over an O(N) scan, so O(30N) total -- combined with submodular diversity selection that gives a (1-1/e) ~ 63% optimality guarantee.&lt;/p&gt;

&lt;p&gt;The diversity constraint is the key insight: instead of 4 versions of your auth module, you get auth + payments + DB schema + API layer -- one fragment from each area of your codebase.&lt;/p&gt;
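&lt;p&gt;A minimal Python sketch of the dual-bisection step (the file names, scores, and costs here are invented, and the submodular diversity layer is omitted -- the real engine is Rust):&lt;/p&gt;

```python
# Hedged sketch: Lagrangian (dual) bisection for the 0/1 knapsack relaxation.
# We search for lam, a "price per token": an item is taken when its info
# score exceeds lam times its token cost. Raising lam shrinks the selection.
def select(items, budget, iters=30):
    """items: list of (name, info_score, token_cost); returns chosen names."""
    lo, hi = 0.0, max(v / c for _, v, c in items)
    for _ in range(iters):  # 30 halvings of the interval, O(N) work each
        lam = (lo + hi) / 2.0
        # take an item when v - lam * c is positive
        # (max(0.0, x) is truthy only for positive x)
        spent = sum(c for _, v, c in items if max(0.0, v - lam * c))
        if max(0.0, spent - budget):
            lo = lam  # over budget: tokens are too cheap, raise the price
        else:
            hi = lam  # within budget: lower the price, admit more items
    return [n for n, v, c in items if max(0.0, v - hi * c)]

items = [
    ("auth.py", 9.0, 400),
    ("auth_test.py", 8.5, 300),
    ("payments.py", 6.0, 500),
    ("schema.sql", 4.0, 200),
]
# picks the best info-per-token items that fit the 700-token budget
print(select(items, budget=700))
```

&lt;p&gt;The bisection converges on the marginal price of a token; everything worth more than that price gets in, everything worth less gets cut.&lt;/p&gt;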

&lt;h3&gt;
  
  
  Algorithm 2: O(1) SimHash deduplication
&lt;/h3&gt;

&lt;p&gt;Every fragment gets a 64-bit SimHash fingerprint. Near-duplicate detection uses Hamming distance &amp;lt;= 3 via LSH buckets. Constant time, regardless of codebase size.&lt;/p&gt;

&lt;p&gt;Copy-pasted code, auto-generated boilerplate, and lightly-edited duplicates are removed &lt;strong&gt;before&lt;/strong&gt; they consume token budget.&lt;/p&gt;
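&lt;p&gt;In sketch form (assuming a whitespace tokenizer and a 4x16 LSH band layout -- the actual Rust implementation differs):&lt;/p&gt;

```python
# Hedged sketch of 64-bit SimHash plus LSH banding. Each token's hash votes
# on every bit; the sign of the vote tally decides the fingerprint bit.
import hashlib

def simhash(text, bits=64):
    counts = [0] * bits
    for token in text.split():
        h = int(hashlib.md5(token.encode()).hexdigest(), 16) % (2 ** bits)
        for i in range(bits):
            bit = (h // (2 ** i)) % 2
            counts[i] += 1 if bit else -1
    # set bit i when the tally is positive (max(0, x) is truthy iff positive)
    return sum((2 ** i) for i in range(bits) if max(0, counts[i]))

def hamming(a, b):
    return bin(a ^ b).count("1")

def bands(h, nbands=4, width=16):
    # 4 bands of 16 bits: two hashes within Hamming distance 3 can differ in
    # at most 3 bands, so by pigeonhole they collide in at least one bucket
    return [(i, (h // (2 ** (i * width))) % (2 ** width)) for i in range(nbands)]

a = simhash("def get_user(db, uid): return db.query(uid)")
b = simhash("def get_user(db, user_id): return db.query(user_id)")
print(hamming(a, b))  # near-duplicates tend to land at a small distance
```

&lt;p&gt;Candidates that share a band bucket get a Hamming check; distance up to 3 counts as a near-duplicate. The bucket lookup is what makes it constant time per fragment.&lt;/p&gt;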

&lt;h3&gt;
  
  
  Algorithm 3: PRISM -- online RL that learns what's actually useful
&lt;/h3&gt;

&lt;p&gt;After each LLM response, we measure how much of the injected context the model actually referenced (trigram + identifier overlap scoring). &lt;/p&gt;

&lt;p&gt;This feeds a REINFORCE loop. The dual variable from the forward knapsack constraint serves as a per-item baseline -- so the RL gradient is guaranteed consistent with the selection math. Weights update via a spectral natural gradient on the 4x4 gradient covariance.&lt;/p&gt;

&lt;p&gt;In plain English: the more you use it, the better it gets at choosing what to include.&lt;/p&gt;
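&lt;p&gt;The usage-scoring step can be sketched like this (trigram overlap only -- the production reward also weighs identifiers, and the exact metric here is an assumption):&lt;/p&gt;

```python
# Hedged sketch of the usage signal: the fraction of a fragment's token
# trigrams that reappear in the model's response. This is the reward fed to
# the REINFORCE loop; the exact production metric is an assumption here.
import re

def trigrams(text):
    toks = re.findall(r"[A-Za-z_]\w+", text.lower())
    return set(zip(toks, toks[1:], toks[2:]))

def usage_score(fragment, response):
    frag = trigrams(fragment)
    if not frag:
        return 0.0
    return len(frag.intersection(trigrams(response))) / len(frag)

frag = "def charge_card(order): total = order.amount * tax_rate"
resp = "The function charge_card computes total as order.amount * tax_rate"
print(usage_score(frag, resp))  # 0.2
```

&lt;p&gt;A fragment the model never touches scores 0 and gets down-weighted next time; a fragment the model quotes heavily gets reinforced.&lt;/p&gt;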

&lt;h2&gt;
  
  
  The numbers
&lt;/h2&gt;

&lt;p&gt;On a 50K LOC Python/TypeScript monorepo, one month of usage:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Naive stuffing&lt;/th&gt;
&lt;th&gt;Top-K (Cody-style)&lt;/th&gt;
&lt;th&gt;Entroly&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tokens per request&lt;/td&gt;
&lt;td&gt;baseline&lt;/td&gt;
&lt;td&gt;-18%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;-78%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Files represented&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;100%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Latency overhead&lt;/td&gt;
&lt;td&gt;0ms&lt;/td&gt;
&lt;td&gt;~40ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;&amp;lt;10ms&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Monthly API cost&lt;/td&gt;
&lt;td&gt;$340&lt;/td&gt;
&lt;td&gt;$280&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$75&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Your AI sees your entire codebase. You spend 78% less. The Rust engine adds under 10ms per request.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to try it (60 seconds)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;entroly

&lt;span class="c"&gt;# For Cursor / Claude Code (MCP server):&lt;/span&gt;
entroly init
&lt;span class="c"&gt;# Generates .cursor/mcp.json automatically&lt;/span&gt;

&lt;span class="c"&gt;# For anything else (transparent HTTP proxy):&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;entroly[proxy]
entroly proxy &lt;span class="nt"&gt;--quality&lt;/span&gt; balanced
&lt;span class="c"&gt;# Point your AI tool to http://localhost:9377/v1&lt;/span&gt;

&lt;span class="c"&gt;# See what it's doing:&lt;/span&gt;
entroly demo       &lt;span class="c"&gt;# before/after comparison on your actual project&lt;/span&gt;
entroly dashboard  &lt;span class="c"&gt;# live metrics at localhost:9378&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It auto-indexes your codebase via &lt;code&gt;git ls-files&lt;/code&gt;, builds dependency graphs, and starts working immediately. No YAML, no config files, no embeddings database to set up.&lt;/p&gt;

&lt;h2&gt;
  
  
  The bonuses you get for free
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Built-in security scanning.&lt;/strong&gt; 55 SAST rules (SQL injection, hardcoded secrets, command injection, 8 CWE categories) run on selected context before your AI sees it. If you're about to ask your AI to modify sensitive code, it flags it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Entropy anomaly detection.&lt;/strong&gt; We run robust MAD-based Z-scores across directory groups to flag code that's statistically unusual compared to its neighbors -- copy-paste errors, dead stubs, and suspicious auth deviations surface without any LLM call.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Codebase health grades.&lt;/strong&gt; Clone detection, dead symbol finder, god file detection. Run &lt;code&gt;entroly health&lt;/code&gt; to get an A-F grade for your project.&lt;/p&gt;

&lt;h2&gt;
  
  
  It's all open source (MIT)
&lt;/h2&gt;

&lt;p&gt;The entire Rust core (19 modules, PyO3 bridge) is on GitHub:&lt;/p&gt;

&lt;p&gt;-&amp;gt; &lt;strong&gt;&lt;a href="https://github.com/juyterman1000/entroly" rel="noopener noreferrer"&gt;https://github.com/juyterman1000/entroly&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you're building AI dev tools, the knapsack selection, SimHash dedup, and dependency graph modules are all designed to be composable -- fork it and strip it for parts.&lt;/p&gt;

&lt;p&gt;PRs, issues, and savage code reviews all welcome. &lt;/p&gt;

&lt;p&gt;What's your current strategy for context management with your AI coding tool? I'd love to hear what you're using in the comments.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>productivity</category>
      <category>rust</category>
    </item>
  </channel>
</rss>
