DEV Community

Cover image for How I Built a Zero-Dependency Token Compressor for AI Coding Agents (During My High School Exams)
Matteo Fiorini
Matteo Fiorini

Posted on

How I Built a Zero-Dependency Token Compressor for AI Coding Agents (During My High School Exams)

as developers, we are spending more and more time working alongside AI coding agents like Cursor, Claude Code, GitHub Copilot, Windsurf, or Cline.

But as your session grows, you quickly run into two major problems:

  1. Context Window Inflation: Long-running loops, verbose model reasoning, and unfiltered terminal log dumps clog the context window, causing the LLM to get "lost in the middle" and start hallucinating.
  2. Financial Overhead: Large context windows mean higher token usage, which translates directly to higher API costs.

To solve this, I built TITAN (Token Intelligence Through Agent Narrowing): a universal, zero-dependency CLI framework designed to compress AI agent token consumption by 70% to 85% without degrading reasoning quality.

And to make things interesting, I wrote and shipped it this week entirely on my own, right in the middle of my high school final exams (la maturità here in Italy).

Here is how it works under the hood.


The Core Philosophy: Multi-Layer Compression

TITAN approaches token optimization not as a single post-processing step, but as three orthogonal, multiplicative layers:

Total Savings = 1 - ( (1 - L1_Savings) * (1 - L2_Savings) * (1 - L3_Savings) )
Enter fullscreen mode Exit fullscreen mode

Layer 1: Linguistic Compression (Caveman Engine)

Instead of letting the LLM output standard verbose English prose (pleasantries, hedging, filler words, technical narrations), the Caveman Engine instructs the model to use a dense, telegraphese grammar:

  • Strips filler/hedging: basically, actually, likely, probably $\to$ removed.
  • Strips articles: the, a, an $\to$ removed (when safe).
  • Fragments OK: subject/auxiliary drops $\to$ e.g., "Component re-renders" instead of "The component is re-rendering".
  • Preserves Sacred Tokens: Code blocks, URLs, file paths, and exact technical names are protected and left untouched.

Layer 2: Structural Code Compression (Ponytail Lazy Ladder)

Before the agent writes a single line of code, it must traverse a 6-rung logical ladder to guarantee the laziest, most minimal solution:

  1. YAGNI: Does this feature actually need to exist right now? If not, skip.
  2. Stdlib: Can Node.js/JS native stdlib do it? If yes, use it.
  3. Native: Is there a platform native API? Use it.
  4. Existing: Is there an already installed package? Don't add a new npm dependency.
  5. One Line: Can it be written as a single line? Inline it.
  6. Minimum: Only then, write the absolute minimum working code.

Every deliberate simplification is documented inline: // ponytail: <ceiling>, <upgrade path> (e.g. // ponytail: local memory cache, use Redis if multi-node setup is required).

Layer 3: Contextual Compression (CLI Utilities)

  • Memory Files: Static documentation files (like CLAUDE.md) are compressed post-hoc to strip prose while keeping code conventions exact, saving up to 45% input tokens on every turn.
  • Terminal Stream Filtering: Pipes build/test logs to strip Vite/Webpack startup noise, husky banners, and contract large stack traces down to the error header + first relevant application frame.
npm run build 2>&1 | titan filter
Enter fullscreen mode Exit fullscreen mode

The Zero-Dependency Rule

Following the structural (L2) rule of using the standard library, TITAN has zero external npm dependencies.

It uses Node.js native features (fs, path, readline, child_process, https) for everything:

  • The YAML frontmatter parser is implemented as an indentation-aware state machine that handles quoted strings, list arrays, and multiline block scalars (| and >).
  • The test runner uses Node's native node:test and node:assert modules.
  • System commands execute via native subprocess spawns.

Measuring Usable Intelligence Density (UID)

To verify that compressing prompts doesn't degrade the AI's coding and reasoning capabilities, I built an evaluation harness into TITAN to measure Usable Intelligence Density (UID):

$$\text{UID} = \frac{\text{Avg Accuracy \%}}{\text{Avg Total Tokens}} \times 1000$$

Here is how the variants perform under mock and empirical LLM runs over a 5-task suite (Coding, Debugging, Logic, Refactoring, and Code Review):

Variant Avg Accuracy Avg In Tok Avg Out Tok Avg Tot Tok UID (Density) Status
Baseline 100% 50 198 248 403.2 Reliable
Caveman 100% 120 78 198 505.1 Reliable
Ponytail 86% 115 67 182 472.5 Reliable
TITAN Balanced 100% 1500 80 1580 63.3 Reliable
TITAN Lite 100% 425 91 516 193.8 Reliable
TITAN Aggressive 79% 400 50 450 175.7 ⚠ Degraded
  • Lite / Balanced: Achieve a flat 100% accuracy while maximizing density.
  • Aggressive: Telegraphic mode. Maximizes token efficiency, but logical reasoning begins to degrade slightly on highly abstract deduction tasks.
  • Note: The large input token count for the full TITAN prompt reflects the cost of loading the full master ruleset. The titan_lite variant balances prompt size and output compression beautifully.

Getting Started

You can install TITAN globally from npm:

npm install -g titan-agent-cli
Enter fullscreen mode Exit fullscreen mode

Then initialize the ruleset for your editor. For instance, to generate Cursor rules (.cursor/rules/titan.mdc):

# Standard balanced configuration
titan init --agent=cursor

# Or a lightweight prompt ruleset (~620 tokens)
titan init --agent=cursor --lite
Enter fullscreen mode Exit fullscreen mode

To run the native unit tests locally:

titan test
Enter fullscreen mode Exit fullscreen mode

And to scan your codebase for active technical debt ponytail comments:

titan debt
Enter fullscreen mode Exit fullscreen mode

Open Source & Contributions

TITAN is fully open source. I’d love to get your thoughts, contributions, or a star on GitHub!

If you have any feedback on the standard library YAML parser or ideas on expanding adapters for new IDEs, let me know in the comments below!

Top comments (1)

Collapse
 
alexshev profile image
Alex Shev

The zero-dependency constraint is a good fit for agent tooling. If a compressor is supposed to run inside many repos and workflows, every dependency becomes another thing the agent has to install, trust, and explain.

The hard part is not only reducing tokens, but preserving the cues that let the next model action stay grounded.