<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Matteo Fiorini</title>
    <description>The latest articles on DEV Community by Matteo Fiorini (@raxyl00).</description>
    <link>https://dev.to/raxyl00</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3986218%2Ff7b7da36-fffb-4003-bff9-2c42666e6856.jpeg</url>
      <title>DEV Community: Matteo Fiorini</title>
      <link>https://dev.to/raxyl00</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/raxyl00"/>
    <language>en</language>
    <item>
      <title>How I Built a Zero-Dependency Token Compressor for AI Coding Agents (During My High School Exams)</title>
      <dc:creator>Matteo Fiorini</dc:creator>
      <pubDate>Mon, 15 Jun 2026 20:37:04 +0000</pubDate>
      <link>https://dev.to/raxyl00/how-i-built-a-zero-dependency-token-compressor-for-ai-coding-agents-during-my-high-school-exams-3ihh</link>
      <guid>https://dev.to/raxyl00/how-i-built-a-zero-dependency-token-compressor-for-ai-coding-agents-during-my-high-school-exams-3ihh</guid>
      <description>&lt;p&gt;As developers, we spend more and more time working alongside AI coding agents like &lt;strong&gt;Cursor&lt;/strong&gt;, &lt;strong&gt;Claude Code&lt;/strong&gt;, &lt;strong&gt;GitHub Copilot&lt;/strong&gt;, &lt;strong&gt;Windsurf&lt;/strong&gt;, or &lt;strong&gt;Cline&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;But as your session grows, you quickly run into two major problems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Context Window Inflation&lt;/strong&gt;: Long-running loops, verbose model reasoning, and unfiltered terminal log dumps clog the context window, causing the LLM to get "lost in the middle" and start hallucinating.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Financial Overhead&lt;/strong&gt;: Large context windows mean higher token usage, which translates directly to higher API costs.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;To solve this, I built &lt;strong&gt;TITAN (Token Intelligence Through Agent Narrowing)&lt;/strong&gt;: a universal, zero-dependency CLI framework designed to compress AI agent token consumption by &lt;strong&gt;70% to 85%&lt;/strong&gt; without degrading reasoning quality.&lt;/p&gt;

&lt;p&gt;And to make things interesting, I wrote and shipped it this week entirely on my own, right in the middle of my high school final exams (&lt;em&gt;la maturità&lt;/em&gt; here in Italy). &lt;/p&gt;

&lt;p&gt;Here is how it works under the hood.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Core Philosophy: Multi-Layer Compression
&lt;/h2&gt;

&lt;p&gt;TITAN approaches token optimization not as a single post-processing step, but as three orthogonal, multiplicative layers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Total Savings = 1 - ( (1 - L1_Savings) * (1 - L2_Savings) * (1 - L3_Savings) )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
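&lt;p&gt;To make the formula concrete, here is a minimal sketch. The function name &lt;code&gt;totalSavings&lt;/code&gt; and the three sample fractions are my own illustration, not measured TITAN numbers.&lt;/p&gt;

```javascript
// Combine per-layer savings multiplicatively, as in the formula above.
// Each entry is a fraction between 0 and 1; the sample values are made up.
function totalSavings(layerSavings) {
  // Multiply together the share of tokens each layer leaves behind.
  const remaining = layerSavings.reduce((kept, s) => kept * (1 - s), 1);
  return 1 - remaining;
}

// Hypothetical layers saving 40%, 30% and 45% on their own.
console.log(totalSavings([0.4, 0.3, 0.45]).toFixed(3)); // 0.769
```

&lt;p&gt;Because the layers multiply, three moderate savings compound into a total that none of them reaches alone.&lt;/p&gt;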



&lt;h3&gt;
  
  
  Layer 1: Linguistic Compression (Caveman Engine)
&lt;/h3&gt;

&lt;p&gt;Instead of letting the LLM output standard verbose English prose (pleasantries, hedging, filler words, technical narrations), the &lt;strong&gt;Caveman Engine&lt;/strong&gt; instructs the model to use a dense, telegraphese grammar:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Strips filler/hedging&lt;/strong&gt;: &lt;code&gt;basically&lt;/code&gt;, &lt;code&gt;actually&lt;/code&gt;, &lt;code&gt;likely&lt;/code&gt;, &lt;code&gt;probably&lt;/code&gt; → removed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strips articles&lt;/strong&gt;: &lt;code&gt;the&lt;/code&gt;, &lt;code&gt;a&lt;/code&gt;, &lt;code&gt;an&lt;/code&gt; → removed (when safe).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fragments OK&lt;/strong&gt;: subjects and auxiliaries may be dropped, e.g. &lt;code&gt;"Component re-renders"&lt;/code&gt; instead of &lt;code&gt;"The component is re-rendering"&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Preserves Sacred Tokens&lt;/strong&gt;: Code blocks, URLs, file paths, and exact technical names are protected and left untouched.&lt;/li&gt;
&lt;/ul&gt;
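&lt;p&gt;The Caveman Engine works by instructing the model, but the same rules can be approximated mechanically on existing text. The sketch below is only an illustration: &lt;code&gt;compressProse&lt;/code&gt;, &lt;code&gt;FILLERS&lt;/code&gt; and &lt;code&gt;ARTICLES&lt;/code&gt; are hypothetical names, and it protects inline code spans only, not URLs or file paths.&lt;/p&gt;

```javascript
// Hypothetical post-hoc approximation of the filler and article rules.
const FILLERS = /\b(basically|actually|likely|probably)\b\s*/gi;
const ARTICLES = /\b(the|a|an)\b\s+/gi;

function compressProse(text) {
  // Splitting on a capture group keeps the code spans at odd indexes.
  return text
    .split(/(`[^`]*`)/)
    .map((part, i) =>
      i % 2 === 1 ? part : part.replace(FILLERS, '').replace(ARTICLES, '')
    )
    .join('');
}

console.log(compressProse('The bug is probably in `const a = the + 1` on the first render'));
// bug is in `const a = the + 1` on first render
```

&lt;p&gt;The word &lt;code&gt;the&lt;/code&gt; inside the code span survives because that segment is never passed through the replacements.&lt;/p&gt;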

&lt;h3&gt;
  
  
  Layer 2: Structural Code Compression (Ponytail Lazy Ladder)
&lt;/h3&gt;

&lt;p&gt;Before the agent writes a single line of code, it must traverse a &lt;strong&gt;6-rung logical ladder&lt;/strong&gt; to guarantee the laziest, most minimal solution:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;YAGNI&lt;/strong&gt;: Does this feature actually need to exist right now? If not, skip.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stdlib&lt;/strong&gt;: Can the Node.js/JavaScript standard library do it? If yes, use it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Native&lt;/strong&gt;: Is there a platform native API? Use it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Existing&lt;/strong&gt;: Is there an already installed package? Don't add a new npm dependency.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One Line&lt;/strong&gt;: Can it be written as a single line? Inline it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Minimum&lt;/strong&gt;: Only then, write the absolute minimum working code.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Every deliberate simplification is documented inline: &lt;code&gt;// ponytail: &amp;lt;ceiling&amp;gt;, &amp;lt;upgrade path&amp;gt;&lt;/code&gt; (e.g. &lt;code&gt;// ponytail: local memory cache, use Redis if multi-node setup is required&lt;/code&gt;).&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 3: Contextual Compression (CLI Utilities)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Memory Files&lt;/strong&gt;: Static documentation files (like &lt;code&gt;CLAUDE.md&lt;/code&gt;) are compressed post-hoc to strip prose while keeping code conventions exact, saving up to 45% of input tokens on every turn.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Terminal Stream Filtering&lt;/strong&gt;: Pipe build/test logs through the filter to strip Vite/Webpack startup noise and husky banners, and to collapse large stack traces down to the error header plus the first relevant application frame.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm run build 2&amp;gt;&amp;amp;1 | titan filter
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
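&lt;p&gt;As a rough idea of what such a filter does, here is a minimal sketch. It is not TITAN's actual implementation: &lt;code&gt;filterLog&lt;/code&gt;, the &lt;code&gt;NOISE&lt;/code&gt; patterns and the rule for spotting vendor frames are all assumptions made for illustration.&lt;/p&gt;

```javascript
// Hypothetical noise patterns; the real filter list is not shown here.
const NOISE = [/^\s*$/, /^\s*VITE v[\d.]+/i, /^\s*webpack \d/i, /^\s*husky/i];
const isFrame = (line) => /^\s+at /.test(line);
// Assumption: frames from dependencies or Node internals are not "relevant".
const isVendorFrame = (line) => /node_modules|node:internal/.test(line);

function filterLog(text) {
  const out = [];
  let keptFrame = false; // true once the current trace has its one frame
  for (const line of text.split('\n')) {
    if (isFrame(line)) {
      if (keptFrame || isVendorFrame(line)) continue;
      out.push(line); // first application frame of this trace
      keptFrame = true;
      continue;
    }
    keptFrame = false; // a non-frame line ends the current trace
    if (!NOISE.some((re) => re.test(line))) out.push(line);
  }
  return out.join('\n');
}
```

&lt;p&gt;Given a Vite banner followed by a &lt;code&gt;TypeError&lt;/code&gt; with three frames, this keeps only the error header and the first frame that points into application code.&lt;/p&gt;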






&lt;h2&gt;
  
  
  The Zero-Dependency Rule
&lt;/h2&gt;

&lt;p&gt;Following the structural (L2) rule of using the standard library, TITAN has &lt;strong&gt;zero external npm dependencies&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;It uses Node.js native features (&lt;code&gt;fs&lt;/code&gt;, &lt;code&gt;path&lt;/code&gt;, &lt;code&gt;readline&lt;/code&gt;, &lt;code&gt;child_process&lt;/code&gt;, &lt;code&gt;https&lt;/code&gt;) for everything:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The YAML frontmatter parser is implemented as an indentation-aware state machine that handles quoted strings, list arrays, and multiline block scalars (&lt;code&gt;|&lt;/code&gt; and &lt;code&gt;&amp;gt;&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;The test runner uses Node's native &lt;code&gt;node:test&lt;/code&gt; and &lt;code&gt;node:assert&lt;/code&gt; modules.&lt;/li&gt;
&lt;li&gt;System commands execute via native subprocess spawns.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Measuring Usable Intelligence Density (UID)
&lt;/h2&gt;

&lt;p&gt;To verify that compressing prompts doesn't degrade the AI's coding and reasoning capabilities, I built an evaluation harness into TITAN to measure &lt;strong&gt;Usable Intelligence Density (UID)&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;$$\text{UID} = \frac{\text{Avg Accuracy \%}}{\text{Avg Total Tokens}} \times 1000$$&lt;/p&gt;

&lt;p&gt;Here is how the variants perform under mock and empirical LLM runs over a 5-task suite (Coding, Debugging, Logic, Refactoring, and Code Review):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Variant&lt;/th&gt;
&lt;th&gt;Avg Accuracy&lt;/th&gt;
&lt;th&gt;Avg In Tok&lt;/th&gt;
&lt;th&gt;Avg Out Tok&lt;/th&gt;
&lt;th&gt;Avg Tot Tok&lt;/th&gt;
&lt;th&gt;UID (Density)&lt;/th&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Baseline&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;198&lt;/td&gt;
&lt;td&gt;248&lt;/td&gt;
&lt;td&gt;403.2&lt;/td&gt;
&lt;td&gt;Reliable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Caveman&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;120&lt;/td&gt;
&lt;td&gt;78&lt;/td&gt;
&lt;td&gt;198&lt;/td&gt;
&lt;td&gt;505.1&lt;/td&gt;
&lt;td&gt;Reliable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ponytail&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;86%&lt;/td&gt;
&lt;td&gt;115&lt;/td&gt;
&lt;td&gt;67&lt;/td&gt;
&lt;td&gt;182&lt;/td&gt;
&lt;td&gt;472.5&lt;/td&gt;
&lt;td&gt;Reliable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;TITAN Balanced&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;1500&lt;/td&gt;
&lt;td&gt;80&lt;/td&gt;
&lt;td&gt;1580&lt;/td&gt;
&lt;td&gt;63.3&lt;/td&gt;
&lt;td&gt;Reliable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;TITAN Lite&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;425&lt;/td&gt;
&lt;td&gt;91&lt;/td&gt;
&lt;td&gt;516&lt;/td&gt;
&lt;td&gt;193.8&lt;/td&gt;
&lt;td&gt;Reliable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;TITAN Aggressive&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;79%&lt;/td&gt;
&lt;td&gt;400&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;450&lt;/td&gt;
&lt;td&gt;175.7&lt;/td&gt;
&lt;td&gt;⚠ Degraded&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
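&lt;p&gt;The UID column can be reproduced from the accuracy and total-token columns. A small sketch, using a helper I am calling &lt;code&gt;uid&lt;/code&gt; and figures copied from the table above:&lt;/p&gt;

```javascript
// UID = (average accuracy in percent / average total tokens) * 1000
function uid(avgAccuracyPct, avgTotalTokens) {
  return (avgAccuracyPct / avgTotalTokens) * 1000;
}

// [variant, accuracy %, average total tokens], taken from the table above.
const rows = [
  ['Baseline', 100, 248],
  ['Caveman', 100, 198],
  ['TITAN Balanced', 100, 1580],
  ['TITAN Lite', 100, 516],
];

for (const [variant, accuracy, tokens] of rows) {
  console.log(variant, uid(accuracy, tokens).toFixed(1));
}
// prints 403.2, 505.1, 63.3 and 193.8 for the four rows
```

&lt;p&gt;Plugging in the rounded 79% for TITAN Aggressive gives 175.6 rather than the listed 175.7, which would be consistent with the table using an unrounded accuracy figure, though that is a guess.&lt;/p&gt;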

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Lite / Balanced&lt;/strong&gt;: Both hold a flat 100% accuracy. On this short-prompt suite their UID is lower than Baseline, because the ruleset's input tokens dominate the total.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Aggressive&lt;/strong&gt;: Telegraphic mode. Produces the fewest output tokens (50 on average), but accuracy falls to 79% as logical reasoning degrades on highly abstract deduction tasks.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Note: The large input token count for the full &lt;code&gt;TITAN&lt;/code&gt; prompt reflects the cost of loading the full master ruleset. The &lt;code&gt;titan_lite&lt;/code&gt; variant averages 425 input tokens in the table above while keeping accuracy at 100%.&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;You can install TITAN globally from npm:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; titan-agent-cli
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then initialize the ruleset for your editor. For instance, to generate Cursor rules (&lt;code&gt;.cursor/rules/titan.mdc&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Standard balanced configuration&lt;/span&gt;
titan init &lt;span class="nt"&gt;--agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;cursor

&lt;span class="c"&gt;# Or a lightweight prompt ruleset (~620 tokens)&lt;/span&gt;
titan init &lt;span class="nt"&gt;--agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;cursor &lt;span class="nt"&gt;--lite&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To run the native unit tests locally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;titan &lt;span class="nb"&gt;test&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And to scan your codebase for active technical-debt &lt;code&gt;ponytail&lt;/code&gt; comments:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;titan debt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
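&lt;p&gt;For a sense of what such a scan involves, here is a minimal sketch that parses the &lt;code&gt;// ponytail:&lt;/code&gt; comment format described earlier. The name &lt;code&gt;findPonytails&lt;/code&gt; and the shape of the returned objects are hypothetical, and the real command may work differently.&lt;/p&gt;

```javascript
// Matches "// ponytail: ceiling, upgrade path" anywhere on a line.
const PONYTAIL = /\/\/\s*ponytail:\s*([^,]+),\s*(.+)$/;

// Hypothetical helper: returns one record per ponytail comment in the source.
function findPonytails(source) {
  return source.split('\n').flatMap((text, index) => {
    const match = PONYTAIL.exec(text);
    return match
      ? [{ line: index + 1, ceiling: match[1].trim(), upgradePath: match[2].trim() }]
      : [];
  });
}

const sample = [
  'const cache = new Map();',
  '// ponytail: local memory cache, use Redis if multi-node setup is required',
].join('\n');

console.log(findPonytails(sample));
```

&lt;p&gt;The ceiling is everything up to the first comma and the upgrade path is the rest of the line, so a ceiling that itself contains a comma would be split too early in this sketch.&lt;/p&gt;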






&lt;h2&gt;
  
  
  Open Source &amp;amp; Contributions
&lt;/h2&gt;

&lt;p&gt;TITAN is fully open source. I’d love to get your thoughts, contributions, or a star on GitHub! &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub Repository&lt;/strong&gt;: &lt;a href="https://github.com/Raxyl00/titan-agent-cli" rel="noopener noreferrer"&gt;github.com/Raxyl00/titan-agent-cli&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;NPM Package&lt;/strong&gt;: &lt;a href="https://www.npmjs.com/package/titan-agent-cli" rel="noopener noreferrer"&gt;npmjs.com/package/titan-agent-cli&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you have any feedback on the standard library YAML parser or ideas on expanding adapters for new IDEs, let me know in the comments below!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>javascript</category>
      <category>node</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
