<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Maurizio-L</title>
    <description>The latest articles on DEV Community by Maurizio-L (@mauriziol).</description>
    <link>https://dev.to/mauriziol</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3948290%2Ffbbd586a-13e2-4bdb-a5c9-f99f907be057.png</url>
      <title>DEV Community: Maurizio-L</title>
      <link>https://dev.to/mauriziol</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mauriziol"/>
    <language>en</language>
    <item>
      <title>Everyone compresses their agent's context. Nobody measures what it forgets.</title>
      <dc:creator>Maurizio-L</dc:creator>
      <pubDate>Thu, 28 May 2026 21:33:47 +0000</pubDate>
      <link>https://dev.to/mauriziol/everyone-compresses-their-agents-context-nobody-measures-what-it-forgets-5gp3</link>
      <guid>https://dev.to/mauriziol/everyone-compresses-their-agents-context-nobody-measures-what-it-forgets-5gp3</guid>
      <description>&lt;p&gt;&lt;strong&gt;How we benchmarked context quality across Anthropic, OpenAI, and &lt;a href="https://promptolian.com" rel="noopener noreferrer"&gt;Promptolian&lt;/a&gt; — a transparent proxy that compresses agent context without losing facts — and found the sweet spot nobody talks about&lt;/strong&gt;&lt;/p&gt;




&lt;ul&gt;
&lt;li&gt;The problem nobody talks about&lt;/li&gt;
&lt;li&gt;The U-curve most teams never see&lt;/li&gt;
&lt;li&gt;What we built instead&lt;/li&gt;
&lt;li&gt;The second problem: tool schema re-sending&lt;/li&gt;
&lt;li&gt;The business case&lt;/li&gt;
&lt;li&gt;Try it&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The problem nobody talks about
&lt;/h2&gt;

&lt;p&gt;You ship an AI agent. It works great on the first 10 messages. By message 30, it's asking you to repeat the database URL. By message 50, it's recommending the wrong config value — one it hallucinated because the real one got summarised away three turns back.&lt;/p&gt;

&lt;p&gt;This isn't a model problem. The problem is context management.&lt;/p&gt;

&lt;p&gt;Most agent frameworks hand off context management to the provider — Anthropic's built-in summarisation, or OpenAI's equivalent. The provider, with no way to know which facts matter, does what it's built to do: compress aggressively. 98–99% compression. Everything gets squashed into a summary paragraph.&lt;/p&gt;

&lt;p&gt;The result: &lt;strong&gt;your agent forgets things&lt;/strong&gt;. And every time it forgets something important, a human has to step in. That's rework. And rework costs more than tokens.&lt;/p&gt;




&lt;h2&gt;
  
  
  The U-curve most teams never see
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Total cost = context tokens + rework from fact loss.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Everyone optimises the first term. But when the agent loses facts, someone has to catch the error, re-explain the context, and redo the work. That's not free.&lt;/p&gt;

&lt;p&gt;Plot quality score against compression rate and you get this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ntdzw42ol80j51wpygk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ntdzw42ol80j51wpygk.png" alt="Figure 1 — Context quality score vs compression rate"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;&lt;a href="https://promptolian.com/ucurve.html" rel="noopener noreferrer"&gt;Interactive version: promptolian.com/ucurve.html&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Promptolian (22%) stays in the green zone (≥ 4.0). Both provider built-ins land in the red zone at 99% compression — because LLM summarisers don't know which facts matter later. They see &lt;code&gt;postgres://db.prod/main&lt;/code&gt; and write "the database connection was discussed." Accurate. Useless.&lt;/p&gt;

&lt;p&gt;Factory.ai measured this (May 2026):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;System&lt;/th&gt;
&lt;th&gt;Context quality&lt;/th&gt;
&lt;th&gt;Compression&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Anthropic built-in&lt;/td&gt;
&lt;td&gt;3.44 / 5&lt;/td&gt;
&lt;td&gt;98.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI built-in&lt;/td&gt;
&lt;td&gt;3.35 / 5&lt;/td&gt;
&lt;td&gt;99.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The quality gap has a direct cost. Figure 2 shows total monthly spend (API tokens + engineer time fixing context failures) for a solo developer:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foiecuzizo3hiu5j4wm2b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foiecuzizo3hiu5j4wm2b.png" alt="Figure 2 — Monthly cost: API tokens + engineer rework time"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Assumptions:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;100 sessions/month&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;50 calls/session · 8K context tokens&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;$3/MTok input (Claude Sonnet 4)&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;$100/hr engineer rate&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At zero debugging time, Anthropic built-in wins (99% token savings). &lt;strong&gt;Above 3.5 minutes per failure, Promptolian is cheaper.&lt;/strong&gt; That threshold is low — it's the agent asking you to re-confirm the deployment target, you correcting it, losing your train of thought.&lt;/p&gt;

&lt;p&gt;The cost minimum isn't at maximum compression. &lt;strong&gt;Among the three systems measured, it sits at 22%, not 99%.&lt;/strong&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  What we built instead
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://promptolian.com" rel="noopener noreferrer"&gt;Promptolian&lt;/a&gt; is a transparent proxy that sits between your agent and the Anthropic API. You change one line (&lt;code&gt;base_url&lt;/code&gt;) — no changes to agent logic.&lt;/p&gt;

&lt;p&gt;The key insight: &lt;strong&gt;not all turns are equal&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;First 2 turns&lt;/strong&gt; — task framing and constraints. Losing these is catastrophic. Kept verbatim.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Last 4 turns&lt;/strong&gt; — current working state. Compress these and you break continuity. Kept verbatim.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Everything in between&lt;/strong&gt; — repeated phrasing, confirmed values, filler. Safe to compress.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the middle is compressed, the edges are not:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;HEAD   → first 2 turns  → VERBATIM
MIDDLE → turns 3 to N-4 → WEIGHTED + COMPRESSED
TAIL   → last 4 turns   → VERBATIM
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Not all middle turns are treated equally either. Each turn gets a score based on how much new information it adds — new entities, new vocabulary, delta from what came before. Turns with high information density survive compression; pure acknowledgements ("ok", "noted", "sounds good") and reformulations of things already said are pruned first.&lt;/p&gt;
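&lt;p&gt;To make the head/middle/tail idea concrete, here is a minimal sketch. It is not Promptolian's actual scoring: it assumes a turn's score is simply the number of words no earlier turn has used, and the names &lt;code&gt;split_turns&lt;/code&gt;, &lt;code&gt;new_word_counts&lt;/code&gt;, &lt;code&gt;prune_middle&lt;/code&gt; and the &lt;code&gt;min_new_words&lt;/code&gt; threshold are hypothetical.&lt;/p&gt;

```python
import re

# Assumption: a "word" is any run of letters, digits and a few URL characters.
WORD = re.compile(r"[\w:/.@-]+")


def split_turns(turns, head=2, tail=4):
    """Return (head, middle, tail); short conversations have no middle."""
    if len(turns) > head + tail:
        cut = len(turns) - tail
        return turns[:head], turns[head:cut], turns[cut:]
    return list(turns), [], []


def new_word_counts(turns):
    """For each turn, count the words that no earlier turn has used."""
    seen, counts = set(), []
    for text in turns:
        words = set(WORD.findall(text.lower()))
        counts.append(len(words - seen))
        seen |= words
    return counts


def prune_middle(turns, min_new_words=2):
    """Keep head and tail verbatim; drop middle turns that add little."""
    first, middle, last = split_turns(turns)
    scores = new_word_counts(turns)[len(first):len(first) + len(middle)]
    kept = [t for t, s in zip(middle, scores) if s >= min_new_words]
    return first + kept + last


turns = [
    "Deploy the billing service to staging.",    # head
    "Constraints: Python 3.12 only.",            # head
    "The database is postgres://db.prod/main.",  # middle, new facts
    "ok",                                        # middle, acknowledgement
    "The database is postgres://db.prod/main.",  # middle, repetition
    "Use port 8443 for the health check.",       # middle, new facts
    "Run the migration.",                        # tail
    "Migration done.",                           # tail
    "Check logs.",                               # tail
    "Logs look clean.",                          # tail
]
pruned = prune_middle(turns)
```

&lt;p&gt;In this toy conversation the acknowledgement and the repeated database line are dropped, while the first two and last four turns pass through untouched. A real scorer would also need to weigh entities and handle turns that restate facts in different words.&lt;/p&gt;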

&lt;p&gt;The compression engine is rule-based, with no LLM in the loop. It encodes repeated entities into a local registry (&lt;code&gt;postgres://db.prod/main&lt;/code&gt; → &lt;code&gt;§E1&lt;/code&gt;) and expands them back before each API call. Facts aren't summarised; they're encoded, so every encoded value can be recovered exactly.&lt;/p&gt;
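&lt;p&gt;A registry of this kind could look roughly like the sketch below. This is an illustration under assumptions, not Promptolian's implementation: the &lt;code&gt;EntityRegistry&lt;/code&gt; class, its method names, and the URL-only entity pattern are all hypothetical, and it ignores the case where the text already contains a literal &lt;code&gt;§E1&lt;/code&gt;.&lt;/p&gt;

```python
import re
from collections import Counter

# Assumption: only URL-like strings count as entities in this sketch.
ENTITY = re.compile(r"[a-z]+://[\w./-]*\w")
PLACEHOLDER = re.compile(r"§E\d+")


class EntityRegistry:
    def __init__(self):
        self.by_value = {}  # entity text -> placeholder
        self.by_token = {}  # placeholder -> entity text

    def encode(self, text, min_count=2):
        """Replace entities that repeat at least min_count times with placeholders."""
        counts = Counter(ENTITY.findall(text))

        def replace(match):
            value = match.group(0)
            if value not in self.by_value and counts[value] >= min_count:
                token = f"§E{len(self.by_value) + 1}"
                self.by_value[value] = token
                self.by_token[token] = value
            # Unregistered entities are left exactly as they were.
            return self.by_value.get(value, value)

        return ENTITY.sub(replace, text)

    def expand(self, text):
        """Restore every known placeholder to its original entity text."""
        return PLACEHOLDER.sub(
            lambda m: self.by_token.get(m.group(0), m.group(0)), text
        )


registry = EntityRegistry()
history = (
    "Connect to postgres://db.prod/main first. "
    "Then back up postgres://db.prod/main nightly. "
    "Docs: https://example.com/runbook"
)
encoded = registry.encode(history)
restored = registry.expand(encoded)
```

&lt;p&gt;The point of the design is the round trip: &lt;code&gt;restored&lt;/code&gt; is identical to &lt;code&gt;history&lt;/code&gt;, which a prose summary cannot guarantee.&lt;/p&gt;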

&lt;p&gt;Benchmark across 25 sessions, 5 task domains, same Factory.ai 6-dimension methodology:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;System&lt;/th&gt;
&lt;th&gt;Quality&lt;/th&gt;
&lt;th&gt;Compression&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Promptolian&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;4.26 / 5&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;21.8%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Anthropic built-in&lt;/td&gt;
&lt;td&gt;3.44 / 5&lt;/td&gt;
&lt;td&gt;98.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI built-in&lt;/td&gt;
&lt;td&gt;3.35 / 5&lt;/td&gt;
&lt;td&gt;99.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Better context quality at 22% compression vs 99%.&lt;/p&gt;




&lt;h2&gt;
  
  
  The second problem: your agent re-sends the same tool schema every call
&lt;/h2&gt;

&lt;p&gt;Every API call re-sends the full tool schema — even if nothing changed. For 5 tools that's ~600 tokens wasted per call.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Call 1: [system] + [tools: 600 tok] + [message]  → full price
Call 2: [system] + [tools: 600 tok] + [message]  → full price again
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Anthropic's prompt cache solves this — but only if you manually add &lt;code&gt;cache_control&lt;/code&gt; blocks. Most frameworks don't. The proxy does it automatically.&lt;/p&gt;
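&lt;p&gt;For reference, doing this by hand means adding a &lt;code&gt;cache_control&lt;/code&gt; field to the last tool definition, which marks the whole tool list as a cacheable prefix. The helper below is a minimal sketch; &lt;code&gt;mark_tools_cacheable&lt;/code&gt; and the two example tools are hypothetical, and no API call is made.&lt;/p&gt;

```python
import copy


def mark_tools_cacheable(tools):
    """Return a copy of the tool list with a cache breakpoint on the last tool."""
    marked = copy.deepcopy(tools)
    if marked:
        marked[-1]["cache_control"] = {"type": "ephemeral"}
    return marked


# Hypothetical tool definitions in the Messages API tool format.
tools = [
    {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "input_schema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
    {
        "name": "get_time",
        "description": "Return the current time in a given timezone.",
        "input_schema": {
            "type": "object",
            "properties": {"timezone": {"type": "string"}},
            "required": ["timezone"],
        },
    },
]

cached_tools = mark_tools_cacheable(tools)
# cached_tools would then be passed as the tools argument of
# client.messages.create(...) in place of the original list.
```

&lt;p&gt;Anthropic's documentation also specifies a minimum cacheable prefix length per model, so whether a small tool block is cached on its own is worth checking against the current docs.&lt;/p&gt;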

&lt;p&gt;Assumptions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;500 calls/day · 5 tools · ~120 tokens each = 600 tool tokens/call&lt;/li&gt;
&lt;li&gt;30 days/month → 9M tool tokens/month → &lt;strong&gt;$27.00&lt;/strong&gt; without caching&lt;/li&gt;
&lt;li&gt;With Anthropic prompt cache (cache hits billed at 10% of the input price, assuming every call is a hit) → &lt;strong&gt;$2.70&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Saving: $24.30/month&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
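&lt;p&gt;The arithmetic behind those numbers, as a short sketch. It assumes the same $3/MTok input price used in the earlier cost model and treats every call as a cache hit, so the cached figure is a best case.&lt;/p&gt;

```python
CALLS_PER_DAY = 500
TOOLS = 5
TOKENS_PER_TOOL = 120
DAYS_PER_MONTH = 30
PRICE_PER_MTOK = 3.00        # USD per million input tokens (assumed)
CACHE_HIT_MULTIPLIER = 0.10  # cache hits billed at 10% of the input price

tool_tokens_per_month = CALLS_PER_DAY * TOOLS * TOKENS_PER_TOOL * DAYS_PER_MONTH
uncached_cost = tool_tokens_per_month / 1_000_000 * PRICE_PER_MTOK
cached_cost = uncached_cost * CACHE_HIT_MULTIPLIER  # best case: all hits
saving = uncached_cost - cached_cost

print(f"{tool_tokens_per_month:,} tool tokens/month")
print(f"uncached ${uncached_cost:.2f}, cached ${cached_cost:.2f}, saving ${saving:.2f}")
```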




&lt;h2&gt;
  
  
  The business case
&lt;/h2&gt;

&lt;p&gt;Context failures are rework events. Fact-loss rates, derived from the benchmark scores as 1 − quality/5:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;System&lt;/th&gt;
&lt;th&gt;Quality&lt;/th&gt;
&lt;th&gt;Fact-loss rate&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Promptolian&lt;/td&gt;
&lt;td&gt;4.26/5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;14.8%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Anthropic built-in&lt;/td&gt;
&lt;td&gt;3.44/5&lt;/td&gt;
&lt;td&gt;31.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI built-in&lt;/td&gt;
&lt;td&gt;3.35/5&lt;/td&gt;
&lt;td&gt;33.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If fixing a context failure costs your team more than &lt;strong&gt;3.5 minutes&lt;/strong&gt;, Promptolian is cheaper than Anthropic built-in in total. A context failure that reaches code review can cost an hour.&lt;/p&gt;

&lt;p&gt;Tool caching saves a further ~$24/month. Token savings show up in your API bill. Rework savings don't, but that's where the ROI lives.&lt;/p&gt;
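&lt;p&gt;The break-even point can be roughly reproduced from the stated assumptions. The sketch below adds one assumption that the article does not spell out: each session produces at most one context failure, with probability equal to the fact-loss rate. Under that model the threshold comes out at about 3.4 minutes, in the same range as the 3.5-minute figure; the exact model behind Figure 2 may differ.&lt;/p&gt;

```python
SESSIONS_PER_MONTH = 100
CALLS_PER_SESSION = 50
CONTEXT_TOKENS = 8_000
PRICE_PER_MTOK = 3.00           # USD per million input tokens
ENGINEER_RATE_PER_HOUR = 100.0  # USD

# Quality and compression figures from the benchmark table.
PROMPTOLIAN = {"quality": 4.26, "compression": 0.218}
ANTHROPIC = {"quality": 3.44, "compression": 0.987}


def token_cost(system):
    """Monthly input-token spend after compression."""
    tokens = SESSIONS_PER_MONTH * CALLS_PER_SESSION * CONTEXT_TOKENS
    return tokens * (1 - system["compression"]) / 1_000_000 * PRICE_PER_MTOK


def failures_per_month(system):
    """Assumed model: one possible failure per session, at the fact-loss rate."""
    return SESSIONS_PER_MONTH * (1 - system["quality"] / 5)


def total_cost(system, minutes_per_failure):
    """Token spend plus engineer time spent fixing context failures."""
    rework = failures_per_month(system) * minutes_per_failure
    return token_cost(system) + rework * ENGINEER_RATE_PER_HOUR / 60


def break_even_minutes(low_compression, high_compression):
    """Minutes per failure at which both systems cost the same in total."""
    extra_tokens = token_cost(low_compression) - token_cost(high_compression)
    avoided = failures_per_month(high_compression) - failures_per_month(low_compression)
    return extra_tokens / avoided / (ENGINEER_RATE_PER_HOUR / 60)


threshold = break_even_minutes(PROMPTOLIAN, ANTHROPIC)
print(f"break-even at about {threshold:.1f} minutes per failure")
```

&lt;p&gt;Calling &lt;code&gt;total_cost&lt;/code&gt; with a larger repair time, such as 10 minutes, shows how quickly the rework term dominates the token bill.&lt;/p&gt;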




&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s2"&gt;"promptolian[proxy]"&lt;/span&gt;

promptolian proxy            &lt;span class="c"&gt;# tool caching only — 1 min to production&lt;/span&gt;
promptolian proxy &lt;span class="nt"&gt;--compress&lt;/span&gt; &lt;span class="c"&gt;# + context history compression&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:3002&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# only change needed
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or skip self-hosting with the cloud proxy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://proxy.promptolian.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;default_headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;X-Promptolian-Key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pk_...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Full docs: &lt;a href="https://promptolian.com/docs.html" rel="noopener noreferrer"&gt;promptolian.com/docs.html&lt;/a&gt; · GitHub: &lt;a href="https://github.com/Maurizio-L/promptolian-public" rel="noopener noreferrer"&gt;github.com/Maurizio-L/promptolian-public&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Methodology: 25 sessions × 5 task domains, Factory.ai 6-dimension probe scoring (Accuracy, Context, Artifact, Completeness, Continuity, Instruction). Anthropic/OpenAI baselines from Factory.ai May 2026. Promptolian: internal benchmark, same methodology. Validation run: 4.19/5 (second 25-session run after entity-encoding fix). Fact-loss rate = 1 − quality/5.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>machinelearning</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
