<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Maurizio-L</title>
    <description>The latest articles on DEV Community by Maurizio-L (@mauriziol).</description>
    <link>https://dev.to/mauriziol</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3948290%2F045842ab-1de8-4d14-9416-6bb53e241ebb.jpeg</url>
      <title>DEV Community: Maurizio-L</title>
      <link>https://dev.to/mauriziol</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mauriziol"/>
    <language>en</language>
    <item>
      <title>We measured what 10 tools × 1,000 calls/day actually costs in AI agents</title>
      <dc:creator>Maurizio-L</dc:creator>
      <pubDate>Sat, 23 May 2026 22:11:06 +0000</pubDate>
      <link>https://dev.to/mauriziol/we-measured-what-10-tools-x-1000-callsday-actually-costs-in-ai-agents-39eh</link>
      <guid>https://dev.to/mauriziol/we-measured-what-10-tools-x-1000-callsday-actually-costs-in-ai-agents-39eh</guid>
      <description>&lt;h1&gt;
  
  
  We measured what 10 tools × 1,000 calls/day actually costs. Here's the data.
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;Posted to r/ClaudeAI · r/LocalLLaMA · Hacker News&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;When you build an AI agent, you give it tools. Search the web. Read a file. Call an API. Query a database.&lt;/p&gt;

&lt;p&gt;Each tool needs a description — a JSON block that tells the model what the tool does and what parameters it takes. Here's what a single tool looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"search_web"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Search the web for recent information"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"parameters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"query"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The search query"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"max_results"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"integer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Maximum number of results to return"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"required"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"query"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That single definition is about 80 tokens.&lt;/p&gt;

&lt;p&gt;If your agent has 10 tools, you're sending ~800–1,200 tokens of tool definitions &lt;strong&gt;on every API call&lt;/strong&gt;. Not once. Every call.&lt;/p&gt;




&lt;h2&gt;
  
  
  The actual numbers
&lt;/h2&gt;

&lt;p&gt;We ran 1,000 simulated agent sessions across four agent sizes. Costs use Claude Sonnet 4 input pricing ($3 per 1M tokens), a 30-day month, and a 365-day year.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tools&lt;/th&gt;
&lt;th&gt;Tokens / call&lt;/th&gt;
&lt;th&gt;Tokens / day at 1k calls/day&lt;/th&gt;
&lt;th&gt;Cost / month&lt;/th&gt;
&lt;th&gt;Cost / year&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;~600&lt;/td&gt;
&lt;td&gt;600k tok/day&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$54&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$657&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;~1,200&lt;/td&gt;
&lt;td&gt;1.2M tok/day&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$108&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$1,314&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;~2,400&lt;/td&gt;
&lt;td&gt;2.4M tok/day&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$216&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$2,628&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;~6,000&lt;/td&gt;
&lt;td&gt;6M tok/day&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$540&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$6,570&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;At 10k calls/day (not unusual for a production agent), multiply those numbers by 10.&lt;/p&gt;
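&lt;p&gt;The table's arithmetic can be reproduced with a few lines of Python. This is a minimal sketch: the function name &lt;code&gt;tool_schema_cost&lt;/code&gt; is ours, and it assumes the $3 per 1M token input price plus the 30-day month and 365-day year that the table's figures imply.&lt;/p&gt;

```python
def tool_schema_cost(tokens_per_call: int, calls_per_day: int,
                     usd_per_million: float = 3.0, days: int = 30) -> float:
    """Input cost in USD of resending tool definitions on every call for `days` days."""
    total_tokens = tokens_per_call * calls_per_day * days
    return total_tokens * usd_per_million / 1_000_000


# (tool count, tokens per call) pairs taken from the table above
for tools, tokens_per_call in [(5, 600), (10, 1_200), (20, 2_400), (50, 6_000)]:
    month = tool_schema_cost(tokens_per_call, 1_000)
    year = tool_schema_cost(tokens_per_call, 1_000, days=365)
    print(f"{tools:>2} tools: ${month:,.0f}/month, ${year:,.0f}/year")
```

&lt;p&gt;Passing &lt;code&gt;calls_per_day=10_000&lt;/code&gt; gives the 10x figures mentioned above.&lt;/p&gt;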




&lt;h2&gt;
  
  
  Why this doesn't go away
&lt;/h2&gt;

&lt;p&gt;The obvious answer is: Anthropic has prompt caching. Use that.&lt;/p&gt;

&lt;p&gt;Prompt caching helps, but:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Cached input tokens are still billed&lt;/strong&gt; — at 10% of normal price. Not free.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Default cache TTL is 5 minutes.&lt;/strong&gt; If calls within a session arrive more than 5 minutes apart, the cache has expired and you pay full price again.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cache invalidates on any change.&lt;/strong&gt; If you add a tool, update a description, or rotate an API key in a tool — full price again.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;So even with caching, you're paying for tool tokens. And most agents don't have caching set up at all.&lt;/p&gt;
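&lt;p&gt;To see how much caching leaves on the bill, here is a rough sketch of a blended price. The helper &lt;code&gt;blended_price&lt;/code&gt; and the hit rates are illustrative assumptions; the 10% multiplier for cache hits and full price on a miss come from the list above. If your provider adds a surcharge for writing the cache, raise &lt;code&gt;miss_multiplier&lt;/code&gt; accordingly.&lt;/p&gt;

```python
def blended_price(base_usd_per_million: float, hit_rate: float,
                  hit_multiplier: float = 0.10,
                  miss_multiplier: float = 1.0) -> float:
    """Average USD price per 1M tool-definition tokens at a given cache hit rate."""
    share = hit_rate * hit_multiplier + (1.0 - hit_rate) * miss_multiplier
    return base_usd_per_million * share


# 10 tools at 1k calls/day: 1.2M tokens/day over a 30-day month (from the table)
monthly_tokens_millions = 1.2 * 30
for hit_rate in (0.0, 0.5, 0.9):  # hypothetical hit rates
    cost = monthly_tokens_millions * blended_price(3.0, hit_rate)
    print(f"hit rate {hit_rate:.0%}: ${cost:.2f}/month")
```

&lt;p&gt;Under these assumptions, even a 90% hit rate still leaves 19% of the uncached cost.&lt;/p&gt;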




&lt;h2&gt;
  
  
  What we built
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Promptolian&lt;/strong&gt; is a compression layer that sits between your code and any LLM API. You call it once at startup — everything else stays unchanged. It intercepts every API call, compresses what it can, and forwards the request. No proxy, no routing change, no new infrastructure.&lt;/p&gt;

&lt;p&gt;It has three independent compression layers:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 1 — Prompt compression&lt;/strong&gt;&lt;br&gt;
Replaces verbose patterns with compact equivalents before the text reaches the model. "You are an expert Python developer. Please write a function..." becomes "§EXP py developer. ACT write FN...". Runs locally in under 1ms. ~20% savings on typical prompts.&lt;/p&gt;
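&lt;p&gt;As an illustration of the rule-based idea, here is a toy sketch. It is not Promptolian's actual rule set: the four rules below are hypothetical and were picked only so that the example sentence above comes out the same way.&lt;/p&gt;

```python
import re

# Hypothetical rewrite rules, applied in order. A real rule set would be far larger.
RULES = [
    (re.compile(r"\bYou are an expert\b", re.IGNORECASE), "§EXP"),
    (re.compile(r"\bPlease\s+", re.IGNORECASE), "ACT "),
    (re.compile(r"\bPython\b"), "py"),
    (re.compile(r"\ba function\b", re.IGNORECASE), "FN"),
]


def compress(prompt: str) -> str:
    """Apply each rewrite rule once, in order; no model call is involved."""
    for pattern, replacement in RULES:
        prompt = pattern.sub(replacement, prompt)
    return prompt


print(compress("You are an expert Python developer. Please write a function..."))
# prints: §EXP py developer. ACT write FN...
```

&lt;p&gt;Because each rule is a plain regular-expression substitution, the output is deterministic, and text that matches no rule (numbers, file paths, names) passes through untouched.&lt;/p&gt;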

&lt;p&gt;&lt;strong&gt;Layer 2 — Context engine&lt;/strong&gt;&lt;br&gt;
As a conversation grows, old turns get expensive. Promptolian summarises older messages and keeps only the most relevant recent turns — using a layout that works with how LLMs weight context. Up to 52.9% savings on long sessions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 3 — Tool schema compiler&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the one that surprised us. It works in two phases:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Call 1 — compact DSL&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Instead of the full JSON, the model receives a function-signature format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;search_web&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Search the web for recent information
&lt;/span&gt;&lt;span class="nf"&gt;read_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;utf&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;# Read a local file
&lt;/span&gt;&lt;span class="nf"&gt;call_api&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;GET&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;POST&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# HTTP request
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same information. About 40 tokens instead of 120. &lt;strong&gt;~69% smaller.&lt;/strong&gt;&lt;/p&gt;
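&lt;p&gt;A conversion like this can be sketched in a few lines. The code below is a hypothetical illustration, not Promptolian's compiler: the function &lt;code&gt;to_signature&lt;/code&gt;, the type-name mapping, and the &lt;code&gt;?&lt;/code&gt; marker for optional parameters are all assumptions, and the &lt;code&gt;"default": 10&lt;/code&gt; entry was added to the earlier &lt;code&gt;search_web&lt;/code&gt; schema so the default shows up in the output.&lt;/p&gt;

```python
# Assumed mapping from JSON Schema type names to short signature types
TYPE_NAMES = {"string": "str", "integer": "int", "number": "float",
              "boolean": "bool", "array": "list", "object": "dict"}


def to_signature(tool: dict) -> str:
    """Render one JSON tool definition as a one-line function signature."""
    params = tool.get("parameters", {})
    required = set(params.get("required", []))
    parts = []
    for name, spec in params.get("properties", {}).items():
        if "enum" in spec:
            type_name = "|".join(str(value) for value in spec["enum"])
        else:
            type_name = TYPE_NAMES.get(spec.get("type"), "any")
        part = f"{name}: {type_name}"
        if "default" in spec:
            part += f" = {spec['default']}"
        elif name not in required:
            part += "?"  # assumed marker for an optional parameter without a default
        parts.append(part)
    return f"{tool['name']}({', '.join(parts)})  # {tool['description']}"


search_web = {
    "name": "search_web",
    "description": "Search the web for recent information",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "The search query"},
            "max_results": {"type": "integer",
                            "description": "Maximum number of results to return",
                            "default": 10},  # assumption: not in the schema shown earlier
        },
        "required": ["query"],
    },
}

print(to_signature(search_web))
# prints: search_web(query: str, max_results: int = 10)  # Search the web for recent information
```

&lt;p&gt;Note that this sketch drops the per-parameter descriptions, so it only suits tools whose parameter names are self-explanatory.&lt;/p&gt;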

&lt;p&gt;&lt;em&gt;Call 2 onward — reference only&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The model already saw the definitions on call 1, in compact form. They're in the conversation context, so from call 2 you can send:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;TOOLS:[search_web,read_file,call_api]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;~3 tokens. 97% smaller.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The model understands this because the definitions are in its context window from the previous turn. It knows what &lt;code&gt;search_web&lt;/code&gt; does. You don't need to re-explain it.&lt;/p&gt;
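&lt;p&gt;The two phases amount to a small piece of per-session state. The class below is a hypothetical sketch, not Promptolian's code: &lt;code&gt;ToolPayload&lt;/code&gt; and its method names are ours. It sends the compact signatures once, then the reference list, and can be reset if the definitions fall out of context.&lt;/p&gt;

```python
class ToolPayload:
    """Chooses what to send for the tool section on each call of one session."""

    def __init__(self, signatures: dict[str, str]):
        self.signatures = signatures  # tool name -> one-line signature
        self.sent_full = False

    def payload(self) -> str:
        """Full signatures on the first call, a name-only reference afterwards."""
        if not self.sent_full:
            self.sent_full = True
            return "\n".join(self.signatures.values())
        return "TOOLS:[" + ",".join(self.signatures) + "]"

    def reset(self) -> None:
        """Call this when earlier turns were truncated and the definitions left context."""
        self.sent_full = False


session = ToolPayload({
    "search_web": "search_web(query: str, max_results: int = 10)",
    "read_file": "read_file(path: str, encoding: str = utf-8)",
    "call_api": "call_api(url: str, method: GET|POST, body: str)",
})
print(session.payload())  # call 1: the three signatures
print(session.payload())  # call 2: TOOLS:[search_web,read_file,call_api]
```

&lt;p&gt;The &lt;code&gt;reset&lt;/code&gt; hook matters for long sessions: once the first turn is truncated, the reference list alone no longer tells the model what each tool does.&lt;/p&gt;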

&lt;p&gt;All three layers are deterministic — no LLM calls, no data sent anywhere, sub-millisecond latency. The tool is open source and self-hostable.&lt;/p&gt;




&lt;h2&gt;
  
  
  Benchmark results across 20 prompt types
&lt;/h2&gt;

&lt;p&gt;We ran our prompt compression layer against 20 real-world prompts (system prompts, user instructions, domain-specific text). CR is the compression ratio, i.e. the share of tokens removed:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;th&gt;Median CR&lt;/th&gt;
&lt;th&gt;Mean CR&lt;/th&gt;
&lt;th&gt;Range&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Standard&lt;/td&gt;
&lt;td&gt;20.2%&lt;/td&gt;
&lt;td&gt;23.6%&lt;/td&gt;
&lt;td&gt;10–50%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pro&lt;/td&gt;
&lt;td&gt;21.9%&lt;/td&gt;
&lt;td&gt;24.3%&lt;/td&gt;
&lt;td&gt;10–50%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Developer&lt;/td&gt;
&lt;td&gt;21.9%&lt;/td&gt;
&lt;td&gt;24.3%&lt;/td&gt;
&lt;td&gt;10–50%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Verbose prompts (filler words, hedging language) compress 30–36%. Technical system prompts compress less (10–15%) because they're already dense. Short prompts can hit 40–50% but the absolute saving is smaller.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;100% fact preservation&lt;/strong&gt; across all 41 runs — numbers, file paths, named entities came through unchanged every time.&lt;/p&gt;




&lt;h2&gt;
  
  
  Combined savings: a real example
&lt;/h2&gt;

&lt;p&gt;Agent setup: 10 tools, 2,000 calls/day, average 800-token system prompt, 5-turn sessions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Without Promptolian:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tool schemas: 1,200 tok × 2,000 = 2.4M tok/day&lt;/li&gt;
&lt;li&gt;System prompt: 800 tok × 2,000 = 1.6M tok/day&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Total: 4M tok/day = ~$360/month&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;With Promptolian (session avg):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tool schemas: ~84 tok × 2,000 = 168k tok/day (93% saved)&lt;/li&gt;
&lt;li&gt;System prompt: ~620 tok × 2,000 = 1.24M tok/day (22% saved)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Total: 1.41M tok/day = ~$127/month&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Monthly saving: ~$233. Annual: ~$2,800.&lt;/strong&gt; On a $19/month tool.&lt;/p&gt;
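&lt;p&gt;The totals above can be checked with a short script. It takes the per-call token counts from the two lists as given and assumes the same $3 per 1M token input price and a 30-day month; the names are ours.&lt;/p&gt;

```python
PRICE_PER_MILLION = 3.0  # USD per 1M input tokens, as used earlier in the article
CALLS_PER_DAY = 2_000
DAYS_PER_MONTH = 30      # assumption: matches the article's monthly figures


def monthly_cost(tokens_per_call: dict) -> float:
    """Monthly USD cost of the listed per-call token components."""
    daily_tokens = sum(tokens_per_call.values()) * CALLS_PER_DAY
    return daily_tokens * DAYS_PER_MONTH * PRICE_PER_MILLION / 1_000_000


before = monthly_cost({"tool_schemas": 1_200, "system_prompt": 800})
after = monthly_cost({"tool_schemas": 84, "system_prompt": 620})
print(f"before ${before:.2f}, after ${after:.2f}, saved ${before - after:.2f}/month")
# prints: before $360.00, after $126.72, saved $233.28/month
```

&lt;p&gt;Multiplying the monthly saving by 12 gives about $2,799, which rounds to the ~$2,800 annual figure.&lt;/p&gt;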




&lt;h2&gt;
  
  
  How to try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Install first, from your shell (not Python): pip install promptolian&lt;/span&gt;

&lt;span class="c1"&gt;# One line to compress every Anthropic call
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;promptolian&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;patch_anthropic&lt;/span&gt;
&lt;span class="nf"&gt;patch_anthropic&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Your existing code unchanged
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are an expert Python developer...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# compressed automatically
&lt;/span&gt;    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[...],&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Check savings
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;promptolian&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;get_stats&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;get_stats&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="c1"&gt;# → 47 calls · 18,432 tok saved · 22.1% CR
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For Claude Code users:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;promptolian mcp &lt;span class="nb"&gt;install&lt;/span&gt;   &lt;span class="c"&gt;# adds to ~/.claude/settings.json&lt;/span&gt;
&lt;span class="c"&gt;# restart Claude Code — done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tool schema compression via the API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST https://api.promptolian.com/compress-tools &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"tools": [...], "session_id": "my-session-1"}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  What we didn't solve
&lt;/h2&gt;

&lt;p&gt;Being honest:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The 3-token trick only works when definitions are in context.&lt;/strong&gt; If you're running very long sessions where old turns get truncated, turn-2+ savings shrink.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompt compression is rule-based&lt;/strong&gt;, not neural. It works well on verbose/instructional text. Technical dense text compresses less.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No OpenAI Responses API support yet&lt;/strong&gt; — just the Chat Completions endpoint.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The self-hosted server requires the repo&lt;/strong&gt; — pip install alone doesn't bundle the full engine yet.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Open questions we'd love feedback on
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;What's your typical tool count per agent?&lt;/li&gt;
&lt;li&gt;Do you use prompt caching today? Does it actually hit in practice?&lt;/li&gt;
&lt;li&gt;Would you pay for usage-based pricing (per token saved) vs flat monthly?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The full benchmark methodology and raw data are at &lt;a href="https://promptolian.com/benchmarks" rel="noopener noreferrer"&gt;promptolian.com/benchmarks&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Source: &lt;a href="https://github.com/Maurizio-L/promptolian-public" rel="noopener noreferrer"&gt;github.com/Maurizio-L/promptolian-public&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built by Maurizio Lospi — &lt;a href="mailto:maurizio.lospi@gmail.com"&gt;maurizio.lospi@gmail.com&lt;/a&gt;. Feedback welcome — especially if your numbers look different from mine.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>machinelearning</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
