<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Wojciech Wentland</title>
    <description>The latest articles on DEV Community by Wojciech Wentland (@desty2k).</description>
    <link>https://dev.to/desty2k</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3866520%2F3ad33efb-4ece-434a-97e5-e9cecc7cc9fd.png</url>
      <title>DEV Community: Wojciech Wentland</title>
      <link>https://dev.to/desty2k</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/desty2k"/>
    <language>en</language>
    <item>
      <title>I built a read-only MCP server for Akamai</title>
      <dc:creator>Wojciech Wentland</dc:creator>
      <pubDate>Wed, 29 Apr 2026 06:00:00 +0000</pubDate>
      <link>https://dev.to/desty2k/i-built-a-read-only-mcp-server-for-akamai-3398</link>
      <guid>https://dev.to/desty2k/i-built-a-read-only-mcp-server-for-akamai-3398</guid>
      <description>&lt;p&gt;I had 200+ CDN properties in Akamai and an agent that couldn't find any of them. Akamai's &lt;a href="https://techdocs.akamai.com/property-mgr/reference/get-properties" rel="noopener noreferrer"&gt;Property Manager API&lt;/a&gt; lists properties by group and contract, but there's no fuzzy search endpoint. If the agent doesn't know the exact property name or ID, it's stuck. The conversation dead-ends with "I couldn't find that property" and the user goes back to the Akamai control panel.&lt;/p&gt;

&lt;p&gt;So I built an &lt;a href="https://github.com/desty2k/readonly-mcp-akamai" rel="noopener noreferrer"&gt;MCP server&lt;/a&gt; that wraps Akamai's APIs. 16 tools for searching properties, browsing EdgeWorker code, querying DNS zones, inspecting network lists, and translating error codes. All read-only. I wrote about &lt;a href="https://blog.wentland.io/blog/why-read-only-mcp/" rel="noopener noreferrer"&gt;why I only build read-only MCP servers&lt;/a&gt; separately.&lt;/p&gt;

&lt;h2&gt;Property search with a preloaded index&lt;/h2&gt;

&lt;p&gt;Akamai organizes properties under groups and contracts. To search across all of them through the API, you'd iterate every group-contract pair and list properties one by one. Slow, and no fuzzy matching.&lt;/p&gt;

&lt;p&gt;The server preloads every property into an in-memory index at startup. It fans out API calls across all group-contract pairs in parallel, deduplicates, and builds a list of names. &lt;a href="https://github.com/rapidfuzz/rapidfuzz" rel="noopener noreferrer"&gt;rapidfuzz&lt;/a&gt; handles the matching with &lt;code&gt;WRatio&lt;/code&gt; as the scorer. &lt;code&gt;WRatio&lt;/code&gt; tries multiple comparison strategies (ratio, partial ratio, token sort, token set) and picks the best one, weighted by string length differences. Slower than a simple ratio, but it means "checkout config" matches "checkout.example.com - Production" without the agent needing to know the exact naming convention.&lt;/p&gt;

&lt;p&gt;On a real account with 95 groups and 263 properties, the index loads in about 3 seconds. After that, searches hit memory with zero API calls.&lt;/p&gt;

&lt;p&gt;One thing I hit early: fanning out 95 concurrent requests without any throttling. Akamai's PAPI has rate limits, and a burst that size at startup can trigger 429s. The server caps concurrency with a semaphore, 10 requests at a time. Still fast enough, no rejected requests.&lt;/p&gt;
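&lt;p&gt;The throttling is a plain &lt;code&gt;asyncio.Semaphore&lt;/code&gt;; roughly like this, with placeholder names and a sleep standing in for the PAPI call:&lt;/p&gt;

```python
import asyncio

async def fetch_properties(pair, sem):
    async with sem:                 # at most max_concurrency requests in flight
        await asyncio.sleep(0.01)   # stand-in for the real PAPI call
        return {"pair": pair, "properties": []}

async def load_index(pairs, max_concurrency=10):
    sem = asyncio.Semaphore(max_concurrency)
    # Fan out all pairs at once; the semaphore does the rate limiting.
    return await asyncio.gather(*(fetch_properties(p, sem) for p in pairs))

pairs = [(group, contract) for group in range(95) for contract in (0,)]
index = asyncio.run(load_index(pairs))
```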

&lt;p&gt;The index refreshes every 5 minutes in a background task. I described this pattern in &lt;a href="https://blog.wentland.io/blog/mcp-servers-are-not-api-adapters/" rel="noopener noreferrer"&gt;Your MCP server is not an API adapter&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;EdgeWorker code browsing&lt;/h2&gt;

&lt;p&gt;Akamai &lt;a href="https://techdocs.akamai.com/edgeworkers/docs/create-a-code-bundle" rel="noopener noreferrer"&gt;EdgeWorkers&lt;/a&gt; are serverless functions that run on CDN edge nodes. The code is stored as tgz archives containing &lt;code&gt;main.js&lt;/code&gt;, &lt;code&gt;bundle.json&lt;/code&gt;, and supporting files. To read a file, you download the archive, extract it, and find what you need. Doing that on every tool call would be slow.&lt;/p&gt;

&lt;p&gt;The server downloads the bundle once, extracts all files into memory, and caches them with &lt;a href="https://cachetools.readthedocs.io/" rel="noopener noreferrer"&gt;cachetools.TTLCache&lt;/a&gt;. 1-hour TTL, max 50 entries. After the first download, the agent can list files, read by line range, and search with regex. No repeat downloads.&lt;/p&gt;
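&lt;p&gt;The cache itself is a few lines. A stubbed sketch, with the download-and-extract step replaced by a literal and &lt;code&gt;get_bundle&lt;/code&gt; as an illustrative name:&lt;/p&gt;

```python
from cachetools import TTLCache

# 1-hour TTL, at most 50 extracted bundles held in memory.
bundles = TTLCache(maxsize=50, ttl=3600)

def get_bundle(edgeworker_id: int, version: str) -> dict:
    key = (edgeworker_id, version)
    if key not in bundles:
        # First call: download the tgz and extract files (stubbed here).
        bundles[key] = {"main.js": "export function onClientRequest(request) {}"}
    return bundles[key]  # subsequent calls hit memory
```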

&lt;p&gt;When the agent asks "what does the main.js of EdgeWorker X look like?", the first call takes a second or two. Follow-up questions like "search for the routing logic" or "show me lines 50-80" are instant.&lt;/p&gt;
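&lt;p&gt;Those follow-ups never touch the network; they run over the extracted files in memory. Roughly, with a stand-in dict instead of a real extracted bundle:&lt;/p&gt;

```python
import re

# Stand-in for an extracted bundle: filename -> file contents
FILES = {"main.js": "line1\nline2\nline3\nline4\nline5"}

def read_lines(name: str, start: int, end: int) -> str:
    """Return lines start..end (1-based, inclusive) of a cached file."""
    lines = FILES[name].splitlines()
    return "\n".join(lines[start - 1:end])

def search(pattern: str):
    """Regex-search every cached file, returning (file, line_no, line) hits."""
    rx = re.compile(pattern)
    hits = []
    for name, text in FILES.items():
        for i, line in enumerate(text.splitlines(), 1):
            if rx.search(line):
                hits.append((name, i, line))
    return hits
```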

&lt;p&gt;I considered caching to disk, but these bundles are small (usually under 100KB). Keeping them in memory avoids filesystem management and the cache evicts automatically when TTL expires or the LRU limit hits. The tradeoff is bundles disappear on restart, but the reload is cheap enough that it doesn't matter.&lt;/p&gt;

&lt;h2&gt;Response shaping&lt;/h2&gt;

&lt;p&gt;Akamai property rule trees can be hundreds of KB. A typical production property has nested rules with behaviors, criteria, and options. Sending the full JSON wastes context.&lt;/p&gt;

&lt;p&gt;The server strips the rule tree before returning it. Keeps rule names, match criteria, behavior configs, and the recursive structure. Removes template UUIDs, format versions, and other internal metadata the agent doesn't need. Property details, activations, and DNS records get the same treatment.&lt;/p&gt;

&lt;p&gt;This is more aggressive than just dropping null fields. The raw rule tree has UUIDs on every node, template links, criteria satisfaction mode flags, locked indicators. None of that helps an agent answer "what caching rules are set for this property?" Stripping it cuts the response to maybe a third of the original size.&lt;/p&gt;
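&lt;p&gt;The shape of the stripping is a recursive allowlist. The real server's field list differs, but the idea fits in a few lines:&lt;/p&gt;

```python
# Fields an agent actually needs from a rule node (illustrative list).
KEEP = {"name", "criteria", "behaviors", "children", "options"}

def strip_rule(rule: dict) -> dict:
    """Keep only allowlisted fields; recurse into child rules."""
    slim = {k: v for k, v in rule.items() if k in KEEP}
    if "children" in slim:
        slim["children"] = [strip_rule(c) for c in slim["children"]]
    return slim

raw = {
    "name": "default",
    "uuid": "a1b2c3",              # internal metadata: dropped
    "templateUuid": "t-999",       # dropped
    "criteriaMustSatisfy": "all",  # dropped
    "behaviors": [{"name": "caching", "options": {"ttl": "1d"}}],
    "children": [{"name": "images", "uuid": "x", "behaviors": []}],
}
slim = strip_rule(raw)
```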

&lt;h2&gt;EdgeGrid auth from scratch&lt;/h2&gt;

&lt;p&gt;Akamai uses &lt;a href="https://techdocs.akamai.com/developer/docs/make-your-first-api-call" rel="noopener noreferrer"&gt;EdgeGrid&lt;/a&gt; for API authentication. There's an official &lt;code&gt;edgegrid-python&lt;/code&gt; library, but it wraps &lt;code&gt;requests&lt;/code&gt; (sync). I wanted &lt;code&gt;httpx&lt;/code&gt; (async) with connection pooling, so the server implements EdgeGrid signing directly: HMAC-SHA256 over a canonical request string, base64-encoded, attached as an Authorization header. About 40 lines.&lt;/p&gt;

&lt;p&gt;The signing is straightforward from the public spec. The annoying part is that the query string must be included in the signed data, so you have to build the full URL with parameters before signing, then make the request with that same URL.&lt;/p&gt;
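&lt;p&gt;A simplified sketch of the signing flow, following the public spec: the canonical-headers and body-hash slots are left empty, as they are for a GET with no signed headers. Treat this as an illustration of the shape, not a drop-in client:&lt;/p&gt;

```python
import base64
import hashlib
import hmac
import time
import uuid

def sign(method, host, path_with_query, client_token, client_secret, access_token):
    timestamp = time.strftime("%Y%m%dT%H:%M:%S+0000", time.gmtime())
    nonce = str(uuid.uuid4())
    # Authorization header without the trailing signature component.
    auth = (
        f"EG1-HMAC-SHA256 client_token={client_token};"
        f"access_token={access_token};timestamp={timestamp};nonce={nonce};"
    )
    # Tab-separated canonical request. Note path_with_query: the query
    # string must be part of the signed data.
    data = "\t".join([method, "https", host, path_with_query, "", "", auth])
    # Signing key: HMAC of the timestamp, keyed with the client secret.
    key = base64.b64encode(
        hmac.new(client_secret.encode(), timestamp.encode(), hashlib.sha256).digest()
    )
    sig = base64.b64encode(hmac.new(key, data.encode(), hashlib.sha256).digest())
    return auth + "signature=" + sig.decode()

header = sign("GET", "example.akamaiapis.net", "/papi/v1/groups?contractId=ctr_1",
              "akab-ct", "secret", "akab-at")
```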

&lt;h2&gt;What the agent can do&lt;/h2&gt;

&lt;p&gt;With 16 read-only tools, an agent can answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Which CDN property handles checkout.example.com?"&lt;/li&gt;
&lt;li&gt;"What caching rules are configured for the API property?"&lt;/li&gt;
&lt;li&gt;"Show me the main.js from the latest EdgeWorker version"&lt;/li&gt;
&lt;li&gt;"Search the EdgeWorker code for references to the auth header"&lt;/li&gt;
&lt;li&gt;"What DNS records exist for example.com?"&lt;/li&gt;
&lt;li&gt;"Which IPs are in the production allowlist?"&lt;/li&gt;
&lt;li&gt;"What does Akamai error code 9.6f64d440.1318965461.2f2b078 mean?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without the server, answering any of these means a trip to the Akamai control panel.&lt;/p&gt;

&lt;h2&gt;Setup&lt;/h2&gt;

&lt;p&gt;Add to Claude Code:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claude mcp add akamai &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;AKAMAI_HOST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your-host.akamaiapis.net &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;AKAMAI_CLIENT_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;akab-xxx &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;AKAMAI_CLIENT_SECRET&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;xxx &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;AKAMAI_ACCESS_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;akab-xxx &lt;span class="nt"&gt;--&lt;/span&gt; uvx readonly-mcp-akamai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create a read-only API credential in Akamai's &lt;a href="https://techdocs.akamai.com/developer/docs/set-up-authentication-credentials" rel="noopener noreferrer"&gt;Identity and Access Management&lt;/a&gt; panel. Source and docs: &lt;a href="https://github.com/desty2k/readonly-mcp-akamai" rel="noopener noreferrer"&gt;readonly-mcp-akamai on GitHub&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>python</category>
      <category>productivity</category>
    </item>
    <item>
      <title>AI coding agents compressed the feedback loop from hours to seconds. I wrote about why that compression looks a lot like the variable-reward patterns behind slot machines and social media.</title>
      <dc:creator>Wojciech Wentland</dc:creator>
      <pubDate>Mon, 13 Apr 2026 17:12:41 +0000</pubDate>
      <link>https://dev.to/desty2k/ai-coding-agents-compressed-the-feedback-loop-from-hours-to-seconds-i-wrote-about-why-that-dbi</link>
      <guid>https://dev.to/desty2k/ai-coding-agents-compressed-the-feedback-loop-from-hours-to-seconds-i-wrote-about-why-that-dbi</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/desty2k/your-coding-agent-is-a-slot-machine-1a2h" class="crayons-story__hidden-navigation-link"&gt;Your coding agent is a slot machine&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/desty2k" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3866520%2F3ad33efb-4ece-434a-97e5-e9cecc7cc9fd.png" alt="desty2k profile" class="crayons-avatar__image" width="420" height="420"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/desty2k" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Wojciech Wentland
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                Wojciech Wentland
                
              
              &lt;div id="story-author-preview-content-3493352" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/desty2k" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3866520%2F3ad33efb-4ece-434a-97e5-e9cecc7cc9fd.png" class="crayons-avatar__image" alt="" width="420" height="420"&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;Wojciech Wentland&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/desty2k/your-coding-agent-is-a-slot-machine-1a2h" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Apr 13&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/desty2k/your-coding-agent-is-a-slot-machine-1a2h" id="article-link-3493352"&gt;
          Your coding agent is a slot machine
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/ai"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;ai&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/productivity"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;productivity&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/programming"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;programming&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/psychology"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;psychology&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/desty2k/your-coding-agent-is-a-slot-machine-1a2h" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="24" height="24"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;1&lt;span class="hidden s:inline"&gt; reaction&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/desty2k/your-coding-agent-is-a-slot-machine-1a2h#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              Comments


              &lt;span class="hidden s:inline"&gt;Add Comment&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            4 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
    </item>
    <item>
      <title>Your coding agent is a slot machine</title>
      <dc:creator>Wojciech Wentland</dc:creator>
      <pubDate>Mon, 13 Apr 2026 08:16:25 +0000</pubDate>
      <link>https://dev.to/desty2k/your-coding-agent-is-a-slot-machine-1a2h</link>
      <guid>https://dev.to/desty2k/your-coding-agent-is-a-slot-machine-1a2h</guid>
      <description>&lt;p&gt;Programming used to have a speed limit. You wrote code, fought the compiler, debugged, tested, and eventually deployed. The hit of satisfaction came when the feature shipped or the tests went green. That cycle took hours. Sometimes days. The delay regulated behavior.&lt;/p&gt;

&lt;p&gt;You couldn't binge on deploy-satisfaction because the loop was too slow. The friction was structural. Nobody pulled compulsive 14-hour days writing Java servlets; the rewards arrived too slowly to drive that. If anything, people quit too early because it took too long.&lt;/p&gt;

&lt;p&gt;AI coding agents removed the speed limit. I'm using "slot machine" as a behavioral metaphor here, not a clinical diagnosis.&lt;/p&gt;

&lt;h2&gt;Thirty seconds&lt;/h2&gt;

&lt;p&gt;Prompt. Result. Satisfaction. Next prompt. The entire arc of "I had an idea, I built it, I saw it work" now fits inside 30 seconds. And it repeats. Indefinitely.&lt;/p&gt;

&lt;p&gt;A &lt;a href="https://gamblingresearch.sites.olt.ubc.ca/files/2023/03/ClarkZack_2023postprint_RewardVariability.pdf" rel="noopener noreferrer"&gt;2023 paper in &lt;em&gt;Addictive Behaviors&lt;/em&gt;&lt;/a&gt; by Clark and Zack lays out why this matters. Two factors make non-drug activities addictive: reward variability (you don't know exactly what you'll get) and frequency (how many reward cycles you can fit into a unit of time). They call the second one temporal compression. Social media has both. Loot boxes have both. Slot machines are the textbook case.&lt;/p&gt;

&lt;p&gt;AI coding agents have both. Each prompt returns something slightly different (variability). And you can run dozens of cycles per hour (compression). The paper's conclusion is blunt: "By enabling near limitless diversity and speed of delivery of non-drug rewards, digital technology has permitted engineering of reinforcers with addictive potential that, delivered under natural conditions, would likely never become addictive."&lt;/p&gt;

&lt;p&gt;They were writing about gambling and social media. They could have been writing about your terminal.&lt;/p&gt;

&lt;h2&gt;Eight tabs, eight reward streams&lt;/h2&gt;

&lt;p&gt;Running multiple agent sessions in parallel looks like multitasking. It's actually multiple concurrent reward streams. A &lt;a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC7910435/" rel="noopener noreferrer"&gt;study analyzing over a million social media posts&lt;/a&gt; found that people adjust their posting frequency to maximize the rate of likes they receive, the same way animals in a Skinner box adjust lever presses to maximize food pellets. Agent tabs work on the same principle. Every switch carries a chance that something finished. A small hit.&lt;/p&gt;

&lt;p&gt;The sessions don't compete for your attention. They take turns feeding it.&lt;/p&gt;

&lt;p&gt;I have about twenty terminal tabs open right now. Some are from days ago, left around because I'll probably get back to them. Five or six are active. One is pulling together an infrastructure report, one is waiting on CI after fixes it wrote itself, one is something I opened mid-sentence to chase a bug that came up while I was reviewing something else. While writing this paragraph I switched tabs twice to check if a build passed. It doesn't feel like a problem while it's happening.&lt;/p&gt;

&lt;p&gt;I keep seeing people call this "an assembly line of productivity." An assembly line runs whether you're paying attention or not. You just keep feeding it. Nobody describes their relationship with a useful tool that way.&lt;/p&gt;

&lt;p&gt;Parallel sessions and rapid context switching get marketed as productivity features. Wanting novelty, trying things quickly, jumping between five terminals. But the behavior this produces looks a lot like the variable-reward patterns behind checking your phone 80 times a day.&lt;/p&gt;

&lt;h2&gt;The meta-tool trap&lt;/h2&gt;

&lt;p&gt;People are building elaborate systems to manage the chaos of their agent-assisted workflow. Productivity hubs with skill trees. XP points. Urgency scores. Daily summaries that rate your day out of 10. Automatic task splitting when you miss a deadline.&lt;/p&gt;

&lt;p&gt;They gamified the thing that was already acting like a game.&lt;/p&gt;

&lt;p&gt;The meta-work around the compulsive work becomes its own loop. Hit from the agent completing a task. Hit from watching the XP bar move. Hit from the daily score. And then you need a system to manage &lt;em&gt;that&lt;/em&gt; system, and at some point you're four layers deep in productivity tooling and haven't shipped anything in a week.&lt;/p&gt;

&lt;h2&gt;Starting is the drug&lt;/h2&gt;

&lt;p&gt;The first hour of a new project has the highest novelty density. Agents make that phase unusually cheap. Everything is possible, nothing is broken yet, and the code just keeps coming.&lt;/p&gt;

&lt;p&gt;Finishing is edge cases. Tests for the boring paths. The last 20% that takes 80% of the time. Reward density drops off a cliff. So you open a new tab and start something else.&lt;/p&gt;

&lt;p&gt;You see the pattern everywhere once you look. Bursts of intense output followed by nothing. Somebody produces a hundred pieces of content in six weeks, then drops to zero. Twenty projects in various stages of beginning, none shipped.&lt;/p&gt;

&lt;p&gt;And building the thing is the easy part. Finding users, handling support, keeping the service running at 3am, writing docs nobody reads, negotiating contracts, doing the marketing that actually brings people in. None of that gives you a dopamine hit, and none of it fits in a 30-second prompt cycle. The agent can scaffold an app in an afternoon. It can't make anyone care about it. The work that makes a product real is exactly the work that the reward loop skips over.&lt;/p&gt;

&lt;p&gt;The behavior tracks novelty, not value. And since starting costs almost nothing now, you run out of interest before you run out of ideas.&lt;/p&gt;

&lt;h2&gt;The accidental guardrail&lt;/h2&gt;

&lt;p&gt;The most revealing thing about these tools is the feature nobody asked for: usage limits.&lt;/p&gt;

&lt;p&gt;Usage caps end up acting as a hard stop. That's a strange role for a productivity tool, but it's clearly how some people experience these products. Claude's own &lt;a href="https://support.anthropic.com/en/articles/9797557-usage-limit-best-practices" rel="noopener noreferrer"&gt;usage limit docs&lt;/a&gt; discuss how to work within the caps. Some users describe the cap running out as the thing that makes them stop for the night. In a &lt;a href="https://www.reddit.com/r/ClaudeAI/comments/1shciwa/any_other_adhd_programmers_find_claudecode_to_be/" rel="noopener noreferrer"&gt;r/ClaudeAI thread&lt;/a&gt;, one person described deliberately downgrading their plan so it would expire before midnight. Recognized the pattern, reintroduced friction on purpose.&lt;/p&gt;

&lt;p&gt;A productivity tool where the most effective safety feature is running out of capacity.&lt;/p&gt;

&lt;p&gt;Your text editor doesn't need a cooldown timer. Nobody ships an IDE with "take a break" reminders. Productivity tools don't usually come with harm reduction features.&lt;/p&gt;

&lt;p&gt;The token limit is a circuit breaker. Most of the discourse around it is people asking how to get rid of it.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>programming</category>
      <category>psychology</category>
    </item>
    <item>
      <title>Why I only build read-only MCP servers</title>
      <dc:creator>Wojciech Wentland</dc:creator>
      <pubDate>Thu, 09 Apr 2026 13:43:42 +0000</pubDate>
      <link>https://dev.to/desty2k/why-i-only-build-read-only-mcp-servers-32kl</link>
      <guid>https://dev.to/desty2k/why-i-only-build-read-only-mcp-servers-32kl</guid>
      <description>&lt;p&gt;Every MCP server I build is read-only. List, search, get, read. No create, update, delete, activate, purge.&lt;/p&gt;

&lt;p&gt;I've been running Claude Code with &lt;a href="https://code.claude.com/docs/en/permission-modes" rel="noopener noreferrer"&gt;&lt;code&gt;--dangerously-skip-permissions&lt;/code&gt;&lt;/a&gt; in environments where the agent has no write-capable MCP tools and no direct path to mutate production systems. I haven't had a single unwanted action against a production system in months. Not because I trust the model to never hallucinate. Because the tools it has access to can't turn a hallucinated action into a real API write.&lt;/p&gt;

&lt;p&gt;Read-only doesn't make an agent safe. It removes an entire class of failures.&lt;/p&gt;

&lt;h2&gt;The failure mode isn't hypothetical&lt;/h2&gt;

&lt;p&gt;There's a &lt;a href="https://www.reddit.com/r/ClaudeCode/comments/1sex28q/opus_46_destroys_a_users_session_costing_them/" rel="noopener noreferrer"&gt;post on r/ClaudeCode&lt;/a&gt; where Claude suggested tearing down a GPU instance, then executed it. The user never confirmed. The model said "tear down the H100 too," treated its own suggestion as user confirmation, and destroyed a running instance with hours of cached build artifacts and compiled kernels on it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjbgm60xs9ixdsdo019m8.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjbgm60xs9ixdsdo019m8.webp" alt="Claude hallucinated user confirmation and destroyed a running GPU instance. Source: r/ClaudeCode" width="800" height="739"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The model later admitted: "I hallucinated you saying that. You never said those words. I said it, then executed it as if you'd agreed."&lt;/p&gt;

&lt;p&gt;If that agent had read-only tools, it would have read the instance list, maybe suggested tearing something down, and then... nothing. The suggestion dies as text. No one loses a machine.&lt;/p&gt;

&lt;h2&gt;How I actually use agents&lt;/h2&gt;

&lt;p&gt;My workflow with Claude Code looks like this: I ask it to investigate something. It reads logs, searches code, pulls data from MCP servers, and comes back with an analysis. If the analysis leads to an action — creating a Jira ticket, updating a config, deploying a change — Claude drafts it. I review the draft, then I do the action myself.&lt;/p&gt;

&lt;p&gt;The agent reads and analyzes. I act.&lt;/p&gt;

&lt;p&gt;I trust the model's judgment on what to write in a ticket. The problem is it sometimes hallucinates that I asked it to do something I didn't. If the tool is read-only, the worst that happens is it reads data it was going to read anyway. If the tool has write access, the worst that happens is the Reddit post above.&lt;/p&gt;

&lt;h2&gt;Approval fatigue is the real problem&lt;/h2&gt;

&lt;p&gt;"But there's a confirmation prompt before destructive actions." Sure. Claude Code asks before running commands. The problem is approval fatigue. After confirming 50 read operations, you stop reading the prompts. You click yes. And then the 51st one is &lt;code&gt;vastai destroy instance 34122719&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Anthropic wrote about this in their &lt;a href="https://www.anthropic.com/engineering/claude-code-sandboxing" rel="noopener noreferrer"&gt;sandboxing post&lt;/a&gt;. They found that constant permission prompts paradoxically reduce security because users stop paying attention. Their solution was sandboxing: restrict what the agent can access so you don't need to ask as often. They reduced permission prompts by 84% while maintaining security.&lt;/p&gt;

&lt;p&gt;Read-only MCP servers follow the same logic. If the server can't write, you don't need to confirm writes. The agent operates freely within the read boundary. No fatigue, no missed confirmation on a destructive action.&lt;/p&gt;

&lt;p&gt;That's why I run &lt;code&gt;--dangerously-skip-permissions&lt;/code&gt;. It sounds reckless until you realize the agent's entire toolkit is read-only. There's nothing dangerous to skip permission for.&lt;/p&gt;

&lt;h2&gt;What this doesn't cover&lt;/h2&gt;

&lt;p&gt;Read-only MCP servers are one boundary, not a complete agent security model. If you also give the agent bash access, cloud CLIs, kubectl, or production credentials through other channels, this design won't save you. Claude Code with &lt;code&gt;--dangerously-skip-permissions&lt;/code&gt; can still run shell commands, edit files, and interact with whatever's reachable from the host. Anthropic's own documentation &lt;a href="https://code.claude.com/docs/en/permission-modes" rel="noopener noreferrer"&gt;recommends&lt;/a&gt; using isolated environments when running in bypass mode, and their sandboxing approach combines filesystem isolation, network restrictions, and permission controls — not just tool-level restrictions.&lt;/p&gt;

&lt;p&gt;This article is about the MCP boundary specifically. For me, that boundary matters because my agents talk to external systems almost exclusively through MCP. But it's one layer, not the whole stack.&lt;/p&gt;

&lt;h2&gt;Beyond the IDE&lt;/h2&gt;

&lt;p&gt;There's another reason I care about read-only MCP servers: they're portable. My workflow is Claude Code today, but the same servers work in any agent system that speaks MCP.&lt;/p&gt;

&lt;p&gt;In a headless agent system — one where there's no human in the loop and no bash shell — the MCP boundary isn't just one layer. It's the only interface the agent has to external systems. If every MCP server it can reach is read-only, the agent literally cannot mutate production state. No sandboxing needed, no permission prompts, no approval fatigue. The tools themselves are the guardrail.&lt;/p&gt;

&lt;p&gt;This matters if you're building agent systems for other users. Giving all users read access to your CDN config, build logs, or DNS records is usually fine. Giving all users write access is a different conversation entirely. Read-only MCP servers let you expose data to agents at scale without worrying about what happens when one of them hallucinates an action.&lt;/p&gt;

&lt;h2&gt;What read-only servers are good for&lt;/h2&gt;

&lt;p&gt;I run MCP servers for CDN management, CI/CD, log aggregation, DNS, and incident management. All read-only. The questions I ask look like: "What's the current CDN config for checkout?" "Which build failed last night?" "Compare caching rules between production and staging." "Draft a Jira ticket for the DNS change we discussed."&lt;/p&gt;

&lt;p&gt;Claude produces the draft text. I copy it into Jira or GitHub myself. Nothing in this workflow needs the agent to write to the target system.&lt;/p&gt;

&lt;h2&gt;The credential argument&lt;/h2&gt;

&lt;p&gt;Getting a read-only API credential approved is a conversation. "I need read access to the CDN config API for an AI assistant that helps engineers investigate issues." Most teams say yes.&lt;/p&gt;

&lt;p&gt;Getting a write credential is different. "I need an AI agent to be able to modify CDN configurations." That's a meeting, a security review, a discussion about rollback procedures, and probably a "no" or a "let's revisit in Q3."&lt;/p&gt;

&lt;p&gt;Read-only credentials have a smaller blast radius and a simpler approval process. They also happen to cover every use case I actually have.&lt;/p&gt;

&lt;h2&gt;What this means for MCP servers&lt;/h2&gt;

&lt;p&gt;Every MCP server I publish follows this: read-only by design. The &lt;a href="https://modelcontextprotocol.io/docs/tutorials/security/security_best_practices" rel="noopener noreferrer"&gt;MCP security best practices&lt;/a&gt; describe scope minimization as a core principle. Start with the minimum privileges, elevate only when required. My servers don't elevate.&lt;/p&gt;

&lt;p&gt;If someone opens a GitHub issue asking for write tools, the answer is: "This server is intentionally read-only. Fork it if you need write operations." That's not laziness. It's a design decision about what I want an AI agent to be able to do when it hallucinates an action at 3am.&lt;/p&gt;

&lt;p&gt;I'm planning a series of production-ready read-only MCP servers for various platforms. More on that soon.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>security</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Your MCP server is not an API adapter</title>
      <dc:creator>Wojciech Wentland</dc:creator>
      <pubDate>Wed, 08 Apr 2026 19:59:13 +0000</pubDate>
      <link>https://dev.to/desty2k/your-mcp-server-is-not-an-api-adapter-23k7</link>
      <guid>https://dev.to/desty2k/your-mcp-server-is-not-an-api-adapter-23k7</guid>
      <description>&lt;p&gt;A lot of &lt;a href="https://modelcontextprotocol.io" rel="noopener noreferrer"&gt;MCP&lt;/a&gt; servers I see in the wild look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@mcp.tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_thing&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.example.com/things/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Fetch, forward, done. A thin HTTP proxy with a JSON Schema wrapper. For some&lt;br&gt;
use cases, that's enough.&lt;/p&gt;

&lt;p&gt;The servers I keep coming back to do something different. They hold state and&lt;br&gt;
pre-compute answers. An agent hitting a thin wrapper might need three round&lt;br&gt;
trips and 30 seconds. The same agent hitting a server that does real work gets&lt;br&gt;
its answer in one call, under a millisecond.&lt;/p&gt;
&lt;h2&gt;
  
  
  Preloaded in-memory index
&lt;/h2&gt;

&lt;p&gt;Here's a failure mode I run into constantly: the agent needs to find something&lt;br&gt;
but doesn't know the exact ID. Most APIs only support exact lookups. No ID, no&lt;br&gt;
result. The conversation dead-ends with "I couldn't find that resource" and the&lt;br&gt;
user gives up.&lt;/p&gt;

&lt;p&gt;I built a server that wraps a CDN management API. Hundreds of properties, and&lt;br&gt;
the agent regularly needs to find which one handles a given hostname. The API&lt;br&gt;
has a search endpoint, but it's slow, requires exact matches, and sometimes&lt;br&gt;
returns 403 depending on account permissions.&lt;/p&gt;

&lt;p&gt;So the server loads every property into memory at startup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;PropertyIndex&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;_entries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;PropertyEntry&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;_name_index&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;refresh_interval&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_build_index&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_refresh_task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_refresh_loop&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_by_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;names&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;property_name&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_entries&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;matches&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;names&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;scorer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fuzz&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WRatio&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score_cutoff&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_entries&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;to_dict&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;idx&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;matches&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Builds once by fanning out parallel API calls, deduplicates, refreshes every&lt;br&gt;
five minutes in the background. Lookups take under a millisecond.&lt;/p&gt;
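&lt;p&gt;The build itself is nothing exotic. A sketch of the fan-out and dedupe, with hypothetical names (&lt;code&gt;fetch_group&lt;/code&gt;, &lt;code&gt;property_id&lt;/code&gt;, &lt;code&gt;rebuild&lt;/code&gt;) standing in for the real API:&lt;/p&gt;

```python
import asyncio


async def build_index(fetch_group, group_ids):
    """Fan out one API call per group, then deduplicate.

    Hypothetical sketch: fetch_group returns a list of property dicts,
    and the same property can appear under several groups, so we
    dedupe on its ID.
    """
    results = await asyncio.gather(*(fetch_group(g) for g in group_ids))
    seen: dict[str, dict] = {}
    for batch in results:
        for prop in batch:
            seen.setdefault(prop["property_id"], prop)
    return list(seen.values())


async def refresh_loop(index, interval: int = 300):
    # Rebuild in the background every five minutes. A failed refresh
    # keeps serving the previous snapshot instead of crashing.
    while True:
        await asyncio.sleep(interval)
        try:
            await index.rebuild()
        except Exception:
            pass  # log and retry on the next cycle
```

&lt;p&gt;The dedupe matters because group/contract listings overlap; without it the fuzzy search returns the same property several times.&lt;/p&gt;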

&lt;p&gt;Without this, the agent guesses at exact property names, picks the wrong one,&lt;br&gt;
retries, burns three turns. With the index, someone types "the CDN config for&lt;br&gt;
checkout" and gets the right answer first try. That's the kind of difference&lt;br&gt;
that decides whether people keep using the agent or go back to doing it&lt;br&gt;
manually.&lt;/p&gt;

&lt;p&gt;I did the same thing for a CI/CD server. The API lets you fetch a build config&lt;br&gt;
by ID, but there's no fuzzy search. If you don't know the ID, you're stuck. The&lt;br&gt;
server caches all build configurations at startup, runs fuzzy matching against&lt;br&gt;
them. The agent says "find the deploy job for the payments service" and gets a&lt;br&gt;
ranked list instantly, even though the CI system itself can't do that.&lt;/p&gt;
&lt;h2&gt;
  
  
  Embedded analytical database
&lt;/h2&gt;

&lt;p&gt;I have another server that sits in front of a relational database. Some tables&lt;br&gt;
have 20 million rows. The agent needs to answer analytical questions, things&lt;br&gt;
like "which providers have the highest volume in this region?" or "show me the&lt;br&gt;
top performers for a given category."&lt;/p&gt;

&lt;p&gt;The database wasn't designed for these queries. It was built for a web UI with&lt;br&gt;
narrow, well-indexed lookups. The agent's access patterns are different: it asks&lt;br&gt;
broad analytical questions that require joins across tables the application&lt;br&gt;
never joins. Adding indexes wasn't an option either, because the database is&lt;br&gt;
owned by another team and optimizing it for an AI agent's query patterns wasn't&lt;br&gt;
on anyone's roadmap. Some of these queries took 10 to 30 seconds on a read&lt;br&gt;
replica, and in an agent loop where that latency gets multiplied by however many&lt;br&gt;
tool calls the agent needs, the conversation times out before it gets anywhere.&lt;/p&gt;

&lt;p&gt;The server embeds &lt;a href="https://duckdb.org" rel="noopener noreferrer"&gt;DuckDB&lt;/a&gt; in-process and loads pre-aggregated views and lookup&lt;br&gt;
tables at startup. Some are straight copies of small reference tables. Others&lt;br&gt;
are materialized summaries that flatten joins the source database was never&lt;br&gt;
designed to run efficiently, the kind of cross-table aggregations that make&lt;br&gt;
sense for an analytical question but would be expensive on a schema built for&lt;br&gt;
transactional web UI lookups:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DuckDBCache&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;duckdb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:memory:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;fast_configs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_load_table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_ready&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_deferred_task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_load_deferred&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;deferred_configs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_refresh_task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_refresh_loop&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each table has a fingerprint query (a cheap &lt;code&gt;COUNT(*)&lt;/code&gt; or checksum) that the&lt;br&gt;
refresh loop checks before doing a full reload. Large tables load in the&lt;br&gt;
background after the server is already taking requests. If something asks for a&lt;br&gt;
table that hasn't loaded yet, it falls back to the source database.&lt;/p&gt;

&lt;p&gt;The 30-second query now takes under a millisecond. The agent can actually have a&lt;br&gt;
back-and-forth with the user instead of timing out after the first question.&lt;/p&gt;

&lt;p&gt;There's a query-result cache on top of this too. It has a prewarm manifest,&lt;br&gt;
basically a list of common queries that run at startup so the first person to&lt;br&gt;
use the agent on Monday morning doesn't sit through a cold start.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;QueryCache&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_or_compute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cache_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;compute_fn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;cached&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cache_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;cached&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;cached&lt;/span&gt;

        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;compute_fn&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_put&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cache_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ttl&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_default_ttl&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It skips caching error responses. If a query fails because the database is&lt;br&gt;
temporarily overloaded, you don't want that failure served for the next hour.&lt;br&gt;
That one took a production outage to figure out.&lt;/p&gt;
&lt;h2&gt;
  
  
  Data transformation
&lt;/h2&gt;

&lt;p&gt;Every server I build strips the upstream API response before returning it.&lt;br&gt;
Token usage scales with response size, and most APIs return 10x more data than&lt;br&gt;
the agent will ever look at.&lt;/p&gt;

&lt;p&gt;One API I work with returns objects with 60+ fields. The server keeps maybe 8:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_slim_record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;_strip_nulls&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;total_value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;_cents_to_major&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;total_value_cents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;annual_value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;_cents_to_major&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;annual_value_cents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;start_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;start_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;_effective_status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;_cents_to_major&lt;/code&gt; converts cents to dollars, because the raw API stores&lt;br&gt;
monetary values in cents. Before I added this conversion, every dollar amount in&lt;br&gt;
the agent's reports was off by a factor of 100: a $2,000 contract showed up as&lt;br&gt;
$200,000 because the agent treated cents as dollars. No amount of prompt&lt;br&gt;
engineering fixed it reliably. Moving the conversion into the server did.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;_effective_status&lt;/code&gt; is the other one worth mentioning. The API's status field&lt;br&gt;
can say "active" on a record that ended three months ago. The platform's own UI&lt;br&gt;
derives the real status from multiple fields, so the MCP server does the same:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_effective_status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;stage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stage&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;stage&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;terminated&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;not_renewed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inactive&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end_date_not_applicable&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;renewal_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;perpetual&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;undetermined&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;end_date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;end_date&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromisoformat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;end_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;today&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inactive&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;undetermined&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the agent gives the same answer a human would get looking at the UI.&lt;br&gt;
Stripping nulls across a list of 50 records also saves a few thousand tokens&lt;br&gt;
per response, which adds up.&lt;/p&gt;

&lt;p&gt;A log aggregation server I built does something similar: auto-appends&lt;br&gt;
&lt;code&gt;| json auto&lt;/code&gt; to queries that don't have a field extraction operator, truncates&lt;br&gt;
raw log lines to 500 characters, converts epoch-millisecond timestamps to&lt;br&gt;
ISO 8601. Small fixes that add up to the agent not wasting turns fighting the&lt;br&gt;
format.&lt;/p&gt;
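&lt;p&gt;Each of those fixes is a few lines. A sketch of what they could look like; the operator list and truncation limit here are illustrative, not the real server's:&lt;/p&gt;

```python
import re
from datetime import datetime, timezone


def normalize_query(query: str) -> str:
    # Append `| json auto` when the query has no field-extraction
    # operator, so results come back structured instead of raw.
    if re.search(r"\|\s*(json|parse|extract)\b", query):
        return query
    return f"{query} | json auto"


def truncate_line(line: str, limit: int = 500) -> str:
    # Keep raw log lines from blowing up the context window.
    if len(line) <= limit:
        return line
    return line[:limit] + "…"


def epoch_ms_to_iso(ms: int) -> str:
    # Agents reason about ISO 8601 far more reliably than epoch ms.
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).isoformat()
```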
&lt;h2&gt;
  
  
  Download once, serve from cache
&lt;/h2&gt;

&lt;p&gt;Some data is expensive to fetch. PDF documents. Code bundles in tgz archives.&lt;br&gt;
The pattern: download on first access, extract the text, build a line offset&lt;br&gt;
index, serve everything from memory after that.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CachedFile&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;
        &lt;span class="n"&gt;offsets&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;pos&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
        &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;pos&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pos&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;pos&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;break&lt;/span&gt;
            &lt;span class="n"&gt;pos&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;pos&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="n"&gt;offsets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pos&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_offsets&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;offsets&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_lines&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_offsets&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_line_end_offset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I use this for CDN edge function code bundles and PDF documents (extracted with&lt;br&gt;
&lt;a href="https://pymupdf.readthedocs.io/en/latest/" rel="noopener noreferrer"&gt;PyMuPDF&lt;/a&gt;). After the first download, the agent reads by line range, searches&lt;br&gt;
with regex, lists the file tree. No repeat downloads. Reading through a&lt;br&gt;
200-page document becomes "just read" instead of "download, extract, read" on&lt;br&gt;
every question.&lt;/p&gt;

&lt;h2&gt;
  
  
  When thin is fine
&lt;/h2&gt;

&lt;p&gt;Not everything needs this treatment. A server that translates natural language&lt;br&gt;
to a query language and passes it to an API is fine as a thin wrapper. The&lt;br&gt;
translation is the value there. Same for simple lookup tools.&lt;/p&gt;

&lt;p&gt;The questions I ask: does the agent hit the same data twice? Does the API&lt;br&gt;
return more than the agent needs? Is the API slow enough that the agent loop&lt;br&gt;
feels broken? If yes to any of them, the server should be doing work.&lt;/p&gt;

&lt;h2&gt;
  
  
  The multiplier
&lt;/h2&gt;

&lt;p&gt;When a person uses a web UI, they look at a page, think, click something else.&lt;br&gt;
One request at a time, processed by a human brain. An agent works differently.&lt;br&gt;
It makes five tool calls, stuffs all five responses into its context window, and&lt;br&gt;
reasons over them at once. A slow response gets multiplied by every call. A&lt;br&gt;
60-field JSON blob gets multiplied by every call. It adds up fast.&lt;/p&gt;

&lt;p&gt;I've measured the difference. CDN property lookups went from three agent turns&lt;br&gt;
to one once the fuzzy index was in place. Analytical queries went from timing&lt;br&gt;
out at 30 seconds to returning in under a millisecond from DuckDB. And every&lt;br&gt;
single dollar amount in every report was wrong until the server started&lt;br&gt;
converting cents for the agent.&lt;/p&gt;
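The cents fix is deterministic work the server should own outright. A sketch of what that conversion looks like, assuming the upstream API reports integer cents (the field names here are illustrative, not the real API's):

```python
from decimal import Decimal

# Hypothetical field names; the real API's cent-valued fields would go here.
CENT_FIELDS = {"amount_cents", "fee_cents", "refund_cents"}


def convert_cents(record: dict) -> dict:
    """Replace integer-cent fields with exact dollar strings before
    the record ever reaches the agent's context."""
    out = {}
    for key, value in record.items():
        if key in CENT_FIELDS:
            # Decimal division avoids float artifacts like 19.990000000000002.
            out[key.removesuffix("_cents")] = str(Decimal(value) / 100)
        else:
            out[key] = value
    return out
```

For example, `convert_cents({"amount_cents": 1999, "currency": "USD"})` yields `{"amount": "19.99", "currency": "USD"}`. The agent never sees a raw cent value, so there is nothing for it to get wrong.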

&lt;p&gt;You can try to fix that last one with prompt engineering. I tried for weeks. The agent still got it wrong often enough that I couldn't trust the output.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>opensource</category>
      <category>mcp</category>
    </item>
  </channel>
</rss>
