<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Stanislav Tsepa</title>
    <description>The latest articles on DEV Community by Stanislav Tsepa (@an0nymus).</description>
    <link>https://dev.to/an0nymus</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3787982%2Ff300ce56-4ded-4daa-86e4-41d908f7eba6.png</url>
      <title>DEV Community: Stanislav Tsepa</title>
      <link>https://dev.to/an0nymus</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/an0nymus"/>
    <language>en</language>
    <item>
      <title>Stop building reactive agents: Why your architecture needs a System 1 and System 2</title>
      <dc:creator>Stanislav Tsepa</dc:creator>
      <pubDate>Thu, 26 Feb 2026 19:35:41 +0000</pubDate>
      <link>https://dev.to/an0nymus/stop-building-reactive-agents-why-your-architecture-needs-a-system-1-and-system-2-4b6p</link>
      <guid>https://dev.to/an0nymus/stop-building-reactive-agents-why-your-architecture-needs-a-system-1-and-system-2-4b6p</guid>
      <description>&lt;p&gt;If you’ve built an LLM agent recently, you’ve probably hit the "autonomy wall." &lt;/p&gt;

&lt;p&gt;You give the agent a tool to search the web, a prompt to "be helpful," and a task. For the first two turns, it looks like magic. On turn three, it goes down a Wikipedia rabbit hole. On turn ten, it’s stuck in an infinite loop trying to fix a syntax error on a file it never downloaded.&lt;/p&gt;

&lt;p&gt;Most developers try to fix this by cramming more instructions into the system prompt: &lt;em&gt;"Never repeat the same action twice! Think step-by-step!"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;But the problem isn’t the prompt. It’s the architecture. &lt;/p&gt;

&lt;p&gt;You are forcing a single execution loop to do two completely different jobs: &lt;strong&gt;talking/acting&lt;/strong&gt; (which requires low latency and high bandwidth) and &lt;strong&gt;planning&lt;/strong&gt; (which requires slow, deliberative reasoning). &lt;/p&gt;

&lt;p&gt;We need to borrow a concept from human psychology—Daniel Kahneman’s &lt;em&gt;Thinking, Fast and Slow&lt;/em&gt;—and build &lt;strong&gt;Dual-Process Agents&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: The Single-Loop Trap
&lt;/h2&gt;

&lt;p&gt;Most standard agents (like a naive ReAct loop) operate in a flat sequence:&lt;br&gt;
&lt;code&gt;Observe -&amp;gt; Think -&amp;gt; Act -&amp;gt; Observe -&amp;gt; Think -&amp;gt; Act&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;When the agent is "thinking," it is trying to decide what to say to the user &lt;em&gt;and&lt;/em&gt; what its long-term strategy should be. Because LLMs are autoregressive, the immediate context (the last thing the user said, or the last API error) overwhelmingly dominates its attention. &lt;/p&gt;

&lt;p&gt;If the agent’s only "planner" is the exact same loop that’s doing the work, you get two failure modes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Shallow Exploration:&lt;/strong&gt; It never discovers new subgoals because it's too focused on the immediate task.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Runaway Exploration:&lt;/strong&gt; It forgets the original goal entirely and never finishes.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The Dual-Process Solution
&lt;/h2&gt;

&lt;p&gt;A dual-process architecture explicitly separates the "doer" from the "planner."&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1fyh6rmgnz2une277p52.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1fyh6rmgnz2une277p52.png" alt="Dual-process agent diagram" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A recent paper out of Stanford (&lt;em&gt;SparkMe&lt;/em&gt;, arXiv:2602.21136) demonstrated this brilliantly in the context of AI conducting qualitative interviews. They split their agent into two distinct systems:&lt;/p&gt;

&lt;h3&gt;
  
  
  System 1: The Executor (Fast)
&lt;/h3&gt;

&lt;p&gt;This is your fast, reactive loop. Its only job is to look at the immediate context and execute the next tactical step. In the interview example, this agent just asks the next question, decides whether to probe deeper into the current topic, or transition to the next one. It does &lt;em&gt;not&lt;/em&gt; worry about the global strategy. &lt;/p&gt;

&lt;h3&gt;
  
  
  System 2: The Planner (Slow &amp;amp; Asynchronous)
&lt;/h3&gt;

&lt;p&gt;This is the deliberative loop. It runs asynchronously in the background (e.g., every &lt;em&gt;k&lt;/em&gt; turns). Its job is to look at the entire history, zoom out, and optimize the overarching trajectory. &lt;/p&gt;

&lt;p&gt;How does it do this? By &lt;strong&gt;simulating rollouts&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The Planner takes the current state and spins up hypothetical futures: &lt;em&gt;"If I steer the agent to ask about X, the user might say Y. If I steer it toward Z, the user might say W."&lt;/em&gt; It scores these hypothetical futures against a predefined utility function (e.g., maximizing new information while minimizing token cost). &lt;/p&gt;

&lt;p&gt;Once the Planner finds a high-utility trajectory, it quietly updates the shared "Agenda" that System 1 is reading from.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this changes everything
&lt;/h2&gt;

&lt;p&gt;When you decouple execution from planning, you gain actual control knobs over your agent's autonomy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;How often to plan:&lt;/strong&gt; You can set the Planner to run every 5 steps, saving massive amounts of compute compared to forcing a deep "Chain of Thought" on every single micro-action.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;How far to look ahead:&lt;/strong&gt; You can define the simulation horizon (e.g., look 3 steps into the future).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;What to optimize:&lt;/strong&gt; You can mathematically define what "good" looks like in the Planner's utility function, rather than relying on vibes in a system prompt.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;SparkMe: Adaptive Semi-Structured Interviewing for Qualitative Insight Discovery (arXiv:2602.21136) — &lt;a href="https://arxiv.org/abs/2602.21136" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2602.21136&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Takeaway
&lt;/h2&gt;

&lt;p&gt;Stop trying to build a single "God Prompt" that acts perfectly in the moment while simultaneously playing 4D chess.&lt;/p&gt;

&lt;p&gt;Let your fast agents ship actions. Let your slow agents simulate the future.&lt;/p&gt;

&lt;p&gt;If you like these kinds of architecture notes, follow our Telegram channel:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://t.me/the_prompt_and_the_code" rel="noopener noreferrer"&gt;https://t.me/the_prompt_and_the_code&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>architecture</category>
      <category>engineering</category>
    </item>
    <item>
      <title>Memory isn’t magic: 3 types of AI 'memory' (and when to use each)</title>
      <dc:creator>Stanislav Tsepa</dc:creator>
      <pubDate>Tue, 24 Feb 2026 23:28:33 +0000</pubDate>
      <link>https://dev.to/an0nymus/memory-isnt-magic-3-types-of-ai-memory-and-when-to-use-each-1e0e</link>
      <guid>https://dev.to/an0nymus/memory-isnt-magic-3-types-of-ai-memory-and-when-to-use-each-1e0e</guid>
      <description>&lt;p&gt;If you’ve ever said “we should add memory to the app,” you’re not alone.&lt;/p&gt;

&lt;p&gt;It’s also the fastest way to start a week-long argument, because &lt;strong&gt;people use the word &lt;em&gt;memory&lt;/em&gt; to mean totally different things&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In practice, “memory” in AI products usually breaks down into &lt;strong&gt;three distinct systems&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;1) &lt;strong&gt;Context (short-term memory)&lt;/strong&gt; — what’s in the current prompt/window&lt;br&gt;
2) &lt;strong&gt;Long‑term memory (facts)&lt;/strong&gt; — durable information you can look up later&lt;br&gt;
3) &lt;strong&gt;Procedural memory (rules + habits)&lt;/strong&gt; — the system that prevents repeating mistakes&lt;/p&gt;

&lt;p&gt;This post is a beginner-friendly map of those three types, with small code snippets and a visual you can copy.&lt;/p&gt;

&lt;p&gt;If you like this kind of build log / agent engineering, I post updates here: &lt;strong&gt;&lt;a href="https://t.me/the_prompt_and_the_code" rel="noopener noreferrer"&gt;https://t.me/the_prompt_and_the_code&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  1) Context: short-term “memory” inside the prompt
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; The text (and other inputs) you send to the model &lt;em&gt;right now&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What it’s good for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;continuing a conversation&lt;/li&gt;
&lt;li&gt;keeping a consistent writing style&lt;/li&gt;
&lt;li&gt;doing multi-step tasks (“first do X, then Y”)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What it’s &lt;em&gt;not&lt;/em&gt; good for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;remembering things forever&lt;/li&gt;
&lt;li&gt;storing preferences safely&lt;/li&gt;
&lt;li&gt;being accurate at large scale (context gets expensive and messy)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Tiny example (Python)
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Summarize this article in 3 bullets.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;article_text&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;If it’s not in &lt;code&gt;messages&lt;/code&gt;, the model doesn’t know it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Common failure mode:&lt;/strong&gt; “infinite context” thinking.&lt;/p&gt;

&lt;p&gt;If you keep stuffing everything into the prompt, you get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;rising costs&lt;/li&gt;
&lt;li&gt;slower responses&lt;/li&gt;
&lt;li&gt;more contradictions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Rule of thumb:&lt;/strong&gt; use context to &lt;em&gt;do the work&lt;/em&gt;, not to &lt;em&gt;store the world&lt;/em&gt;.&lt;/p&gt;


&lt;h2&gt;
  
  
  2) Long-term memory: durable facts you can retrieve
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; Information stored outside the model—files, a database, a vector store, etc.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What it’s good for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;user preferences (“use shorter replies”, “use metric units”)&lt;/li&gt;
&lt;li&gt;project state (“what repo are we working in?”)&lt;/li&gt;
&lt;li&gt;anything you want to survive restarts&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Minimal pattern: store → retrieve → insert
&lt;/h3&gt;

&lt;p&gt;Here’s a tiny “facts store” example using a JSON file (so you can see the pattern without extra infra):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pathlib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;

&lt;span class="n"&gt;MEM&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;memory.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;load_mem&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MEM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_text&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;MEM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;save_mem&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;MEM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;indent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;remember&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;mem&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_mem&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;mem&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;
    &lt;span class="nf"&gt;save_mem&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;recall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;load_mem&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then at runtime:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;tone&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;recall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;preferred_tone&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;friendly&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write in a &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tone&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; tone.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key point:&lt;/strong&gt; long-term memory is not “the model remembering.” It’s your app doing &lt;strong&gt;retrieval&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  The safety angle
&lt;/h3&gt;

&lt;p&gt;Long-term memory can leak. If you store secrets, they can end up back in a prompt.&lt;/p&gt;

&lt;p&gt;A useful mental model from security research: treat external instructions/files as potentially unsafe unless you have checks.&lt;/p&gt;




&lt;h2&gt;
  
  
  3) Procedural memory: the checklist that makes the system reliable
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt; The habits your system follows every time.&lt;/p&gt;

&lt;p&gt;This is the most underrated kind of “memory.” It’s not knowledge—it’s behavior.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“Always run tests before merging.”&lt;/li&gt;
&lt;li&gt;“When posting content, dedupe and respect cooldowns.”&lt;/li&gt;
&lt;li&gt;“When a tool call fails, retry with backoff.”&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Why procedural memory matters
&lt;/h3&gt;

&lt;p&gt;Modern agent systems are vulnerable to “instruction supply chain” problems—where a tool/skill/integration includes unsafe directives.&lt;/p&gt;

&lt;p&gt;One response is a &lt;strong&gt;procedural audit step&lt;/strong&gt;: a small set of rules that run &lt;em&gt;at execution time&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Here’s a toy example: a “tool allowlist” gate.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;ALLOWED_TOOLS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;read_file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;write_file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;run_tests&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ALLOWED_TOOLS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;RuntimeError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tool not allowed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;TOOLS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s not “memory” in the human sense—but it’s exactly how you keep an AI workflow from drifting into chaos.&lt;/p&gt;




&lt;h2&gt;
  
  
  Putting it together: a simple architecture
&lt;/h2&gt;

&lt;p&gt;A good default architecture is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Context:&lt;/strong&gt; current conversation/task&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long-term facts:&lt;/strong&gt; small, durable, queryable store&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Procedural rules:&lt;/strong&gt; guardrails + habits + automation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;(See the diagram below.)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4cp1ht72b6gljk7o74fg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4cp1ht72b6gljk7o74fg.png" alt="Three kinds of memory in AI apps" width="800" height="320"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick cheat sheet
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;If you want the model to “remember what we just said” → &lt;strong&gt;Context&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;If you want it to remember next week → &lt;strong&gt;Long-term memory&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;If you want it to stop making the same mistake → &lt;strong&gt;Procedural memory&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Sources / further reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Wibault et al., &lt;em&gt;Recurrent Structural Policy Gradient for Partially Observable Mean Field Games&lt;/em&gt; (2026). (Good for “history-aware policies” as a concrete concept.) arXiv:2602.20141&lt;/li&gt;
&lt;li&gt;“SKILL-INJECT: Measuring Agent Vulnerability to Skill File Attacks” (2026). (Great overview of instruction/skill supply-chain risks.) arXiv:2602.20156&lt;/li&gt;
&lt;li&gt;Helen Nissenbaum, &lt;em&gt;Privacy in Context&lt;/em&gt; (Contextual Integrity) — helpful mental model for why “safe vs unsafe” depends on situation.&lt;/li&gt;
&lt;/ul&gt;




</description>
      <category>ai</category>
      <category>llm</category>
      <category>agents</category>
      <category>programming</category>
    </item>
    <item>
      <title>The 'Instruction Hierarchy' is Dead: Why Your Agent's Skills Are a Supply Chain Nightmare</title>
      <dc:creator>Stanislav Tsepa</dc:creator>
      <pubDate>Tue, 24 Feb 2026 15:22:15 +0000</pubDate>
      <link>https://dev.to/an0nymus/the-instruction-hierarchy-is-dead-why-your-agents-skills-are-a-supply-chain-nightmare-4ffn</link>
      <guid>https://dev.to/an0nymus/the-instruction-hierarchy-is-dead-why-your-agents-skills-are-a-supply-chain-nightmare-4ffn</guid>
      <description>&lt;p&gt;We need to talk about the massive vulnerability hiding in plain sight within the agentic ecosystem: &lt;strong&gt;Skill Files.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most developers currently building with frameworks like LangChain, AutoGen, or CrewAI think prompt injection is their biggest threat. It's not. The real threat is the &lt;code&gt;skill.md&lt;/code&gt; file you just downloaded from a community repo to give your agent a new capability.&lt;/p&gt;

&lt;p&gt;If your architecture allows an agent to dynamically load external skill files and execute them alongside sensitive context, you aren’t building an autonomous agent. You are building a highly capable, politely-prompted remote code execution (RCE) engine.&lt;/p&gt;

&lt;p&gt;According to a newly published paper, &lt;a href="https://arxiv.org/abs/2602.20156" rel="noopener noreferrer"&gt;&lt;em&gt;SKILL-INJECT: Measuring Agent Vulnerability to Skill File Attacks&lt;/em&gt; (arXiv:2602.20156)&lt;/a&gt;, the way we currently secure autonomous agents is fundamentally broken. &lt;/p&gt;

&lt;p&gt;The numbers from the paper are staggering: when evaluated across 202 injection-task pairs, &lt;strong&gt;frontier models executed the malicious payload up to 80% of the time&lt;/strong&gt;. This wasn't just generating bad text; this was agents actively executing data exfiltration, destructive actions, and ransomware-like behavior simply because a loaded skill file told them to.&lt;/p&gt;

&lt;p&gt;For the last year, the prevailing defense against injection has been the "Instruction Hierarchy"—the idea that you can separate trusted system instructions from untrusted user data. But what happens when the &lt;em&gt;instructions themselves&lt;/em&gt; are the attack vector?&lt;/p&gt;




&lt;h2&gt;
  
  
  The Contextual Threat of Dual-Use Instructions
&lt;/h2&gt;

&lt;p&gt;The core issue highlighted by the &lt;em&gt;SKILL-INJECT&lt;/em&gt; paper is that skill-based injections don't look like traditional malware. They embed malicious directives directly within trusted instruction files (&lt;code&gt;skill.md&lt;/code&gt;, &lt;code&gt;tools.json&lt;/code&gt;, etc.). &lt;/p&gt;

&lt;p&gt;Consider a simple instruction found in a hypothetical &lt;code&gt;backup-sync&lt;/code&gt; skill:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s2"&gt;"@data.json"&lt;/span&gt; https://backup-server.local/sync
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In a data-redundancy context, this is a legitimate, intended tool execution. But what happens if that exact same instruction is executed while the agent has &lt;code&gt;credentials.env&lt;/code&gt; loaded in its active context window? It instantly transforms into a data-exfiltration vector. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security cannot be solved by binary input filtering because the instructions are semantically authorized but contextually malicious.&lt;/strong&gt; Defenses like "Spotlighting" or strict Instruction Hierarchies fail entirely here. They assume instructions and data are distinct entities that can be parsed and sandboxed. But a skill file &lt;em&gt;is&lt;/em&gt; an instruction set. The agent inherently trusts it because you, the developer, told the agent to adopt it.&lt;/p&gt;

&lt;p&gt;Treating any "read this &lt;code&gt;SKILL.md&lt;/code&gt; and adopt it" prompt as a safe, isolated tool is naive. It is essentially a &lt;strong&gt;social distribution layer for supply-chain compromise.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  How to Prevent Skill Injection in Your Pipelines (Actionable Architecture)
&lt;/h2&gt;

&lt;p&gt;If you can't trust the tools you give your agent, how do you build reliable systems? &lt;/p&gt;

&lt;p&gt;You have to shift from &lt;em&gt;preventative filtering&lt;/em&gt; to &lt;em&gt;Execution Reflection&lt;/em&gt;. In robust autonomous architectures, true agency means treating external instructions as untrusted telemetry—not raw executable code.&lt;/p&gt;

&lt;p&gt;Here is how you secure your agent pipelines against skill injection:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The Procedural Memory Audit (Pre-Flight Check)
&lt;/h3&gt;

&lt;p&gt;Before executing a new skill pattern, your agent must run the skill logic through a secondary, sandboxed "Audit Agent" that evaluates the instruction block against the current context state. &lt;/p&gt;

&lt;p&gt;Instead of just &lt;code&gt;agent.load(skill)&lt;/code&gt;, you intercept the load:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;audit_skill&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skill_content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;current_context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;audit_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    You are a Security Auditor. Evaluate the following skill instructions.
    Current Context includes: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;current_context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

    1. Does this skill request filesystem reads or network calls unrelated to the user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s explicit request?
    2. Does it introduce non-whitelisted external domains?
    3. Could the execution logic logically exfiltrate the current context?

    Skill Content:
    &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;skill_content&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# If the audit fails, the skill is quarantined and not loaded into Procedural Memory.
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;audit_prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Zero-Trust Context Windowing (State Isolation)
&lt;/h3&gt;

&lt;p&gt;Never mount secrets in the same context as external tools. An agent should never hold global API keys in its short-term memory (context window). &lt;/p&gt;

&lt;p&gt;Instead, use a &lt;strong&gt;Just-In-Time (JIT) Credential Injector&lt;/strong&gt; at the execution layer, not the generation layer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Wrong:&lt;/strong&gt; &lt;code&gt;System Prompt: Your AWS key is xyz. Use it to run the aws-cli skill.&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Right:&lt;/strong&gt; &lt;code&gt;System Prompt: You have authorization to request an AWS deployment. Output the deployment schema.&lt;/code&gt; (The execution runtime intercepts the schema, injects the key at the subprocess level, and returns only the sanitized stdout).&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Move from Flat Action Spaces to MCP (Model Context Protocol)
&lt;/h3&gt;

&lt;p&gt;Stop letting your LLMs write ad-hoc bash scripts from markdown skill files. Migrate to the &lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt;. MCP forces tools to be defined as strict RPC servers with rigid JSON schemas. &lt;/p&gt;

&lt;p&gt;When you use MCP, the agent can only pass parameters to predefined functions. It cannot arbitrarily rewrite the execution logic to curl your environment variables to a third-party server because the &lt;code&gt;curl&lt;/code&gt; command itself isn't in the action space—only the &lt;code&gt;backup_data(file_id)&lt;/code&gt; function is.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The era of just copying a community &lt;code&gt;skill.md&lt;/code&gt; into your workspace and saying "you are an expert at this now" is ending. As agents move from generating text to executing autonomous actions, the attack surface shifts from the &lt;em&gt;prompt&lt;/em&gt; to the &lt;em&gt;procedural memory&lt;/em&gt;. &lt;/p&gt;

&lt;p&gt;Build agents that audit their own instructions, isolate their state, and communicate via strict protocols. Everything else is just a liability waiting to happen.&lt;/p&gt;




&lt;h3&gt;
  
  
  Sources &amp;amp; Further Reading
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;a href="https://arxiv.org/abs/2602.20156" rel="noopener noreferrer"&gt;SKILL-INJECT: Measuring Agent Vulnerability to Skill File Attacks (arXiv:2602.20156)&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://modelcontextprotocol.io/" rel="noopener noreferrer"&gt;The Model Context Protocol (MCP)&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;em&gt;I document my journey building autonomous agents and exploring AI architecture on Telegram at &lt;a href="https://t.me/the_prompt_and_the_code" rel="noopener noreferrer"&gt;@the_prompt_and_the_code&lt;/a&gt;.&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>agentic</category>
      <category>architecture</category>
    </item>
    <item>
      <title>The Trust Gap: Why AI Agent Capabilities Can't Be Self-Reported</title>
      <dc:creator>Stanislav Tsepa</dc:creator>
      <pubDate>Tue, 24 Feb 2026 01:05:59 +0000</pubDate>
      <link>https://dev.to/an0nymus/the-trust-gap-why-ai-agent-capabilities-cant-be-self-reported-ekm</link>
      <guid>https://dev.to/an0nymus/the-trust-gap-why-ai-agent-capabilities-cant-be-self-reported-ekm</guid>
      <description>&lt;p&gt;The fundamental flaw in how we currently build AI agent ecosystems is the &lt;strong&gt;capability registry&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Right now, if you are building a multi-agent system, your routing layer probably asks agents what they can do. An agent replies with a static JSON schema:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"agent_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"data_cleaner_01"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"capabilities"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"format_json"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"execute_sql_read"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"summarize_text"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is the problem: &lt;strong&gt;an agent self-reporting its capabilities is essentially a declarative lie.&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;It might genuinely handle 90% of formatting tasks, but choke on your specific edge case. It might work perfectly with small payloads but hallucinate under load. Worse, in an open ecosystem where agents are rewarded (compute, tokens, reputation) for accepting tasks, they are financially and algorithmically incentivized to over-promise to win the routing bid.&lt;/p&gt;

&lt;p&gt;And if they fail mid-task? You've already handed over context, state, and potentially sensitive credentials.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stop Building Registries on Honor Systems
&lt;/h2&gt;

&lt;p&gt;Some developers have recognized this and attempted to build "Capability Probes"—micro-validations sent to an agent before the real handoff. &lt;em&gt;("Don't tell me you can format JSON, format this small string first.")&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;While safer, probes add unacceptable latency to the critical path. You cannot run a dynamic integration test every single time you need to route a prompt.&lt;/p&gt;

&lt;p&gt;We need to shift from declarative schemas to empirical trust scores. We need &lt;strong&gt;Eval-Backed Advertising&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Trust, but Cryptographically Verify
&lt;/h2&gt;

&lt;p&gt;An agent shouldn't just broadcast &lt;em&gt;what&lt;/em&gt; it can do; it must broadcast its proof.&lt;/p&gt;

&lt;p&gt;Instead of a boolean &lt;code&gt;can_execute_sql=true&lt;/code&gt;, the agent broadcasts a cryptographic proof of passing a standardized benchmark for that specific task within the last 24 hours.&lt;/p&gt;

&lt;p&gt;If an agent claims to be a SQL generator, it must attach a signed attestation of its score on the &lt;a href="https://yale-lily.github.io/spider" rel="noopener noreferrer"&gt;Spider dataset&lt;/a&gt;. If it claims to be a Python engineer, it attaches its SWE-bench score.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"agent_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sql_writer_09"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"capabilities"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"sql_generation"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"benchmark"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"spider_v1.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.89&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"attestation_signature"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"0x4f8b...a1c9"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-02-23T14:00:00Z"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The router doesn't have to trust the agent; it only has to verify the signature of the benchmark authority. If the agent cannot provide the proof-of-work, or the signature is stale, the router downgrades its confidence score to 0.1 and routes the task to a slower, more expensive fallback model.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Paradigm Shift
&lt;/h2&gt;

&lt;p&gt;We are moving past the era of "agents doing chores" into the era of agents interacting in complex, zero-trust economies. &lt;/p&gt;

&lt;p&gt;If we are going to let autonomous systems handle our databases, APIs, and wallets, we cannot rely on API schemas that act like dating profiles. We need verified proof of competence.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I document my daily build logs, architectural teardowns, and unlisted experiments on my public journal. Subscribe to &lt;a href="https://t.me/the_prompt_and_the_code" rel="noopener noreferrer"&gt;The Prompt &amp;amp; The Code&lt;/a&gt; on Telegram.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>systemdesign</category>
    </item>
  </channel>
</rss>
