<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Nick</title>
    <description>The latest articles on DEV Community by Nick (@nikolasi).</description>
    <link>https://dev.to/nikolasi</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3793770%2F973ad0d9-6c03-4b95-b244-8b5151e813fa.png</url>
      <title>DEV Community: Nick</title>
      <link>https://dev.to/nikolasi</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/nikolasi"/>
    <language>en</language>
    <item>
      <title>Solving agent system prompt drift in long sessions — a 300-token fix</title>
      <dc:creator>Nick</dc:creator>
      <pubDate>Thu, 26 Feb 2026 06:49:34 +0000</pubDate>
      <link>https://dev.to/nikolasi/solving-agent-system-prompt-drift-in-long-sessions-a-300-token-fix-1akh</link>
      <guid>https://dev.to/nikolasi/solving-agent-system-prompt-drift-in-long-sessions-a-300-token-fix-1akh</guid>
      <description>&lt;p&gt;The problem&lt;/p&gt;

&lt;p&gt;If you've run any LLM agent for 30+ minutes, you've seen this: the agent follows its system prompt perfectly at the start, then gradually drifts. An hour in — it acts like the prompt never existed.&lt;/p&gt;

&lt;p&gt;This happens with every model, every framework, every agent. It's not a bug — it's how attention works in transformers. The system prompt is tokens at the beginning of context. As context grows, those tokens&lt;br&gt;
   lose weight. 1,000 prompt tokens out of 2,000 total = 50% attention. 1,000 out of 80,000 = ~1%.&lt;/p&gt;

&lt;p&gt;What doesn't work well&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Repeating the prompt every N messages — eats context window (2,000+ tokens each time), and passive re-reading is weaker than active generation&lt;/li&gt;
&lt;li&gt;Restarting the session — kills accumulated context, unacceptable for agents mid-task&lt;/li&gt;
&lt;li&gt;Summarization / memory layers — help with information recall, but don't restore attention to instructions and rules&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What works: SCAN&lt;/p&gt;

&lt;p&gt;Make the model generate tokens semantically linked to its instructions. Not re-read them — generate new ones by answering questions about them. Generation creates ~20 tokens that actively link instructions&lt;br&gt;
  to the current task. Prompt repetition inserts 2,000+ tokens the model passively skims.&lt;/p&gt;

&lt;p&gt;How it works&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Markers — questions at the end of each section in the system prompt:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;[Section: data handling rules]&lt;br&gt;
  ...your rules here...&lt;br&gt;
  @@SCAN_1: What data will this task affect? What if state is stale?&lt;/p&gt;

&lt;p&gt;[Section: error handling]&lt;br&gt;
  ...your rules here...&lt;br&gt;
  @@SCAN_2: What's the most likely failure mode for this task?&lt;/p&gt;

&lt;p&gt;Markers at the end — to answer the question, the model must read the section first.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Trigger — before a task, the agent answers its markers:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;SCAN_1: Task affects session state. If stale — double charge.&lt;br&gt;
  SCAN_2: Timeout on external API without retry logic.&lt;/p&gt;

&lt;p&gt;1-2 sentences per marker. ~300 tokens total vs 2,000+ for prompt repetition.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Post-task check:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;CHECK: session reset ✓, error codes ✓&lt;br&gt;
  MISSED: didn't verify concurrent requests — acceptable, single-threaded task&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Levels — FULL (~300 tokens, all markers) for critical tasks. MINI (~120 tokens, key markers) for medium. ANCHOR (~20 tokens, one line) between subtasks. SKIP for trivial ops.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Key constraint: SCAN answers must be in the model's output, not in internal thinking/reasoning. Token generation in output is what restores attention.&lt;/p&gt;

&lt;p&gt;Multi-agent systems&lt;/p&gt;

&lt;p&gt;Each agent in a pipeline runs SCAN independently and returns CHECK/MISSED to the orchestrator. Without this, a sub-agent loses all instruction context by the time it finishes. The orchestrator sees what was&lt;br&gt;
  verified across the entire chain.&lt;/p&gt;

&lt;p&gt;What this addresses beyond drift&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prompt injection defense — safety instructions with maintained attention weight can't be outweighed by attacker tokens&lt;/li&gt;
&lt;li&gt;Tool calling accuracy — API schemas decay like everything else, a marker keeps them alive&lt;/li&gt;
&lt;li&gt;Multi-agent coordination — CHECK/MISSED creates visibility into what each agent actually verified&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;My experience&lt;/p&gt;

&lt;p&gt;I use this daily with 11 agents, 100K+ context, 7 markers. Cost is under 0.5% of total tokens. Without SCAN — agents reliably lose critical rules by mid-session. With SCAN — stable across entire session&lt;br&gt;
  length.&lt;/p&gt;

&lt;p&gt;I'm not selling anything — the method is open, adapt it however you want. If you try it, I'd love to hear what works and what doesn't.&lt;/p&gt;

&lt;p&gt;Full writeup with detailed technical explanation, multi-agent propagation protocol, and complete prompt templates: [&lt;a href="https://gist.github.com/sigalovskinick/c6c88f235dc85be9ae40c4737538e8c6" rel="noopener noreferrer"&gt;https://gist.github.com/sigalovskinick/c6c88f235dc85be9ae40c4737538e8c6&lt;/a&gt;]&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
