<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: LienJack</title>
    <description>The latest articles on DEV Community by LienJack (@lien_jp_db54b8b7fd9fa0118).</description>
    <link>https://dev.to/lien_jp_db54b8b7fd9fa0118</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3921832%2Ff68a8c58-a56d-42d6-b67e-f5e7ea278322.png</url>
      <title>DEV Community: LienJack</title>
      <link>https://dev.to/lien_jp_db54b8b7fd9fa0118</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/lien_jp_db54b8b7fd9fa0118"/>
    <language>en</language>
    <item>
      <title>Claude Code Source Analysis Series, Chapter 6: Tools Overview</title>
      <dc:creator>LienJack</dc:creator>
      <pubDate>Sun, 10 May 2026 08:11:19 +0000</pubDate>
      <link>https://dev.to/lien_jp_db54b8b7fd9fa0118/claude-code-source-analysis-series-chapter-6-tools-overview-4l7c</link>
      <guid>https://dev.to/lien_jp_db54b8b7fd9fa0118/claude-code-source-analysis-series-chapter-6-tools-overview-4l7c</guid>
      <description>&lt;h1&gt;
  
  
  Chapter 6 of the &lt;em&gt;Claude Code Source Analysis Series&lt;/em&gt; | Tools Overview
&lt;/h1&gt;

&lt;p&gt;This article focuses on the layer that turns model intent into real engineering action.&lt;/p&gt;

&lt;p&gt;Inside Claude Code, &lt;code&gt;QueryEngine&lt;/code&gt; runs the multi-turn agent loop, the prompt runtime assembles what the model sees on each turn, and context management keeps long-running work from collapsing under its own history. The tool system is the next critical layer: once the model decides what it wants to do, how does Claude Code turn that intent into an action that is executable, constrained, recoverable, and auditable?&lt;/p&gt;

&lt;p&gt;The model itself does not execute commands or modify files directly. It emits structured action intent, and the runtime tool system decides how that intent is interpreted, gated, executed, and written back into the session.&lt;/p&gt;

&lt;p&gt;To keep the discussion concrete, we will use one running example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User: Help me fix the failing tests in this project.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As we've already discussed, Claude Code can't stop at "guessing." For this task, it actually needs to do the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Search the project structure&lt;/li&gt;
&lt;li&gt;Read relevant files&lt;/li&gt;
&lt;li&gt;Edit code&lt;/li&gt;
&lt;li&gt;Run the tests&lt;/li&gt;
&lt;li&gt;Adjust based on errors&lt;/li&gt;
&lt;li&gt;Ask for confirmation before high-risk operations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Behind these actions is the Tools system.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmuzn81ecujukn1bqjbbz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmuzn81ecujukn1bqjbbz.png" alt="05.0 Core Mechanism - Tools Figure 1" width="800" height="231"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So the core question this chapter answers is not "what tools does Claude Code have," but rather:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;How does Claude Code turn the model's intent to act into engineering actions that are executable, constrainable, recoverable, and auditable?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  1. &lt;code&gt;Tool.ts&lt;/code&gt; Solves the Problem of "Actions Must Become Protocols First"
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;Tool.ts&lt;/code&gt; is not a specific tool, but the contract that all tools must honor.&lt;/p&gt;

&lt;p&gt;You can think of it as a "tool ID card." Every tool must declare:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Its name&lt;/li&gt;
&lt;li&gt;What parameters it accepts&lt;/li&gt;
&lt;li&gt;Whether it is read-only&lt;/li&gt;
&lt;li&gt;Whether it can run concurrently&lt;/li&gt;
&lt;li&gt;What permissions it needs&lt;/li&gt;
&lt;li&gt;What context it receives during execution&lt;/li&gt;
&lt;li&gt;How it hands results back to the system after execution&lt;/li&gt;
&lt;/ul&gt;
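&lt;p&gt;The declarations above can be sketched as a single interface. This is an illustrative TypeScript sketch of the contract, not the literal &lt;code&gt;Tool.ts&lt;/code&gt; definition; field names such as &lt;code&gt;requiredPermission&lt;/code&gt; are assumptions:&lt;/p&gt;

```typescript
// A minimal sketch of the "tool ID card" described above. Field names are
// illustrative assumptions, not the literal Tool.ts definitions.
interface ToolContract {
  name: string;                                     // its name
  inputSchema: { [param: string]: string };         // what parameters it accepts
  isReadOnly: boolean;                              // whether it is read-only
  isConcurrencySafe: boolean;                       // whether it can run concurrently
  requiredPermission: "read" | "write" | "execute"; // what permissions it needs
  // Execution receives a context object and hands its result back as a string.
  call(input: { [key: string]: unknown }, context: object): Promise<string>;
}

// Example: a read-only tool that satisfies the contract.
const readTool: ToolContract = {
  name: "Read",
  inputSchema: { file_path: "string" },
  isReadOnly: true,
  isConcurrencySafe: true,
  requiredPermission: "read",
  async call(input) {
    return `read ${input.file_path}`;
  },
};
```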

&lt;p&gt;This step is critical. Only once an action has been protocolized can the system govern it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8qlk4mit1ifd87p8m2dj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8qlk4mit1ifd87p8m2dj.png" alt="05.0 Core Mechanism - Tools Figure 2" width="800" height="121"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A mature tool cannot be just a function. It must simultaneously answer "how to invoke it," "whether it can be invoked," "where it can be invoked," and "what happens after invocation."&lt;/p&gt;

&lt;p&gt;This is the first core layer of the Claude Code tool system:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A Tool is not a feature button — it is the runtime contract that a model action must sign before entering the real world.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuxn4f3gte2u4gs8szlhv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuxn4f3gte2u4gs8szlhv.png" alt="05.0 Core Mechanism - Tools Hand-drawn Figure 1: Tool.ts is the runtime contract" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  2. &lt;code&gt;inputSchema&lt;/code&gt; Turns Model Output from "Natural Language" into "Structured Intent"
&lt;/h2&gt;

&lt;p&gt;The most easily underestimated part of the tool protocol is &lt;code&gt;inputSchema&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Its purpose isn't to make TypeScript look pretty. It's to constrain model output into parseable data.&lt;/p&gt;

&lt;p&gt;Take file reading, for example. If the model just says:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;I want to look at src/foo.ts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The host program still has to guess at its intent. But if the model emits a tool call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tool"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Read"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"input"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"file_path"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"src/foo.ts"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The system knows unambiguously:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which tool to invoke&lt;/li&gt;
&lt;li&gt;What the parameters are&lt;/li&gt;
&lt;li&gt;Whether the parameters are valid&lt;/li&gt;
&lt;li&gt;Whether this action is a read, write, search, or execution&lt;/li&gt;
&lt;li&gt;Which permission and execution path to follow next&lt;/li&gt;
&lt;/ul&gt;
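&lt;p&gt;Once the call is structured, the host can check all of this mechanically. A minimal sketch of that validation step, using a toy schema shape rather than Claude Code's real types:&lt;/p&gt;

```typescript
// Sketch of what the schema buys the runtime: the structured call above can
// be validated and classified before anything executes. These shapes are
// illustrative, not Claude Code's real types.
interface ToolCall {
  tool: string;
  input: { [key: string]: unknown };
}

// A toy schema: required string fields plus an action classification.
const readSchema = { required: ["file_path"], kind: "read" };

function validateCall(call: ToolCall): { ok: boolean; kind?: string; error?: string } {
  for (const field of readSchema.required) {
    if (typeof call.input[field] !== "string") {
      return { ok: false, error: `missing or invalid field: ${field}` };
    }
  }
  // Valid: the system now knows the tool, the parameters, and the risk class.
  return { ok: true, kind: readSchema.kind };
}
```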

&lt;p&gt;This is also the key difference between function calling (tool use) and a plain prompt: the model doesn't just "say what it wants to do" — it submits an executable request against the protocol.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6jeoorrcjof310ejq3zr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6jeoorrcjof310ejq3zr.png" alt="05.0 Core Mechanism - Tools Figure 3" width="800" height="423"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So the value of &lt;code&gt;inputSchema&lt;/code&gt; goes beyond just "defining parameters."&lt;/p&gt;

&lt;p&gt;It turns a model's vague intent into an engineering object the system can act on.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb72e042h0et8xy809xbv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb72e042h0et8xy809xbv.png" alt="05.0 Core Mechanism - Tools Sketch 2: natural-language intent becomes a structured tool call" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  3. &lt;code&gt;ToolUseContext&lt;/code&gt; — Tools Are Not Isolated Functions
&lt;/h2&gt;

&lt;p&gt;If you look at a single tool in isolation, it's easy to imagine it running like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;input parameters -&amp;gt; execute function -&amp;gt; return result
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;(Many demo-level Agent frameworks are designed exactly this way. The cracks don't show until you hit production.)&lt;/p&gt;

&lt;p&gt;But Claude Code's tools don't operate that way.&lt;/p&gt;

&lt;p&gt;When a tool executes, it receives the full &lt;code&gt;ToolUseContext&lt;/code&gt;. This runtime context object carries a wealth of information the current session needs to function, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The currently active tool set&lt;/li&gt;
&lt;li&gt;The MCP client and MCP resource&lt;/li&gt;
&lt;li&gt;The current AppState&lt;/li&gt;
&lt;li&gt;The message history&lt;/li&gt;
&lt;li&gt;The file read cache&lt;/li&gt;
&lt;li&gt;The abort controller&lt;/li&gt;
&lt;li&gt;Notification capabilities&lt;/li&gt;
&lt;li&gt;Task and file history updaters&lt;/li&gt;
&lt;/ul&gt;
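&lt;p&gt;A rough sketch of that context bundle and of one way a tool writes back through it. The field names are modeled on the list above and are assumptions, not the exact source types:&lt;/p&gt;

```typescript
// Illustrative sketch of the context bundle a tool receives during execution.
// Field names are assumptions modeled on the list above, not the real types.
interface ToolUseContext {
  activeTools: string[];               // the currently active tool set
  appState: { cwd: string };           // current AppState (simplified)
  messages: string[];                  // message history
  fileReadCache: Map<string, string>;  // file read cache
  abortController: AbortController;    // cooperative cancellation
  notify(message: string): void;       // notification capability
}

// A tool is not an island: it both reads from and writes back into the session.
function recordRead(ctx: ToolUseContext, path: string, contents: string): void {
  ctx.fileReadCache.set(path, contents);          // track the read state
  ctx.messages.push(`tool_result: read ${path}`); // ripple into the message stream
}
```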

&lt;p&gt;What this means is that a tool is never an "island." Every action it performs can ripple through the entire session.&lt;/p&gt;

&lt;p&gt;Returning to the "fix a failing test" example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Grep&lt;/code&gt; finds files related to the failing test — that affects the model context for the next turn.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Read&lt;/code&gt; reads a file — the system tracks the read state.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Edit&lt;/code&gt; modifies a file — the UI needs to render the diff.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Bash&lt;/code&gt; runs a test and it fails — the error log flows back into the message stream.&lt;/li&gt;
&lt;li&gt;If the user interrupts — any long-running command must be cancellable or wind itself down cleanly.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6hq0s1hw9h9d2vd3ps3w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6hq0s1hw9h9d2vd3ps3w.png" alt="05.0 Core Mechanism - Tools Figure 4" width="800" height="637"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So the tool system is not a simple function-call layer.&lt;/p&gt;

&lt;p&gt;It is part of the Claude Code runtime.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. &lt;code&gt;tools.ts&lt;/code&gt; Is a Tool Registry, Not the Final Menu
&lt;/h2&gt;

&lt;p&gt;Once you understand what a single Tool looks like, the next step is &lt;code&gt;tools.ts&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;It's responsible for registering Claude Code's foundational capabilities into a tool pool. You'll see many categories of tools here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;File tools: Read, Edit, Write, Notebook&lt;/li&gt;
&lt;li&gt;Search tools: Glob, Grep&lt;/li&gt;
&lt;li&gt;Terminal tools: Bash, PowerShell&lt;/li&gt;
&lt;li&gt;Web tools: WebFetch, WebSearch, WebBrowser&lt;/li&gt;
&lt;li&gt;Collaboration tools: Agent, SendMessage, AskUserQuestion&lt;/li&gt;
&lt;li&gt;Workflow tools: Todo, Task, Plan, Worktree&lt;/li&gt;
&lt;li&gt;Extension tools: MCP, LSP, ToolSearch, Skill&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But there's one point where people commonly trip up:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;getAllBaseTools()&lt;/code&gt; produces a candidate pool, not the final tool menu that the model sees.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A common wrong assumption at this stage is that every registered tool is directly available to the model. That's not how it works.&lt;/p&gt;

&lt;p&gt;Claude Code first assembles a large candidate pool, then filters it down layer by layer based on environment, mode, rules, and runtime state. Only then does it produce the tools visible for the current turn.&lt;/p&gt;
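&lt;p&gt;The "candidate pool, then filter" pipeline can be compressed into a few lines. This sketch uses invented tool records and only two filter stages (mode and deny rules) to illustrate the shape:&lt;/p&gt;

```typescript
// Toy version of "candidate pool -> visible menu". The tool records and
// filter stages here are illustrative, not the real tools.ts pipeline.
interface RegisteredTool { name: string; readOnly: boolean; }

function getAllBaseTools(): RegisteredTool[] {
  // Register everything up front: this is the candidate pool.
  return [
    { name: "Read", readOnly: true },
    { name: "Edit", readOnly: false },
    { name: "Bash", readOnly: false },
  ];
}

function visibleTools(mode: "plan" | "default", denyRules: string[]): string[] {
  return getAllBaseTools()
    .filter(t => mode !== "plan" || t.readOnly) // mode filtering: plan = read-only
    .filter(t => !denyRules.includes(t.name))   // deny-rule filtering
    .map(t => t.name);                          // the menu the model sees this turn
}
```

&lt;p&gt;In this toy version, a plan-style mode hides everything that can mutate, and a deny rule removes a tool before the model ever sees its name.&lt;/p&gt;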

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fix7834cjz60gb18pnsn3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fix7834cjz60gb18pnsn3.png" alt="05.0 Core Mechanism - Tools Figure 5" width="800" height="50"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This pipeline illustrates a foundational principle for mature agent systems:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;More capabilities are not automatically better. Capabilities must be dynamically trimmed by context, permissions, and cost.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  5. Why Tools Are Filtered Before They Reach the Model
&lt;/h2&gt;

&lt;p&gt;Here we have a critical security design decision.&lt;/p&gt;

&lt;p&gt;Claude Code does not wait until the model invokes a tool to decide whether it can execute. It performs "tool visibility filtering" first.&lt;/p&gt;

&lt;p&gt;If a tool is entirely blocked by a deny rule, the model simply never sees it in this round.&lt;/p&gt;

&lt;p&gt;Think of it as two gates:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6avb7xi5j5pwm7iac7hu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6avb7xi5j5pwm7iac7hu.png" alt="05.0 Core Mechanism - Tools Figure 6" width="800" height="662"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To put it bluntly: if the model can't see a tool, it won't plan tasks around it. This is far safer than "let it see, then reject."&lt;/p&gt;

&lt;p&gt;The first gate answers:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Is the model even allowed to see this tool in this round?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The second gate answers:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Can this specific invocation actually execute?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;These two concerns must not be conflated.&lt;/p&gt;

&lt;p&gt;That is the point of "pushing security upstream."&lt;/p&gt;

&lt;p&gt;(In permission design, we often face this temptation: "let the model see everything, then block at execution time." Claude Code makes the opposite choice — what shouldn't be seen simply isn't shown. The cost is that the tool list changes frequently, but security improves by an order of magnitude.)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faynrbz5gz730dwvecjfb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faynrbz5gz730dwvecjfb.png" alt="05.0 Core Mechanism - Hand-drawn Figure 3: Two gates for tool visibility and execution permissions" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  6. &lt;code&gt;ToolPermissionContext&lt;/code&gt; — The Permission Backpack
&lt;/h2&gt;

&lt;p&gt;Both tool filtering and tool execution depend on &lt;code&gt;ToolPermissionContext&lt;/code&gt;, the runtime bundle of rules and permission state used to decide visibility and execution behavior.&lt;/p&gt;

&lt;p&gt;It's not a simple &lt;code&gt;true&lt;/code&gt; / &lt;code&gt;false&lt;/code&gt; toggle. It's an entire bundle of permission context, typically containing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the current permission mode&lt;/li&gt;
&lt;li&gt;user-level rules&lt;/li&gt;
&lt;li&gt;project-level rules&lt;/li&gt;
&lt;li&gt;local rules&lt;/li&gt;
&lt;li&gt;policy rules&lt;/li&gt;
&lt;li&gt;command-line rules&lt;/li&gt;
&lt;li&gt;session-level rules&lt;/li&gt;
&lt;li&gt;three behavior categories: allow / deny / ask&lt;/li&gt;
&lt;li&gt;whether bypass is permitted&lt;/li&gt;
&lt;li&gt;whether dialogs should be suppressed&lt;/li&gt;
&lt;li&gt;additional working-directory boundaries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This explains why Claude Code's permission system feels "heavyweight."&lt;/p&gt;

&lt;p&gt;Because it's not just answering "can this tool be used?" — it's answering something far more nuanced:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;In the current project,
under the current permission mode,
accounting for user settings, project settings, policy settings, CLI arguments, and session-level ad-hoc rules —
should this tool even be visible to the model?
And if the model does invoke it, should that invocation be allowed, asked about, or denied?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmeoffne5kukajzvcuiit.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmeoffne5kukajzvcuiit.png" alt="05.0 Core Mechanism - Tools Figure 7" width="800" height="1668"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The most critical rule of all:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Deny beats allow.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Even if a tool is permitted somewhere, it must be rejected as soon as a more specific rule explicitly denies it. A security system can't rely on "default trust"; explicit denials must carry higher priority.&lt;/p&gt;

&lt;p&gt;(This mirrors firewall rule-matching logic: more specific rules take precedence, and once a deny rule is hit, the chain stops — no further rules are evaluated.)&lt;/p&gt;
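&lt;p&gt;That rule-matching logic can be sketched directly. The shapes below are illustrative, not the actual &lt;code&gt;permissions.ts&lt;/code&gt; types:&lt;/p&gt;

```typescript
// Sketch of "deny beats allow" resolution across layered rule sources
// (user, project, policy, session, ...). Illustrative only.
type Behavior = "allow" | "deny" | "ask";
interface Rule { toolName: string; behavior: Behavior; }

function resolve(toolName: string, ruleSources: Rule[][]): Behavior {
  let sawAllow = false;
  for (const source of ruleSources) {
    for (const rule of source) {
      if (rule.toolName !== toolName) continue;
      if (rule.behavior === "deny") return "deny"; // a deny short-circuits the chain
      if (rule.behavior === "allow") sawAllow = true;
    }
  }
  return sawAllow ? "allow" : "ask"; // no explicit decision: fall back to asking
}
```

&lt;p&gt;Note that an allow never short-circuits, because a later rule source might still deny; only denial terminates the scan immediately.&lt;/p&gt;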

&lt;h2&gt;
  
  
  7. Tool Execution Is Not Just "Calling a Function" — It's a Lifecycle
&lt;/h2&gt;

&lt;p&gt;When the model actually emits a &lt;code&gt;tool_use&lt;/code&gt; block (the structured tool-call record in the model's response), Claude Code still has to run it through an execution pipeline.&lt;/p&gt;

&lt;p&gt;A typical tool lifecycle looks roughly like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk4xje4nsz3svc5g8yru9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk4xje4nsz3svc5g8yru9.png" alt="05.0 Core Mechanism - Tools Figure 8" width="800" height="2234"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;None of the steps in this pipeline are decorative.&lt;/p&gt;

&lt;p&gt;Parameter validation prevents the model from sending malformed structures.&lt;/p&gt;

&lt;p&gt;Permission checks block dangerous operations.&lt;/p&gt;

&lt;p&gt;Scheduling determines which tools can run in parallel and which must be serialized.&lt;/p&gt;

&lt;p&gt;Result serialization ensures the model can understand what just happened in the next round.&lt;/p&gt;

&lt;p&gt;Message write-back guarantees the entire session isn't a one-shot action — it's a cycle that can keep advancing.&lt;/p&gt;
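&lt;p&gt;The lifecycle described above can be compressed into one illustrative pipeline function. Every stage here is stubbed, and all names are assumptions for the sketch:&lt;/p&gt;

```typescript
// Each stage mirrors a step above: validate -> check permission -> execute
// -> serialize -> write back. Stubbed and illustrative, not the real code.
interface LifecycleResult { ok: boolean; detail: string; }

async function runToolUse(
  call: { tool: string; input: { [k: string]: unknown } },
  session: { messages: string[] }
): Promise<LifecycleResult> {
  // 1. Parameter validation: reject malformed structures early.
  if (typeof call.input.file_path !== "string") {
    return { ok: false, detail: "invalid input" };
  }
  // 2. Permission check: block dangerous operations (stubbed: only Read passes).
  if (call.tool !== "Read") {
    return { ok: false, detail: "permission denied" };
  }
  // 3. Execute (stubbed) and serialize the result for the next model turn.
  const serialized = `tool_result(${call.tool}): ok`;
  // 4. Write back into the message stream so the loop can keep advancing.
  session.messages.push(serialized);
  return { ok: true, detail: serialized };
}
```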

&lt;p&gt;Strip all of this away, and Claude Code degrades to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;model says something -&amp;gt; program takes a gamble -&amp;gt; command runs unchecked -&amp;gt; result gets stuffed back in
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That clearly cannot support real-world engineering projects.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feq8nyuieuctt9ovdre64.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feq8nyuieuctt9ovdre64.png" alt="05.0 Core Mechanism - Tools Sketch 4: tool execution lifecycle" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  8. Why Tools Are Categorized as Read-Only, Destructive, and Concurrency-Safe
&lt;/h2&gt;

&lt;p&gt;In a simple demo, the only question about a tool tends to be "can I call it or not?"&lt;/p&gt;

&lt;p&gt;But in a real development environment like Claude Code, a tool needs to answer at least three questions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First, is it read-only?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Read&lt;/code&gt;, &lt;code&gt;Grep&lt;/code&gt;, and &lt;code&gt;Glob&lt;/code&gt; are generally low-risk tools because they observe the project without directly modifying it. &lt;code&gt;Edit&lt;/code&gt;, &lt;code&gt;Write&lt;/code&gt;, and &lt;code&gt;Bash&lt;/code&gt;, on the other hand, can change files or the environment and carry higher risk.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Second, is it a destructive operation?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Even within &lt;code&gt;Bash&lt;/code&gt;, &lt;code&gt;npm test&lt;/code&gt; and &lt;code&gt;rm -rf&lt;/code&gt; are not remotely in the same league. The tool system must support finer-grained risk assessment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Third, can it run concurrently?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Two read tools running in parallel is usually fine. But two write tools modifying the same area at the same time, or one &lt;code&gt;Bash&lt;/code&gt; command that depends on the output of another, cannot be parallelized casually.&lt;/p&gt;
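&lt;p&gt;One plausible scheduling rule that follows from this: batch consecutive concurrency-safe calls, and serialize everything else. A sketch under that assumption, not the actual scheduler:&lt;/p&gt;

```typescript
// Group pending tool calls into execution batches: consecutive
// concurrency-safe calls share a batch, unsafe calls run alone, in order.
interface Pending { name: string; concurrencySafe: boolean; }

function scheduleBatches(calls: Pending[]): string[][] {
  const batches: string[][] = [];
  let safeBatch: string[] = [];
  for (const c of calls) {
    if (c.concurrencySafe) {
      safeBatch.push(c.name);            // safe calls can run in parallel
    } else {
      if (safeBatch.length) { batches.push(safeBatch); safeBatch = []; }
      batches.push([c.name]);            // unsafe calls are serialized
    }
  }
  if (safeBatch.length) batches.push(safeBatch);
  return batches;
}
```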

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwc5mizenw6ke1w2frt8z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwc5mizenw6ke1w2frt8z.png" alt="05.0 Core Mechanism - Tools Figure 9" width="800" height="390"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is why the Tool protocol includes so much metadata that seems "extra" at first glance.&lt;/p&gt;

&lt;p&gt;It's not about making the interface complex. It's about letting the system know: how should this action be treated?&lt;/p&gt;

&lt;h2&gt;
  
  
  9. Built-in Tools Fall into Five Categories — Not a Pile of Names
&lt;/h2&gt;

&lt;p&gt;If you just list 40+ tools, readers get lost quickly.&lt;/p&gt;

&lt;p&gt;A better way to understand them is by grouping them around &lt;em&gt;what problem they solve&lt;/em&gt;.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Representative Tools&lt;/th&gt;
&lt;th&gt;Problem Solved&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Files &amp;amp; Search&lt;/td&gt;
&lt;td&gt;Read, Edit, Write, Glob, Grep&lt;/td&gt;
&lt;td&gt;Let the agent understand and modify the project&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Shell Execution&lt;/td&gt;
&lt;td&gt;Bash, PowerShell&lt;/td&gt;
&lt;td&gt;Let the agent verify, build, and test&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Session Control&lt;/td&gt;
&lt;td&gt;AskUserQuestion, Todo, Plan&lt;/td&gt;
&lt;td&gt;Let the agent plan, clarify, and maintain task state&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Collaboration Tasks&lt;/td&gt;
&lt;td&gt;Agent, Task, SendMessage&lt;/td&gt;
&lt;td&gt;Let complex work be split, tracked, and results collected&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;External Extension&lt;/td&gt;
&lt;td&gt;MCP, LSP, WebFetch, WebSearch, Skill&lt;/td&gt;
&lt;td&gt;Extend capability boundaries to external services and reusable workflows&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These categories capture a single insight:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Claude Code isn't just "able to operate on files" — it's decomposing the real software development process into a set of governable action interfaces.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When fixing tests, the agent might walk a path like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn7qtuyqij8250x7892rb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn7qtuyqij8250x7892rb.png" alt="05.0 Core Mechanism - Tools Figure 10" width="800" height="85"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This isn't "one tool call" — it's a closed loop where a chain of tool invocations and model reasoning push each other forward.&lt;/p&gt;

&lt;h2&gt;
  
  
  10. Why MCP, LSP, and Skill Can All Plug Into the Same System
&lt;/h2&gt;

&lt;p&gt;A unified Tool protocol has another major benefit: new capabilities can be plugged in without upending the entire architecture.&lt;/p&gt;

&lt;p&gt;Whether it's an MCP tool, an LSP tool, or a Skill tool, they all ultimately need to be translated into a tool view that Claude Code understands:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A name&lt;/li&gt;
&lt;li&gt;An input schema&lt;/li&gt;
&lt;li&gt;A description&lt;/li&gt;
&lt;li&gt;Enabling conditions&lt;/li&gt;
&lt;li&gt;Permission semantics&lt;/li&gt;
&lt;li&gt;An execution result&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frvtnw6chlnzuam7ttsli.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frvtnw6chlnzuam7ttsli.png" alt="05.0 Core Mechanism - Tools Figure 11" width="800" height="564"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That's the technical debt the unified protocol eliminates.&lt;/p&gt;

&lt;p&gt;Without it, every external system you connect would require inventing a new set of rules. The more you hook up, the messier the system becomes.&lt;/p&gt;

&lt;p&gt;With a unified protocol, adding a new capability boils down to answering five questions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;How do you describe yourself?
How do you receive input?
How do you execute?
How do you declare risk?
How do you hand results back to the main loop?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
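&lt;p&gt;Those five questions map naturally onto an adapter. The sketch below wraps a hypothetical external tool into the unified shape; the &lt;code&gt;mcp__&lt;/code&gt; name prefix and all field names are illustrative assumptions:&lt;/p&gt;

```typescript
// Hypothetical adapter: an external (MCP-style) tool answers the five
// questions by being translated into the unified tool shape.
interface ExternalTool {
  id: string;
  schema: { [param: string]: string };
  run(input: object): Promise<string>;
}

interface UnifiedTool {
  name: string;                             // 1. how it describes itself
  inputSchema: { [param: string]: string }; // 2. how it receives input
  call(input: object): Promise<string>;     // 3. how it executes
  isReadOnly: boolean;                      // 4. how it declares risk
  // 5. handing results back: call() resolves a string for the main loop
}

function adapt(ext: ExternalTool, readOnly: boolean): UnifiedTool {
  return {
    name: `mcp__${ext.id}`,        // illustrative namespacing, not the real scheme
    inputSchema: ext.schema,
    call: input => ext.run(input), // delegate execution to the external system
    isReadOnly: readOnly,
  };
}
```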



&lt;h2&gt;
  
  
  11. The Tool System Is Where Claude Code's Engineering Philosophy Really Shows
&lt;/h2&gt;

&lt;p&gt;After reading through the Tools system, the most important takeaway isn't memorizing any particular tool name — it's understanding Claude Code's engineering orientation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The model is not the executor. The runtime is the executor.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The model decides whether the next step requires action and what the intent behind that action is. The tool system inside the host process is what actually carries out the action.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tools aren't plugins — they're runtime protocols.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every tool must pass through the full pipeline: schema registration, context, permissions, dispatch, result backfill, and UI presentation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security isn't a final pop-up. It's two-phase governance: tool exposure and tool execution.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;What the model can see is itself part of the security boundary. What the model actually gets to invoke forms the second boundary.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Extensibility isn't about maximum surface area — it's about being trimmable, filterable, and auditable.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Claude Code supports MCP, LSP, Skills, and multi-agent workflows not by dumping every capability onto the model, but because every one of those capabilities has to pass through the same tool pipeline.&lt;/p&gt;

&lt;h2&gt;
  
  
  12. The Whole Chapter in One Diagram
&lt;/h2&gt;

&lt;p&gt;Finally, here is Claude Code's tool system compressed into a single complete diagram:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb5xqe7vcpyy5llbp6bpy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb5xqe7vcpyy5llbp6bpy.png" alt="05.0 Core Mechanism - Tools Figure 12" width="800" height="2216"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Use this diagram as a map when reading &lt;code&gt;Tool.ts&lt;/code&gt;, &lt;code&gt;tools.ts&lt;/code&gt;, &lt;code&gt;toolExecution.ts&lt;/code&gt;, and the permission-related code.&lt;/p&gt;

&lt;h2&gt;
  
  
  13. Which Tool Chain to Follow When Reading the Source
&lt;/h2&gt;

&lt;p&gt;If you really want to understand the tool system by reading the source, don't start from a specific tool. Instead, trace a complete call chain first:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Tool.ts
-&amp;gt; tools.ts
-&amp;gt; query.ts
-&amp;gt; toolExecution.ts
-&amp;gt; permissions.ts
-&amp;gt; tool_result backfilled into messages
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step one&lt;/strong&gt;: read &lt;code&gt;Tool.ts&lt;/code&gt;. The focus isn't the tool names, but the &lt;code&gt;Tool&lt;/code&gt; protocol itself: &lt;code&gt;inputSchema&lt;/code&gt;, &lt;code&gt;call&lt;/code&gt;, &lt;code&gt;validateInput&lt;/code&gt;, &lt;code&gt;checkPermissions&lt;/code&gt;, &lt;code&gt;isReadOnly&lt;/code&gt;, &lt;code&gt;isConcurrencySafe&lt;/code&gt;, &lt;code&gt;isDestructive&lt;/code&gt;, &lt;code&gt;interruptBehavior&lt;/code&gt;, &lt;code&gt;maxResultSizeChars&lt;/code&gt;. Together, these fields answer one question: what governance information does the system need to know about a model-initiated action before it enters a real engineering environment?&lt;/p&gt;
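
&lt;p&gt;As a rough mental model of step one, the governance surface of the &lt;code&gt;Tool&lt;/code&gt; protocol can be sketched in a few lines. This is an illustrative Python sketch, not the actual TypeScript interface in &lt;code&gt;Tool.ts&lt;/code&gt;: the field defaults and the &lt;code&gt;governance_summary&lt;/code&gt; helper are invented here.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical sketch of the governance fields the article lists for the
# Tool protocol; the real Tool.ts interface will differ in shape and types.
@dataclass
class ToolSpec:
    name: str
    input_schema: dict                  # JSON schema the model's arguments must satisfy
    call: Callable[[dict], Any]         # the actual effect, run by the host process
    is_read_only: bool = True           # does it leave the workspace untouched?
    is_concurrency_safe: bool = True    # may it run in parallel with other tools?
    is_destructive: bool = False        # does it need explicit user confirmation?
    interrupt_behavior: str = "cancel"  # what happens on user interrupt
    max_result_size_chars: int = 30_000 # truncation budget for results

    def governance_summary(self):
        """Everything the runtime wants to know before letting this run."""
        return {"name": self.name, "read_only": self.is_read_only,
                "concurrency_safe": self.is_concurrency_safe,
                "destructive": self.is_destructive}

# Usage: a read-only tool carries its governance metadata alongside its effect.
grep_tool = ToolSpec(name="Grep", input_schema={"type": "object"},
                     call=lambda args: f"matches for {args['pattern']}")
```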

&lt;p&gt;&lt;strong&gt;Step two&lt;/strong&gt;: read &lt;code&gt;tools.ts&lt;/code&gt;. &lt;code&gt;getAllBaseTools()&lt;/code&gt; is only a candidate pool, not the model's final menu. Before being exposed to the model, tools pass through mode filtering, permission deny-rule filtering, MCP tool merging, sorting, deduplication, and cache stability handling. A key point here: tool visibility is itself part of permissions. A blanket-denied tool should ideally disappear before the model ever sees it—not after the model calls it and gets rejected.&lt;/p&gt;
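
&lt;p&gt;The candidate-pool-to-menu idea in step two can be sketched as a small filter. This is not the &lt;code&gt;tools.ts&lt;/code&gt; implementation; the function name, the &lt;code&gt;"plan"&lt;/code&gt; mode check, and the dict shapes are all hypothetical, but the pipeline order mirrors the one described above.&lt;/p&gt;

```python
# Hypothetical sketch of two-phase tool exposure: the candidate pool is
# filtered before the model ever sees the menu.
def visible_tools(candidates, mode, deny_rules, mcp_tools):
    pool = list(candidates) + list(mcp_tools)        # merge MCP tools in
    if mode == "plan":                               # mode filtering:
        pool = [t for t in pool if t["read_only"]]   # plan mode hides writers
    pool = [t for t in pool if t["name"] not in deny_rules]  # deny-rule filtering
    seen, menu = set(), []
    for t in sorted(pool, key=lambda t: t["name"]):  # stable order helps prompt caching
        if t["name"] not in seen:                    # deduplication
            seen.add(t["name"])
            menu.append(t)
    return menu

base = [{"name": "Bash", "read_only": False}, {"name": "FileRead", "read_only": True}]
mcp = [{"name": "FileRead", "read_only": True}]      # duplicate from an MCP server
menu = visible_tools(base, mode="plan", deny_rules={"WebFetch"}, mcp_tools=mcp)
# In plan mode, with the duplicate removed, only FileRead survives.
```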

&lt;p&gt;&lt;strong&gt;Step three&lt;/strong&gt;: go back to &lt;code&gt;query.ts&lt;/code&gt;. The &lt;code&gt;tool_use&lt;/code&gt; blocks returned by the model are collected and handed to &lt;code&gt;runTools()&lt;/code&gt; or &lt;code&gt;StreamingToolExecutor&lt;/code&gt;. This is where you see the interface between the tool system and the ReAct main loop: a tool is not a UI button, but a fork point in the next round of the state machine.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step four&lt;/strong&gt;: read the single-invocation lifecycle in &lt;code&gt;toolExecution.ts&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Find the tool definition
-&amp;gt; inputSchema validation
-&amp;gt; tool-level validateInput
-&amp;gt; PreToolUse hooks
-&amp;gt; permission check
-&amp;gt; tool.call()
-&amp;gt; result serialization
-&amp;gt; PostToolUse hooks
-&amp;gt; produce tool_result
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This lifecycle is what separates a production-grade agent from a simple function map. Errors don't blow up the main loop; instead, they are converted into tool results the model can understand in the next round whenever possible.&lt;/p&gt;
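
&lt;p&gt;To make that concrete, here is a toy sketch of the lifecycle with errors converted into &lt;code&gt;tool_result&lt;/code&gt; feedback. It is not the real &lt;code&gt;toolExecution.ts&lt;/code&gt; code, and it collapses hooks and permission handling into stubs, but it shows the key property: nothing here raises past the executor.&lt;/p&gt;

```python
# Hypothetical sketch of the single-invocation lifecycle: every failure is
# converted into a tool_result the model can read next turn, instead of
# crashing the agent loop.
def execute_tool(registry, name, args, permission_check):
    tool = registry.get(name)
    if tool is None:                                 # find the tool definition
        return {"type": "tool_result", "is_error": True,
                "content": f"Unknown tool: {name}"}
    try:
        tool["validate"](args)                       # schema + validateInput
        if not permission_check(name, args):         # permission gate
            return {"type": "tool_result", "is_error": True,
                    "content": f"Permission denied for {name}"}
        result = tool["call"](args)                  # the real effect
        return {"type": "tool_result", "is_error": False,
                "content": str(result)[:30_000]}     # serialization + size budget
    except Exception as exc:                         # errors become model feedback
        return {"type": "tool_result", "is_error": True,
                "content": f"{type(exc).__name__}: {exc}"}

registry = {"Echo": {"validate": lambda a: a["text"],
                     "call": lambda a: a["text"].upper()}}
ok = execute_tool(registry, "Echo", {"text": "hi"}, lambda n, a: True)
bad = execute_tool(registry, "Echo", {}, lambda n, a: True)  # missing arg
```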

&lt;p&gt;&lt;strong&gt;Step five&lt;/strong&gt;: pick one concrete tool to read, such as &lt;code&gt;FileReadTool&lt;/code&gt;. It's not just &lt;code&gt;fs.readFile()&lt;/code&gt;—it also handles path validation, large-file budgeting, offset/limit, PDF/image processing, duplicate-read deduplication, permission checks, Skill triggering, and UI display. After reading it, you'll better understand why Claude Code builds tools as a "semantic protocol" rather than stuffing every action into Bash.&lt;/p&gt;

&lt;p&gt;Once you've traced this chain, the essence of Tools becomes clear:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The model only proposes structured intent.
The tool protocol describes the boundaries of an action.
The executor governs the lifecycle.
The permission system decides whether the action lands.
tool_result brings the real world back to the model.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  14. Summary
&lt;/h2&gt;

&lt;p&gt;Claude Code's tool system can be summed up in a single sentence:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Tools are the runtime protocol layer in Claude Code that turns model intent into real engineering actions; they give the model hands and feet, while also fitting those hands and feet with boundaries, permissions, and feedback loops.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Once you grasp Tools, you stop seeing Claude Code as "a chat model plus a few plugins." It's better understood as an Agent Harness: the model handles thinking, tools handle acting, permissions handle boundaries, and state ties each action into a sustainable, forward-moving engineering loop.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>architecture</category>
      <category>claude</category>
      <category>llm</category>
    </item>
    <item>
      <title>Context Governance for Coding Agents</title>
      <dc:creator>LienJack</dc:creator>
      <pubDate>Sun, 10 May 2026 08:05:42 +0000</pubDate>
      <link>https://dev.to/lien_jp_db54b8b7fd9fa0118/context-governance-for-coding-agents-bgl</link>
      <guid>https://dev.to/lien_jp_db54b8b7fd9fa0118/context-governance-for-coding-agents-bgl</guid>
      <description>&lt;h1&gt;
  
  
  Context Governance for Coding Agents
&lt;/h1&gt;

&lt;p&gt;When people first hear the phrase "context management," they often reduce it to two ideas:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Use a larger context window.
Compress history when the window is about to overflow.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is not wrong, but it is far too narrow.&lt;/p&gt;

&lt;p&gt;In ordinary chat systems, context management really is mostly about conversation history. But once a system becomes a coding agent, especially one that reads files, calls tools, runs commands, writes code, and interacts with external systems, context is no longer just a transcript. It becomes the whole working scene the model can see on every turn.&lt;/p&gt;

&lt;p&gt;So the real question is this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;During real engineering work, an agent keeps producing new information. How does the system decide what should enter the model, what should stay outside it, what should be compressed, and what must survive over time?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This article uses Claude Code as one concrete case study, but it is not only about Claude Code.&lt;/p&gt;

&lt;p&gt;Claude Code is a strong case study because it exposes the context problem in a very direct way: source files are long, tool outputs are long, test logs are long, and tasks regularly stretch across dozens of turns. But the same class of problem appears in many other agent systems too, including LangGraph, the OpenAI Agents SDK, AutoGen, Cursor, Devin, OpenClaw, and Hermes.&lt;/p&gt;

&lt;p&gt;The difference is where each project places the weight:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude Code is closer to a long-running CLI agent. Its pressure comes from tool output, project rules, compression, and recovery.&lt;/li&gt;
&lt;li&gt;LangGraph is closer to a workflow state machine. Its pressure comes from structured state, checkpoints, and resumable execution.&lt;/li&gt;
&lt;li&gt;OpenAI Agents SDK is closer to an application SDK. Its pressure comes from separating local runtime context from model-visible context.&lt;/li&gt;
&lt;li&gt;AutoGen is closer to a multi-agent conversation framework. Its pressure comes from role separation, memory injection, and collaborative context flow.&lt;/li&gt;
&lt;li&gt;Cursor and Copilot are closer to in-IDE real-time assistants. Their pressure comes from low latency, local code snippets, and retrieval precision.&lt;/li&gt;
&lt;li&gt;Hermes, OpenClaw, and enterprise harnesses lean more toward long-running runtime governance, entry-point control, and policy enforcement.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So context management is not one feature inside one product. It is a foundational engineering problem that almost every serious agent system eventually hits.&lt;/p&gt;

&lt;p&gt;In this article, I use &lt;code&gt;context management&lt;/code&gt; for the operational mechanics and &lt;code&gt;context governance&lt;/code&gt; for the broader design problem around visibility, authority, recall, compression, and isolation.&lt;/p&gt;

&lt;p&gt;The point here is to widen the lens and look at the broader governance model underneath the implementation details:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The model is stateless.
The task is continuous.
Information explodes.
The context window is finite.
The outer system has to rebuild the working scene every turn.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To keep the discussion concrete, we will use one running example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The user says: post-login redirect is broken in this project. Find the cause and fix it.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A real agent would not stop at "maybe check the route guard." It would do something more like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Inspect the project structure
-&amp;gt; Search for login-related code
-&amp;gt; Read the route guard
-&amp;gt; Read auth state management
-&amp;gt; Run tests
-&amp;gt; Analyze error logs
-&amp;gt; Modify code
-&amp;gt; Run tests again
-&amp;gt; Summarize the change and the remaining risks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each step produces more information. Context governance exists to keep that information alive across a long task without drowning the model in it.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Why Context Management Becomes an Engineering Problem
&lt;/h2&gt;

&lt;p&gt;Start with the most basic fact:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Every model call is stateless by default.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The model does not naturally remember which file it read on the previous turn, nor does it automatically know where the last test failed. An agent only appears continuous because the runtime outside the model reconstructs the current working scene on each round and sends it back in.&lt;/p&gt;

&lt;p&gt;A simple chat turn looks roughly like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;user question
-&amp;gt; model answer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An agent turn looks much more like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;system rules
+ project rules
+ current user goal
+ message history
+ tool descriptions
+ recent tool results
+ current task state
+ compressed summaries
+ available external resources
-&amp;gt; model decides what to do next
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At that point, context management is no longer answering "how do I save the chat history?" It is answering questions like these:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What exactly should the model see on this turn?&lt;/li&gt;
&lt;li&gt;Which information should be visible on every turn?&lt;/li&gt;
&lt;li&gt;Which information should only be fetched on demand?&lt;/li&gt;
&lt;li&gt;Which tool results are already stale?&lt;/li&gt;
&lt;li&gt;Which content is too large and must be trimmed?&lt;/li&gt;
&lt;li&gt;Which parts of history can be summarized?&lt;/li&gt;
&lt;li&gt;How do you preserve continuity after compression?&lt;/li&gt;
&lt;li&gt;How do you isolate context across multiple agents?&lt;/li&gt;
&lt;li&gt;Which internal states must never be exposed to the model?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is a systems-design problem, not a prompt-wording problem.&lt;/p&gt;

&lt;p&gt;Without context governance, an agent quickly runs into several classic failures.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Token Explosion
&lt;/h3&gt;

&lt;p&gt;Tool output keeps piling up and requests get longer and longer.&lt;/p&gt;

&lt;p&gt;One &lt;code&gt;grep&lt;/code&gt; can return dozens of matches. One test run can spill thousands of log lines. One source file can cost thousands of tokens. In long tasks, what fills the window is often not the user's words but the environmental noise coming back from tools.&lt;/p&gt;

&lt;p&gt;Many teams get trapped here because they only count conversation turns and think, "We've only had 20 turns, so the window should still be fine." But each tool call has been dumping more material into the context the whole time.&lt;/p&gt;
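
&lt;p&gt;A toy calculation makes the trap visible. The numbers below are invented and the chars-divided-by-four token estimate is a crude approximation, but the shape of the result is the point: tool output dominates the budget long before the turn count looks worrying.&lt;/p&gt;

```python
# Rough token accounting for a three-turn exchange. All numbers are invented.
def rough_tokens(text):
    return len(text) // 4            # crude chars-to-tokens approximation

turns = [
    ("user", "fix the login redirect"),                       # tiny
    ("tool", "grep output\n" + "match line\n" * 40),          # one grep
    ("tool", "test log\n" + "assertion trace line\n" * 300),  # one test run
]
tool_tokens = sum(rough_tokens(t) for role, t in turns if role == "tool")
chat_tokens = sum(rough_tokens(t) for role, t in turns if role == "user")
# tool_tokens dwarfs chat_tokens even though this is "only" three turns
```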

&lt;h3&gt;
  
  
  2. Context Pollution
&lt;/h3&gt;

&lt;p&gt;Old information is still present in context even though the real world has already changed.&lt;/p&gt;

&lt;p&gt;For example, the agent first reads &lt;code&gt;auth.ts&lt;/code&gt;, later edits it, but the old version still sits in history. On the next turn the model may reason carefully from information that is no longer true.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;It looks like deliberate analysis,
but the thing being analyzed is no longer the current code.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
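
&lt;p&gt;One way a runtime can fight this is to invalidate earlier reads when a file changes. The sketch below is hypothetical, not how Claude Code actually handles it: it simply replaces stale read results in history with a short marker telling the model to re-read.&lt;/p&gt;

```python
# Hypothetical stale-read invalidation: when a file is edited, earlier read
# results in history are replaced with a stale marker so the model does not
# reason from outdated content.
def invalidate_stale_reads(history, edited_path):
    cleaned = []
    for msg in history:
        if msg.get("kind") == "file_read" and msg.get("path") == edited_path:
            cleaned.append({"kind": "note",
                            "content": f"[stale] {edited_path} was modified "
                                       "after this read; re-read if needed"})
        else:
            cleaned.append(msg)
    return cleaned

history = [
    {"kind": "file_read", "path": "auth.ts", "content": "old version"},
    {"kind": "assistant", "content": "I will fix the guard."},
]
history = invalidate_stale_reads(history, "auth.ts")  # after editing auth.ts
```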



&lt;h3&gt;
  
  
  3. Constraint Loss
&lt;/h3&gt;

&lt;p&gt;The user says early on, "Don't change the public API," and by turn ten the model has forgotten.&lt;/p&gt;

&lt;p&gt;The project rules say migration files must not be hand-edited, but after compression that rule may not survive into the summary. The task keeps moving, every step still sounds reasonable, and the system has already crossed a boundary.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Compression Amnesia
&lt;/h3&gt;

&lt;p&gt;Compression is not free.&lt;/p&gt;

&lt;p&gt;A weak summary may record which files were read and which code was modified, while losing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the user's actual goal&lt;/li&gt;
&lt;li&gt;where the task is currently stuck&lt;/li&gt;
&lt;li&gt;which approaches have already failed&lt;/li&gt;
&lt;li&gt;which constraints must not be violated&lt;/li&gt;
&lt;li&gt;what should happen next&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That leaves the model like someone who has read the meeting minutes but never sat in the room.&lt;/p&gt;
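
&lt;p&gt;One defense is to give the summary a schema instead of free-form prose, so a lossy compression is detectable. The dataclass below is a hypothetical sketch built from the list above; the field names and the &lt;code&gt;is_safe_to_swap_in&lt;/code&gt; check are invented here.&lt;/p&gt;

```python
from dataclasses import dataclass, field

# Hypothetical compact-summary schema: the fields a compression step should
# never drop, per the list above.
@dataclass
class CompactSummary:
    goal: str                                        # the user's actual goal
    stuck_on: str = ""                               # where the task is stuck
    failed_attempts: list = field(default_factory=list)
    constraints: list = field(default_factory=list)  # must not be violated
    next_steps: list = field(default_factory=list)

    def is_safe_to_swap_in(self):
        """Refuse to replace history with a summary that lost the essentials."""
        return bool(self.goal) and bool(self.constraints)

s = CompactSummary(goal="fix post-login redirect",
                   constraints=["do not change the public API"])
```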

&lt;h3&gt;
  
  
  5. Multi-Agent Pollution
&lt;/h3&gt;

&lt;p&gt;The problem becomes even sharper once sub-agents enter the picture.&lt;/p&gt;

&lt;p&gt;A research agent may read a huge amount of material, while the execution agent only needs the final conclusion. If you dump all of the research agent's drafts and dead ends into the executor's context, the downstream agent does not become smarter. It becomes noisier.&lt;/p&gt;

&lt;p&gt;In multi-agent systems, the danger is often not a lack of information. It is every agent carrying someone else's intermediate state forward.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Context Is Not Prompt, and It Is Not Memory Either
&lt;/h2&gt;

&lt;p&gt;Before going deeper, it helps to separate a few terms that are easy to blur together.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concept&lt;/th&gt;
&lt;th&gt;Plain meaning&lt;/th&gt;
&lt;th&gt;The question it answers&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Prompt&lt;/td&gt;
&lt;td&gt;Task wording&lt;/td&gt;
&lt;td&gt;How should I ask the model?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context&lt;/td&gt;
&lt;td&gt;Current workbench&lt;/td&gt;
&lt;td&gt;What does the model actually see on this turn?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory&lt;/td&gt;
&lt;td&gt;Reusable knowledge&lt;/td&gt;
&lt;td&gt;Which facts should survive across tasks?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Transcript&lt;/td&gt;
&lt;td&gt;Raw archive&lt;/td&gt;
&lt;td&gt;How do we audit and recover the full process?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;State&lt;/td&gt;
&lt;td&gt;Structured task state&lt;/td&gt;
&lt;td&gt;What is the current machine state of the task?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Artifact&lt;/td&gt;
&lt;td&gt;External output&lt;/td&gt;
&lt;td&gt;Where do files, logs, diffs, and reports live?&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A practical analogy looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Prompt is the assignment sheet.
Context is the material spread across your desk.
Memory is the filing cabinet.
Transcript is the audio/video recording.
State is the project kanban board.
Artifacts are the actual documents and code produced.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Many agents become unstable precisely because these layers get mixed together.&lt;/p&gt;

&lt;p&gt;Treat the transcript as context, and every turn explodes in tokens.&lt;/p&gt;

&lt;p&gt;Treat context as memory, and transient noise pollutes long-term recall.&lt;/p&gt;

&lt;p&gt;Treat memory as prompt, and the model misreads "experience" as hard policy.&lt;/p&gt;

&lt;p&gt;Store state only in natural-language history, and long tasks lose it the moment compaction happens.&lt;/p&gt;

&lt;p&gt;So the first principle of context management is simple:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do not shove every kind of information into one linear chat history.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A more reliable engineering pattern is to keep different information in different layers, then assemble only the small subset needed for this turn right before each model call.&lt;/p&gt;
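
&lt;p&gt;Assembling from layers can be sketched as a function that stitches the slices together right before the model call. Everything here is illustrative: the layer names, the section headers, and the crude character budget are assumptions, not any particular system's implementation.&lt;/p&gt;

```python
# Hypothetical per-turn assembly: each layer lives separately, and only the
# slice needed for this turn is stitched together before the model call.
def assemble_turn(rules, goal, state, recent_tail, summaries, budget_chars):
    parts = [
        ("rules", rules),            # always visible
        ("goal", goal),              # always visible
        ("state", state),            # structured task state
        ("summary", summaries),      # compressed old history
        ("recent", recent_tail),     # uncompressed recent turns
    ]
    prompt, used = [], 0
    for label, text in parts:
        if used + len(text) > budget_chars:
            text = text[: max(0, budget_chars - used)]  # crude fallback trim
        prompt.append(f"## {label}\n{text}")
        used += len(text)
    return "\n\n".join(prompt)

turn = assemble_turn(rules="Do not edit generated files.",
                     goal="Fix post-login redirect.",
                     state="tests: failing", recent_tail="last tool output",
                     summaries="earlier: read auth.ts", budget_chars=4000)
```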

&lt;h2&gt;
  
  
  3. Separate the Action Layer from the Architecture Layer
&lt;/h2&gt;

&lt;p&gt;Many context-management discussions start by listing a set of actions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Offload: move large objects out of the prompt
Reduce: trim, extract, summarize
Retrieve: bring information back when needed
Isolate: split work into independent contexts
Cache: reuse stable context or computed results
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These actions are useful, but they answer only one question:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The context is too large. What operations can I apply?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Real engineering has to answer earlier questions first:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Should the model even see this piece of information?
Which source outranks which?
Is this hot or cold information right now?
Should it appear as raw text, a summary, a citation, or structured state?
Where should it be recalled from?
How do I compress it without distorting the truth?
Inside which boundary should it apply?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is where the broader seven-dimensional model becomes useful. It upgrades context management from "a list of cleanup operations" into an architectural model.&lt;/p&gt;

&lt;p&gt;The action layer is like a toolbox. The architectural model is like a blueprint.&lt;/p&gt;

&lt;p&gt;A toolbox tells you that you have a hammer, pliers, and a screwdriver. A blueprint tells you where you are allowed to hammer, which layer gets installed first, and how to trace accountability when the system goes wrong.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. The Seven-Dimension Model: Turn Context into a Governable Working Set
&lt;/h2&gt;

&lt;p&gt;If context management is treated as a real subsystem, it has to manage information across at least seven dimensions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Visibility:  what the model is allowed to see
Authority:   which source wins when conflicts arise
Temperature: whether the information is hot, warm, cold, frozen, or long-term
Shape:       what form the information takes
Retrieval:   where to recall information from when it is missing
Compression: how to shrink context without losing the truth
Boundary:    how to isolate across tasks, agents, tenants, and permissions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These are not parallel buzzwords. Together they form a practical engineering pipeline.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Visibility: Decide First Whether the Model Should See It
&lt;/h3&gt;

&lt;p&gt;The first gate is not compression. It is visibility.&lt;/p&gt;

&lt;p&gt;Context usually falls into three broad categories:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Examples&lt;/th&gt;
&lt;th&gt;Handling&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;llm_visible&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;user goals, project rules, key code snippets, filtered retrieval results&lt;/td&gt;
&lt;td&gt;may enter model-visible context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;runtime_only&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;API keys, permission objects, sessions, traces, internal dependencies, database handles&lt;/td&gt;
&lt;td&gt;available to tools and runtime only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;artifact_ref&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;large logs, large files, page snapshots, full diffs&lt;/td&gt;
&lt;td&gt;keep the original outside the prompt and provide a reference plus preview&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;One early mistake many agent systems make is confusing "the tool can access it" with "the model should also see it."&lt;/p&gt;

&lt;p&gt;This is why the OpenAI Agents SDK's distinction between local context and LLM context matters so much. Tool functions may need the current user object, the logger, the dependency container, and permission state. The model usually does not.&lt;/p&gt;

&lt;p&gt;One-line rule:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If the model does not need to see it, do not show it to the model. If a reference is enough, do not paste the full original.&lt;/strong&gt;&lt;/p&gt;
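
&lt;p&gt;The three categories above can be sketched as a small visibility gate. The field names (&lt;code&gt;secret&lt;/code&gt;, &lt;code&gt;runtime_only&lt;/code&gt;, &lt;code&gt;path&lt;/code&gt;) and the size threshold are invented for illustration.&lt;/p&gt;

```python
# Hypothetical visibility gate for the three categories above.
def classify(item, size_limit=4000):
    if item.get("secret") or item.get("runtime_only"):
        return "runtime_only"                 # tools may use it; the model never sees it
    if len(item.get("content", "")) > size_limit:
        return "artifact_ref"                 # too large: reference plus preview
    return "llm_visible"

def to_model_view(item):
    kind = classify(item)
    if kind == "runtime_only":
        return None                           # dropped from model-visible context
    if kind == "artifact_ref":
        return {"ref": item["path"], "preview": item["content"][:200]}
    return {"content": item["content"]}

key = {"content": "api-key-value", "secret": True}       # never shown
log = {"path": "logs/test.txt", "content": "x" * 10000}  # shown as a reference
```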

&lt;h3&gt;
  
  
  2. Authority: Conflicts Need a Resolution Chain
&lt;/h3&gt;

&lt;p&gt;Conflicts show up in context all the time.&lt;/p&gt;

&lt;p&gt;The user says, "Edit the generated file directly," while the project rules say, "Do not modify generated files." Long-term memory says the user likes Redis, while the current task says not to introduce Redis. An old summary says tests passed, while the newest tool output says they failed.&lt;/p&gt;

&lt;p&gt;If the system has no explicit resolution chain, it is effectively dumping the conflict onto the model and hoping it will improvise correctly.&lt;/p&gt;

&lt;p&gt;A sensible default ordering might look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;System / safety policy
&amp;gt; Tenant / organization policy
&amp;gt; Project rules
&amp;gt; Current user instruction
&amp;gt; Current task state
&amp;gt; Verified retrieval result
&amp;gt; Long-term memory
&amp;gt; Historical summary
&amp;gt; Raw old conversation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The point is not that every system must use this exact order. The point is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Authority has to be designed. It cannot be replaced by adding more "please follow these rules" text to the prompt.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Claude Code's system rules, project rules, permission modes, and tool-safety checks are all forms of authority enforced at different layers. In enterprise agents, RBAC, approvals, and audit systems move authority even further out of the prompt and into the runtime.&lt;/p&gt;
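
&lt;p&gt;A designed authority chain can be sketched as data plus one resolution function. The source names below follow the default ordering sketched earlier; the function itself is a hypothetical illustration, not any system's actual policy engine.&lt;/p&gt;

```python
# Hypothetical resolution chain: when two context entries conflict on the
# same question, the entry from the higher-ranked source wins.
PRECEDENCE = [
    "system_policy", "tenant_policy", "project_rules", "user_instruction",
    "task_state", "retrieval", "long_term_memory", "summary", "old_conversation",
]

def resolve(entries):
    """entries: list of {source, key, value}; returns the winning value per key."""
    rank = {src: i for i, src in enumerate(PRECEDENCE)}
    winners = {}
    # Apply lowest-authority entries first so higher-authority ones overwrite them.
    for e in sorted(entries, key=lambda e: rank[e["source"]], reverse=True):
        winners[e["key"]] = e["value"]
    return winners

conflict = [
    {"source": "user_instruction", "key": "edit_generated", "value": "yes"},
    {"source": "project_rules", "key": "edit_generated", "value": "no"},
]
decision = resolve(conflict)   # project rules outrank the user instruction
```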

&lt;h3&gt;
  
  
  3. Temperature: Information Needs Hot and Cold Layers
&lt;/h3&gt;

&lt;p&gt;Context is not just "short-term" versus "long-term."&lt;/p&gt;

&lt;p&gt;A more useful breakdown is:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;th&gt;Examples&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Hot&lt;/td&gt;
&lt;td&gt;Must be used now; included by default&lt;/td&gt;
&lt;td&gt;current user goal, latest failure log, file currently being edited&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Warm&lt;/td&gt;
&lt;td&gt;Probably relevant; often kept as summary or state&lt;/td&gt;
&lt;td&gt;ruled-out causes, file summaries, active hypotheses&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cold&lt;/td&gt;
&lt;td&gt;Recalled on demand&lt;/td&gt;
&lt;td&gt;code index, documentation index, historical sessions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Frozen&lt;/td&gt;
&lt;td&gt;Complete raw record; used for audit and recovery&lt;/td&gt;
&lt;td&gt;transcripts, full logs, page snapshots&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Long-term Memory&lt;/td&gt;
&lt;td&gt;Stable facts across sessions&lt;/td&gt;
&lt;td&gt;persistent user preferences, project conventions, long-lived rules&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This makes a context manager look more like a memory manager:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Hot items cool into Warm after use.
Stable Warm items may move into long-term memory.
Cold items heat up again when retrieved.
Frozen records stay outside the prompt but preserve the truth.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is also where weak summaries often break down. Many systems compress a hot live scene into a warm summary without preserving the recent tail, so the model loses its feel for the present on the very next turn.&lt;/p&gt;
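
&lt;p&gt;The cooling transitions can be sketched as one pass over the working set. The thresholds, the 120-character summary stand-in, and the &lt;code&gt;stable_turns&lt;/code&gt; counter are all invented for illustration.&lt;/p&gt;

```python
# Hypothetical cool-down pass, following the transitions above: hot items
# not touched this turn cool into warm summaries; warm items that stay
# stable become candidates for long-term memory.
def cool_down(items, touched_this_turn):
    for it in items:
        if it["temp"] == "hot" and it["id"] not in touched_this_turn:
            it["temp"] = "warm"
            it["content"] = it["content"][:120]      # keep only a short summary
        elif it["temp"] == "warm" and it.get("stable_turns", 0) >= 5:
            it["temp"] = "long_term"
    return items

items = [
    {"id": "log1", "temp": "hot", "content": "latest failure log"},
    {"id": "pref", "temp": "warm", "content": "user prefers pnpm", "stable_turns": 7},
]
items = cool_down(items, touched_this_turn=set())
```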

&lt;h3&gt;
  
  
  4. Shape: The Same Information Can Take Different Forms
&lt;/h3&gt;

&lt;p&gt;Not everything should be represented as natural-language prose.&lt;/p&gt;

&lt;p&gt;The same test failure can exist in many shapes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Shape&lt;/th&gt;
&lt;th&gt;Best used when&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Raw&lt;/td&gt;
&lt;td&gt;you need line-by-line inspection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Extract&lt;/td&gt;
&lt;td&gt;you only need command, exit code, error type, and key stack frame&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Summary&lt;/td&gt;
&lt;td&gt;you are reviewing older history&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Structured State&lt;/td&gt;
&lt;td&gt;you are tracking task status, failed attempts, and next steps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reference&lt;/td&gt;
&lt;td&gt;the original is too large, so you keep only an artifact ID or path&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Diff&lt;/td&gt;
&lt;td&gt;the code change matters more than the full file&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Graph&lt;/td&gt;
&lt;td&gt;the task is really about relationships, dependencies, or DAG structure&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For example, a failing test log does not always need to enter the model as a raw blob. It can first be reshaped like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pnpm test auth&lt;/span&gt;
&lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;failed&lt;/span&gt;
&lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;TypeError user.id should be string&lt;/span&gt;
&lt;span class="na"&gt;file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;src/auth/session.ts&lt;/span&gt;
&lt;span class="na"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;redirects after login&lt;/span&gt;
&lt;span class="na"&gt;artifact&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;logs/test-auth-2026-05-03.txt&lt;/span&gt;
&lt;span class="na"&gt;next_step&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;inspect mock user construction&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is the value of shape:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The same information, represented differently, changes token cost, retrievability, and reliability.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;LangGraph's &lt;code&gt;State&lt;/code&gt;, Claude Code's compact summary, the OpenAI Agents SDK's tool context, and execution context in enterprise systems can all be understood through this lens.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Retrieval: Recall Is Not Just Vector Search
&lt;/h3&gt;

&lt;p&gt;Many people hear "retrieval" and immediately think of vector databases. But agents need far more than one retrieval path.&lt;/p&gt;

&lt;p&gt;A mature system typically has multiple recall routes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Retrieval path&lt;/th&gt;
&lt;th&gt;Best suited for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Recent Tail&lt;/td&gt;
&lt;td&gt;recent conversation, current tool results, current state&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rule Loading&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;AGENTS.md&lt;/code&gt;, &lt;code&gt;CLAUDE.md&lt;/code&gt;, project rules&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Keyword Search&lt;/td&gt;
&lt;td&gt;function names, error codes, field names, config keys&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vector / Hybrid Search&lt;/td&gt;
&lt;td&gt;document semantics, similar experiences, complex knowledge&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool Search&lt;/td&gt;
&lt;td&gt;progressive loading of tools, skills, and plugins&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Artifact Lookup&lt;/td&gt;
&lt;td&gt;large logs, large files, web outputs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory Search&lt;/td&gt;
&lt;td&gt;user preferences, long-term facts, project conventions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Graph Traversal&lt;/td&gt;
&lt;td&gt;module dependencies, task DAGs, database relationships&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In code-heavy tasks, symbols, keywords, and paths are often more important than pure semantic retrieval.&lt;/p&gt;

&lt;p&gt;In enterprise knowledge systems, hybrid search, permission filtering, and source credibility matter more.&lt;/p&gt;

&lt;p&gt;In multi-agent systems, artifact lookup and structured handoff matter more.&lt;/p&gt;

&lt;p&gt;So the key retrieval question is not "do you have RAG?" It is this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When this kind of task is missing information, what is the most reliable way to recover it?&lt;/strong&gt;&lt;/p&gt;
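
&lt;p&gt;A recall router makes that question operational: the shape of the missing-information query decides which path is tried first. The heuristics below are deliberately crude and invented for illustration; a real router would be richer and would combine paths.&lt;/p&gt;

```python
import re

# Hypothetical recall router based on the table above.
def pick_retrieval_path(query):
    if re.search(r"[A-Za-z_]+\.(ts|py|md)$", query):
        return "artifact_lookup"              # looks like a file path
    if re.search(r"[a-z]+[A-Z]\w*|\w+_\w+|E\d{3,}", query):
        return "keyword_search"               # symbol, snake_case name, error code
    if query.startswith("how do we usually"):
        return "memory_search"                # convention or preference question
    return "vector_search"                    # fall back to semantic recall
```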

&lt;h3&gt;
  
  
  6. Compression: Shrink the Working Set, Not the Truth
&lt;/h3&gt;

&lt;p&gt;Compression is not just LLM summarization either.&lt;/p&gt;

&lt;p&gt;It can be broken into several categories:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Compression method&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;th&gt;Main risk&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Truncate&lt;/td&gt;
&lt;td&gt;cut directly&lt;/td&gt;
&lt;td&gt;easiest way to lose critical constraints&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Extract&lt;/td&gt;
&lt;td&gt;pull out key fields&lt;/td&gt;
&lt;td&gt;incomplete extraction rules silently drop important information&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Summarize&lt;/td&gt;
&lt;td&gt;model-generated summary&lt;/td&gt;
&lt;td&gt;prone to summary drift&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Distill&lt;/td&gt;
&lt;td&gt;condense into structured state&lt;/td&gt;
&lt;td&gt;requires schema design&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Archive + Ref&lt;/td&gt;
&lt;td&gt;keep the original externally and retain only a reference&lt;/td&gt;
&lt;td&gt;later recovery must be possible&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rehydrate&lt;/td&gt;
&lt;td&gt;expand back to the original when needed&lt;/td&gt;
&lt;td&gt;requires traceable provenance&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A more reliable order often looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;First offload large objects
-&amp;gt; extract key fields
-&amp;gt; distill them into structured state
-&amp;gt; summarize old history
-&amp;gt; truncate only when absolutely necessary
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
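
&lt;p&gt;That ordering can be sketched as a small pipeline that tries cheaper, safer reductions before lossy ones. All names and numbers below are illustrative, not Claude Code's actual API:&lt;/p&gt;

```typescript
// Illustrative sketch of the compression order above; not Claude Code's actual API.
interface Entry {
  id: string;
  tokens: number;
  content: string;
  artifactRef?: string; // set once the original has been offloaded externally
}

// Pretend the content was written to external storage; keep only a reference.
function offload(e: Entry): Entry {
  return { ...e, tokens: 10, content: "[offloaded, see " + e.id + "]", artifactRef: e.id };
}

// Stand-in for an LLM or rule-based summary; real summaries vary in size.
function summarize(e: Entry): Entry {
  return { ...e, tokens: Math.ceil(e.tokens / 4), content: "[summary of " + e.id + "]" };
}

function truncate(e: Entry, limit: number): Entry {
  return { ...e, tokens: Math.min(e.tokens, limit), content: e.content.slice(0, limit) };
}

// Apply methods from safest to most lossy until the entry fits its budget.
function shrink(e: Entry, budget: number): Entry {
  if (e.tokens <= budget) return e;
  const offloaded = e.tokens > 1000 ? offload(e) : e;
  if (offloaded.tokens <= budget) return offloaded;
  const summarized = summarize(offloaded);
  if (summarized.tokens <= budget) return summarized;
  return truncate(summarized, budget); // only when absolutely necessary
}
```

&lt;p&gt;The point of the ordering is that offloading and extraction are reversible or near-lossless, while summarization and truncation are not.&lt;/p&gt;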



&lt;p&gt;The biggest danger is summary drift: the summary quietly rewrites user constraints, failure causes, or unresolved issues.&lt;/p&gt;

&lt;p&gt;So a compression result should preserve:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;source scope&lt;/li&gt;
&lt;li&gt;critical constraints&lt;/li&gt;
&lt;li&gt;failed attempts&lt;/li&gt;
&lt;li&gt;unresolved issues&lt;/li&gt;
&lt;li&gt;artifact references&lt;/li&gt;
&lt;li&gt;next steps&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why Claude Code-style compaction works best when the summary behaves like a handoff document, not a book report.&lt;/p&gt;

&lt;h3&gt;
  
  
  7. Boundary: Isolation Is the Main Thread's Self-Preservation Mechanism
&lt;/h3&gt;

&lt;p&gt;Boundary is the most underrated dimension.&lt;/p&gt;

&lt;p&gt;The value of sub-agents is not just parallelism. It is context isolation.&lt;/p&gt;

&lt;p&gt;These kinds of tasks especially benefit from isolation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;large-scale search and research&lt;/li&gt;
&lt;li&gt;long log analysis and debugging&lt;/li&gt;
&lt;li&gt;codebase scanning and web scraping&lt;/li&gt;
&lt;li&gt;data cleaning and independent implementation work&lt;/li&gt;
&lt;li&gt;high-privilege tool calls&lt;/li&gt;
&lt;li&gt;multi-tenant data access&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Boundaries can exist at many layers:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Boundary&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Thread&lt;/td&gt;
&lt;td&gt;isolate context across sessions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Task&lt;/td&gt;
&lt;td&gt;isolate state across separate tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Subagent&lt;/td&gt;
&lt;td&gt;isolate local context for a sub-task&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool&lt;/td&gt;
&lt;td&gt;isolate permissions plus input/output flow&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Artifact&lt;/td&gt;
&lt;td&gt;externalize large objects so they do not pollute messages&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Permission&lt;/td&gt;
&lt;td&gt;require approval for high-risk actions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tenant&lt;/td&gt;
&lt;td&gt;isolate across organizations, users, and data domains&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sandbox&lt;/td&gt;
&lt;td&gt;isolate execution environments&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A well-scoped sub-agent should look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Narrow input: task + constraints + artifact references
Narrow output: conclusion + evidence + suggested next step + confidence
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
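
&lt;p&gt;That contract can be made explicit as types. The names here are hypothetical, shown only to illustrate the narrow-input, narrow-output shape:&lt;/p&gt;

```typescript
// Hypothetical sub-agent contract: narrow input, narrow output.
interface SubagentInput {
  task: string;            // what to do, not the whole history
  constraints: string[];   // inviolable rules for this sub-task
  artifactRefs: string[];  // pointers to large inputs, not their contents
}

interface SubagentOutput {
  conclusion: string;        // the answer, stated directly
  evidence: string[];        // citations or artifact references backing it
  suggestedNextStep: string; // what the main thread should do with this
  confidence: "low" | "medium" | "high";
}

// A guard the main thread can apply: reject raw-process dumps.
function isNarrow(out: SubagentOutput): boolean {
  return out.conclusion.length < 2000 && out.evidence.length <= 10;
}
```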



&lt;p&gt;The main thread should never receive a full replay of a sub-thread's raw process.&lt;/p&gt;

&lt;p&gt;Without boundary discipline, multi-agent systems easily degrade from "collaboration" into "mutual contamination."&lt;/p&gt;

&lt;h2&gt;
  
  
  5. How Context Grows While an Agent Executes a Task
&lt;/h2&gt;

&lt;p&gt;Go back to the login redirect example.&lt;/p&gt;

&lt;p&gt;At the start, the user only provides one sentence:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Post-login redirect is broken in this project. Help me find the cause and fix it.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the agent genuinely tries to solve that problem, it will generate context like this:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Step&lt;/th&gt;
&lt;th&gt;New information produced&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Inspect directory&lt;/td&gt;
&lt;td&gt;project structure, framework type, entry files&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Read &lt;code&gt;package.json&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;test commands, dependencies, scripts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Search for &lt;code&gt;login&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;matching files, relevant functions, route paths&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Read route guard&lt;/td&gt;
&lt;td&gt;auth logic, redirect handling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Read state management&lt;/td&gt;
&lt;td&gt;token, user, session storage strategy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Run tests&lt;/td&gt;
&lt;td&gt;failure logs, stack traces, test names&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Modify code&lt;/td&gt;
&lt;td&gt;diff, changed files, implementation hypotheses&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Run tests again&lt;/td&gt;
&lt;td&gt;new results, new errors, or proof of success&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Some of that information is hot. Some cools off quickly.&lt;/p&gt;

&lt;p&gt;The current failure log is hot because the next step depends on it.&lt;/p&gt;

&lt;p&gt;Old search results are warm because they might still be useful, but they do not need to remain verbatim forever.&lt;/p&gt;

&lt;p&gt;The first file read can become cold, or even toxic, once that file has been edited.&lt;/p&gt;

&lt;p&gt;The full transcript still matters, but as a cold archive, not as something to paste into every turn.&lt;/p&gt;

&lt;p&gt;So the context manager's job is not "save everything." It is to keep asking:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;At this exact step, which pieces of information matter most for the model to see?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is the core of context governance.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Engineering Problems You Will Actually Hit, and How to Solve Them
&lt;/h2&gt;

&lt;p&gt;Now let's break the engineering side down as symptom -&amp;gt; root cause -&amp;gt; response.&lt;/p&gt;

&lt;h3&gt;
  
  
  Problem 1: The Model Doesn't Know the Workspace
&lt;/h3&gt;

&lt;p&gt;Symptom: the agent starts guessing.&lt;/p&gt;

&lt;p&gt;It changes routing logic without reading the routing code. It claims the token was probably never stored without checking the tests. It reorganizes directories according to its own habits without seeing the project rules.&lt;/p&gt;

&lt;p&gt;The problem is not that the model cannot reason. The problem is that the current context does not contain enough on-the-ground information.&lt;/p&gt;

&lt;p&gt;The response is dynamic context injection:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;load project rules and workspace information early&lt;/li&gt;
&lt;li&gt;read task-relevant files on demand&lt;/li&gt;
&lt;li&gt;treat search results as candidates first instead of dumping everything in&lt;/li&gt;
&lt;li&gt;write tool results back into messages or structured state&lt;/li&gt;
&lt;li&gt;retrieve external knowledge through search, web, MCP, or database tools only when needed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key phrase is "on demand."&lt;/p&gt;

&lt;p&gt;More context is not automatically better. A stable agent is not the one that has seen the most material. It is the one that sees the most relevant material on each turn.&lt;/p&gt;

&lt;h3&gt;
  
  
  Problem 2: Tool Results Are Too Large
&lt;/h3&gt;

&lt;p&gt;Symptom: token usage rises fast, the model gets slower, and eventually the context limit hits.&lt;/p&gt;

&lt;p&gt;The root cause is usually not too much user conversation. It is bloated tool output.&lt;/p&gt;

&lt;p&gt;The response is to govern tool results before governing chat history:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;set result budgets for each tool category&lt;/li&gt;
&lt;li&gt;keep only summaries, key stack frames, exit codes, and affected files from long logs&lt;/li&gt;
&lt;li&gt;return snippets, line ranges, symbol indexes, or references instead of entire large files&lt;/li&gt;
&lt;li&gt;return search overviews first, and let the model drill into specific files later&lt;/li&gt;
&lt;li&gt;snip or micro-compact stale tool output&lt;/li&gt;
&lt;/ul&gt;
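
&lt;p&gt;A minimal version of a per-tool result budget might look like this. The budget values and the head-and-tail clipping strategy are illustrative, not Claude Code's actual numbers:&lt;/p&gt;

```typescript
// Illustrative per-tool output budgets, in characters; not Claude Code's real values.
const RESULT_BUDGETS: { [tool: string]: number } = {
  Bash: 30000,
  Read: 20000,
  Grep: 10000,
};

// Keep the head and tail of an oversized result; the middle of a long log
// or file dump is usually the least informative part.
function clipToolResult(tool: string, output: string): string {
  const budget = RESULT_BUDGETS[tool] ?? 10000;
  if (output.length <= budget) return output;
  const half = Math.floor(budget / 2);
  const omitted = output.length - budget;
  return output.slice(0, half) +
    "\n... [" + omitted + " characters omitted] ...\n" +
    output.slice(output.length - half);
}
```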

&lt;p&gt;Claude Code is especially instructive here. In coding agents, context windows often explode because &lt;code&gt;Bash&lt;/code&gt;, &lt;code&gt;Read&lt;/code&gt;, and &lt;code&gt;Grep&lt;/code&gt; bring back too much real-world material, not because the model reasoned too much.&lt;/p&gt;

&lt;h3&gt;
  
  
  Problem 3: Stale Information Pollutes New Decisions
&lt;/h3&gt;

&lt;p&gt;Symptom: the agent keeps reasoning from old code or re-investigates paths that have already been ruled out.&lt;/p&gt;

&lt;p&gt;The root cause is a missing notion of freshness.&lt;/p&gt;

&lt;p&gt;The response is to give context a lifecycle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;file reads should carry version, hash, mtime, or read timestamp&lt;/li&gt;
&lt;li&gt;once a file changes, old reads should be down-weighted or marked stale&lt;/li&gt;
&lt;li&gt;test logs should be associated with the command, commit, and time they came from&lt;/li&gt;
&lt;li&gt;search results should be treated as clues, not truth&lt;/li&gt;
&lt;li&gt;key facts should cite sources whenever possible instead of living only in prose summaries&lt;/li&gt;
&lt;/ul&gt;
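
&lt;p&gt;A sketch of what freshness tracking for file reads could look like. The field names are assumptions for illustration:&lt;/p&gt;

```typescript
// Illustrative freshness tracking for file reads; field names are assumptions.
interface FileRead {
  path: string;
  contentHash: string; // hash of the content at read time
  readAt: number;      // epoch milliseconds of the read
  stale: boolean;
}

// When a file changes, mark every earlier read of it stale instead of
// silently letting the model keep reasoning from the old version.
function markStaleReads(reads: FileRead[], changedPath: string, newHash: string): FileRead[] {
  return reads.map(function (r) {
    if (r.path === changedPath && r.contentHash !== newHash) {
      return { ...r, stale: true };
    }
    return r;
  });
}
```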

&lt;p&gt;That is why a context manager should ideally handle information as metadata-rich objects, not just a bag of strings.&lt;/p&gt;

&lt;h3&gt;
  
  
  Problem 4: Rules Conflict with Each Other
&lt;/h3&gt;

&lt;p&gt;Symptom: system rules, project rules, the current user instruction, and long-term memory collide.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The system says secrets must never leak.
Project rules say generated files must not be edited.
The current user request says to edit a generated file directly.
Long-term memory says the user prefers speed over ceremony.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If all of that is merely dumped into natural-language context, the model may resolve the conflict inconsistently or for the wrong reason.&lt;/p&gt;

&lt;p&gt;The response is an explicit authority hierarchy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;System / security policy
-&amp;gt; organization-level rules
-&amp;gt; project-level rules
-&amp;gt; current user instruction
-&amp;gt; long-term preferences
-&amp;gt; retrieval and tool results
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In practice, rules often split into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;hard constraints: the system must intercept or require approval&lt;/li&gt;
&lt;li&gt;soft constraints: inject into context for the model's guidance&lt;/li&gt;
&lt;li&gt;situational constraints: inject only when a path, tool, or task matches&lt;/li&gt;
&lt;/ul&gt;
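
&lt;p&gt;One way to make the hierarchy operational is a numeric rank so a resolver can pick a winner deterministically. This is a hypothetical sketch, using the conflict from the example above:&lt;/p&gt;

```typescript
// Hypothetical authority ranking; a lower number wins a conflict.
const AUTHORITY_RANK: { [level: string]: number } = {
  system: 0,
  org: 1,
  project: 2,
  user_instruction: 3,
  long_term_preference: 4,
  retrieval: 5,
};

interface Rule {
  text: string;
  authority: string;
}

// The highest-authority rule wins; ties keep the earlier rule so
// resolution stays deterministic across turns.
function resolveConflict(rules: Rule[]): Rule {
  let best = rules[0];
  for (const r of rules.slice(1)) {
    const bestRank = AUTHORITY_RANK[best.authority] ?? 99;
    const rank = AUTHORITY_RANK[r.authority] ?? 99;
    if (rank < bestRank) best = r;
  }
  return best;
}
```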

&lt;p&gt;This is also why an extremely long &lt;code&gt;AGENTS.md&lt;/code&gt; or &lt;code&gt;CLAUDE.md&lt;/code&gt; is not automatically better. Rules that are too long, too broad, and too conflicting eventually become context noise.&lt;/p&gt;

&lt;h3&gt;
  
  
  Problem 5: The Task Loses the Thread After Compression
&lt;/h3&gt;

&lt;p&gt;Symptom: after compaction, the model knows roughly what happened but not what it should do next.&lt;/p&gt;

&lt;p&gt;The root cause is that the summary records history but not state.&lt;/p&gt;

&lt;p&gt;A good compressed summary is not an essay abstract. It is a task handoff.&lt;/p&gt;

&lt;p&gt;At minimum, it should preserve:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;the user's goal
inviolable constraints
the current phase
important files already read
files already modified
key judgments and evidence
failed attempts
latest test or verification results
recommended next step
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And ideally it should also preserve the most recent few raw turns plus key tool outputs.&lt;/p&gt;

&lt;p&gt;In other words:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The summary preserves the main thread.
The recent tail preserves the live feel of the scene.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is much more stable than flattening all old history into one paragraph.&lt;/p&gt;
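
&lt;p&gt;The minimum fields listed above can be pinned down as a schema. This is a sketch; the field names are assumptions, not Claude Code's actual compaction format:&lt;/p&gt;

```typescript
// Illustrative task-handoff schema for a compacted summary; not Claude Code's format.
interface TaskHandoff {
  goal: string;              // the user's goal
  constraints: string[];     // inviolable constraints
  phase: string;             // current phase of the task
  filesRead: string[];       // important files already read
  filesModified: string[];   // files already modified
  keyJudgments: string[];    // judgments plus the evidence behind them
  failedAttempts: string[];  // what was tried and did not work
  lastVerification: string;  // latest test or verification result
  nextStep: string;          // recommended next step
}

// A handoff is usable only if the model can act on it without the full history.
function isActionable(h: TaskHandoff): boolean {
  return h.goal.length > 0 && h.nextStep.length > 0;
}
```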

&lt;h3&gt;
  
  
  Problem 6: Multiple Agents Contaminate Each Other
&lt;/h3&gt;

&lt;p&gt;Symptom: one sub-agent's draft, assumptions, or dead ends distort another sub-agent's work.&lt;/p&gt;

&lt;p&gt;The root cause is a shared linear chat history.&lt;/p&gt;

&lt;p&gt;The response is context isolation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a sub-agent receives a local task, not the full global history&lt;/li&gt;
&lt;li&gt;a sub-agent returns a structured result, not a complete thought dump&lt;/li&gt;
&lt;li&gt;upstream passes forward verifiable artifacts, references, and conclusions&lt;/li&gt;
&lt;li&gt;shared state is managed with schemas, not casual paraphrase&lt;/li&gt;
&lt;li&gt;each agent gets its own tool permissions and context budget&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In complex work, isolation often matters more than collaboration.&lt;/p&gt;

&lt;p&gt;Without isolation, multi-agent work quickly turns into multiple agents polluting the same working surface.&lt;/p&gt;

&lt;h3&gt;
  
  
  Problem 7: Cost and Latency Spiral Out of Control
&lt;/h3&gt;

&lt;p&gt;Symptom: the agent can work, but every step becomes slow, expensive, and verbose.&lt;/p&gt;

&lt;p&gt;The root cause is that each turn carries too much fixed content, or the agent keeps re-searching, re-reading, and re-explaining from scratch.&lt;/p&gt;

&lt;p&gt;Useful responses include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;prompt caching for stable system prompts and tool descriptions&lt;/li&gt;
&lt;li&gt;lazy loading for detailed tool docs, rule files, and long documents&lt;/li&gt;
&lt;li&gt;progressive disclosure: summary or index first, full content only when needed&lt;/li&gt;
&lt;li&gt;local context for runtime dependencies and internal state that the model does not need&lt;/li&gt;
&lt;li&gt;structured state for machine-processable information that should not become natural-language tokens&lt;/li&gt;
&lt;/ul&gt;
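
&lt;p&gt;Lazy loading and progressive disclosure can be as simple as shipping the cheap index up front and deferring the expensive body behind a call. An illustrative sketch:&lt;/p&gt;

```typescript
// Illustrative progressive disclosure: ship the index, load the body on demand.
interface LazyDoc {
  summary: string;    // always visible: a cheap index or abstract
  loadFull(): string; // invoked only when the model decides to drill in
}

function makeLazyDoc(summary: string, fullText: string): LazyDoc {
  return {
    summary: summary,
    loadFull: function () { return fullText; },
  };
}
```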

&lt;p&gt;The key insight is this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Large context windows solve the capacity problem. They do not solve the information-discipline problem.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;No matter how large the window gets, if you stuff every turn with irrelevant material, the model will still be slow, expensive, and prone to drift.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. How Different Projects Handle Context
&lt;/h2&gt;

&lt;p&gt;Now put several representative systems on the same canvas.&lt;/p&gt;

&lt;p&gt;The point is not to decide which one is "more advanced." The point is to see how radically the pressure on context management changes across different host environments.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Claude Code: Context Defense Lines for a Long-Task CLI Agent
&lt;/h3&gt;

&lt;p&gt;Claude Code's typical environment is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Inside a real repository, continuously reading files, editing code, running commands, and fixing bugs.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Its most visible context pressures are tool results and long-task history.&lt;/p&gt;

&lt;p&gt;So its context priorities are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;inject project context through &lt;code&gt;CLAUDE.md&lt;/code&gt;, rule files, and path scoping&lt;/li&gt;
&lt;li&gt;keep large files, logs, and search results from flooding the message stream&lt;/li&gt;
&lt;li&gt;compact history into summaries near context limits so the task can continue&lt;/li&gt;
&lt;li&gt;preserve transcript, resume state, and recent tail for continuity&lt;/li&gt;
&lt;li&gt;isolate search, analysis, and implementation into sub-agents when needed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The big lesson from Claude Code is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For a coding agent, context management is not primarily about long-term memory. It is about keeping the tool loop alive.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2. LangGraph: Move Context Out of Chat History and into Structured State
&lt;/h3&gt;

&lt;p&gt;LangGraph looks at the problem from a different angle.&lt;/p&gt;

&lt;p&gt;It does not primarily treat an agent as a running conversation. It treats it as a graph:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;node executes
-&amp;gt; state updates
-&amp;gt; checkpoint
-&amp;gt; next node continues
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Its context priorities are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;state schema&lt;/li&gt;
&lt;li&gt;checkpoints&lt;/li&gt;
&lt;li&gt;thread-level state history&lt;/li&gt;
&lt;li&gt;time travel for debugging and branching&lt;/li&gt;
&lt;li&gt;fault tolerance and recovery from the last valid checkpoint&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The lesson here is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do not force chat history to carry all of the task state.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If a task has explicit steps, nodes, and intermediate state, a state graph can be much more reliable than keeping everything in natural-language dialogue.&lt;/p&gt;

&lt;p&gt;Claude Code starts by governing messages and tool results. LangGraph starts by governing state and execution boundaries.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. OpenAI Agents SDK: Separate Local Context from LLM Context
&lt;/h3&gt;

&lt;p&gt;One of the most important distinctions in the OpenAI Agents SDK is this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Local context: context visible to your code and tools at runtime.
LLM context: context visible to the model during generation.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is an extremely engineering-oriented distinction.&lt;/p&gt;

&lt;p&gt;Many developers think of "context" as simply "whatever gets sent to the model." But in real applications, some information is necessary for tools while remaining unnecessary, or inappropriate, for the model itself.&lt;/p&gt;

&lt;p&gt;Examples include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;database connections&lt;/li&gt;
&lt;li&gt;loggers&lt;/li&gt;
&lt;li&gt;current user objects&lt;/li&gt;
&lt;li&gt;permission state&lt;/li&gt;
&lt;li&gt;internal dependencies&lt;/li&gt;
&lt;li&gt;tool-call metadata&lt;/li&gt;
&lt;li&gt;usage statistics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These belong in runtime-local structures, not necessarily in model-visible context.&lt;/p&gt;

&lt;p&gt;The lesson is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The first step of context management is distinguishing what the runtime needs from what the model needs.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That separation prevents both accidental leakage and wasted tokens.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. AutoGen: Model Context and Memory Injection in Multi-Agent Systems
&lt;/h3&gt;

&lt;p&gt;AutoGen's typical environment is multi-agent conversation and collaboration.&lt;/p&gt;

&lt;p&gt;Its pressure is not just whether one model forgets. It is how multiple agents share information, separate roles, and control message history.&lt;/p&gt;

&lt;p&gt;Its main context concerns include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;which messages each agent sees when calling the model&lt;/li&gt;
&lt;li&gt;how memory gets queried and injected&lt;/li&gt;
&lt;li&gt;how roles partition visible information&lt;/li&gt;
&lt;li&gt;how team orchestration controls message flow and termination&lt;/li&gt;
&lt;li&gt;when to keep full history versus only a window or head-and-tail view&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The lesson from AutoGen is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In multi-agent systems, context management is first and foremost boundary management.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A reviewer should not inherit every tool permission from the executor.&lt;/p&gt;

&lt;p&gt;A researcher should not dump every search draft into the writer's context.&lt;/p&gt;

&lt;p&gt;A planner's intermediate assumptions should not automatically become global facts.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Cursor / Copilot: IDE Assistants Prioritize Local Relevance and Low Latency
&lt;/h3&gt;

&lt;p&gt;IDE assistants live in a very different environment.&lt;/p&gt;

&lt;p&gt;They often need to autocomplete, explain, or rewrite code while the user is typing. The core pressure is not long-task recovery. It is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Find the most useful code context near the cursor as quickly as possible.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So their context priorities skew toward:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;snippets around the cursor&lt;/li&gt;
&lt;li&gt;symbols in the current file&lt;/li&gt;
&lt;li&gt;imports and type information&lt;/li&gt;
&lt;li&gt;similar code blocks&lt;/li&gt;
&lt;li&gt;recently edited files&lt;/li&gt;
&lt;li&gt;semantic or incremental indexing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They do not always need full-project comprehension.&lt;/p&gt;

&lt;p&gt;The lesson is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context management should serve the scenario, not chase completeness by default.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Hermes / OpenClaw / Enterprise Harnesses: Long-Running Runtime and Governance Context
&lt;/h3&gt;

&lt;p&gt;One level up, context management expands from task execution into runtime governance.&lt;/p&gt;

&lt;p&gt;OpenClaw is closer to an agent control plane and entry point. It cares about how messaging channels, automation tasks, device nodes, browsers, and local capabilities connect into one session system.&lt;/p&gt;

&lt;p&gt;Hermes is closer to a self-improving runtime. It cares about long-term memory, user profiles, skill accumulation, cross-session recall, and reusable experience.&lt;/p&gt;

&lt;p&gt;Enterprise harnesses care about pipeline context, secrets, connectors, RBAC, approvals, and audit, where the agent has to operate inside existing process controls rather than outside them.&lt;/p&gt;

&lt;p&gt;What these systems share is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context is no longer just model input. It becomes part of the whole runtime environment.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At this level, context governance also has to answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;who triggered the task&lt;/li&gt;
&lt;li&gt;which channel it came from&lt;/li&gt;
&lt;li&gt;whose machine or sandbox is executing it&lt;/li&gt;
&lt;li&gt;which secrets are available&lt;/li&gt;
&lt;li&gt;which approvals have already passed&lt;/li&gt;
&lt;li&gt;which past experience is reusable&lt;/li&gt;
&lt;li&gt;which operations must be auditable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why the end state of context management is not simply "better prompting." It becomes part of the agent harness itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  8. Put Them Side by Side
&lt;/h2&gt;

&lt;p&gt;We can place these systems onto a shared comparison grid:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Project / System&lt;/th&gt;
&lt;th&gt;Primary scenario&lt;/th&gt;
&lt;th&gt;Core of context management&lt;/th&gt;
&lt;th&gt;Main problem solved&lt;/th&gt;
&lt;th&gt;Easily overlooked edge&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude Code&lt;/td&gt;
&lt;td&gt;CLI coding agent&lt;/td&gt;
&lt;td&gt;project rules, tool results, compression, recovery&lt;/td&gt;
&lt;td&gt;keep long tasks coherent without tool output blowing up the window&lt;/td&gt;
&lt;td&gt;compressed summaries can still lose local in-situ detail&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LangGraph&lt;/td&gt;
&lt;td&gt;graph-based workflow agent&lt;/td&gt;
&lt;td&gt;state, checkpoints, threads, time travel&lt;/td&gt;
&lt;td&gt;recoverable state and debuggable workflow nodes&lt;/td&gt;
&lt;td&gt;model input still needs separate governance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI Agents SDK&lt;/td&gt;
&lt;td&gt;application-style agent SDK&lt;/td&gt;
&lt;td&gt;separation of local context and LLM context&lt;/td&gt;
&lt;td&gt;layered handling of runtime dependencies and model-visible information&lt;/td&gt;
&lt;td&gt;developers still have to design injection policy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AutoGen&lt;/td&gt;
&lt;td&gt;multi-agent collaboration&lt;/td&gt;
&lt;td&gt;model context, memory, role boundaries&lt;/td&gt;
&lt;td&gt;multi-role message flow and memory augmentation&lt;/td&gt;
&lt;td&gt;too much shared history causes contamination&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cursor / Copilot&lt;/td&gt;
&lt;td&gt;IDE real-time assistant&lt;/td&gt;
&lt;td&gt;cursor-local context, similar code, indexing&lt;/td&gt;
&lt;td&gt;low-latency local relevance&lt;/td&gt;
&lt;td&gt;not ideal for carrying long-task state by default&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hermes / OpenClaw&lt;/td&gt;
&lt;td&gt;personal long-running runtime&lt;/td&gt;
&lt;td&gt;gateway, memory, skills, session search&lt;/td&gt;
&lt;td&gt;multi-entry operation and long-term experience reuse&lt;/td&gt;
&lt;td&gt;long-term memory must resist staleness and contamination&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enterprise harnesses&lt;/td&gt;
&lt;td&gt;workflow and governance agent&lt;/td&gt;
&lt;td&gt;pipeline context, secrets, RBAC, audit&lt;/td&gt;
&lt;td&gt;place agents inside governable enterprise processes&lt;/td&gt;
&lt;td&gt;process boundaries constrain flexibility&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The main point of the table is this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;These projects are not just giving different answers to the same exam question. They are handling different context pressures in different environments.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Claude Code struggles most with long tasks and tool output.&lt;/p&gt;

&lt;p&gt;LangGraph struggles most with recoverable state.&lt;/p&gt;

&lt;p&gt;OpenAI Agents SDK struggles most with the boundary between runtime state and model-visible state.&lt;/p&gt;

&lt;p&gt;AutoGen struggles most with multi-agent coordination.&lt;/p&gt;

&lt;p&gt;Cursor and Copilot struggle most with low-latency code relevance.&lt;/p&gt;

&lt;p&gt;Hermes and OpenClaw struggle most with long-lived runtime continuity.&lt;/p&gt;

&lt;p&gt;Enterprise harnesses struggle most with permissions, audit, and process embedding.&lt;/p&gt;

&lt;h2&gt;
  
  
  9. Building a Minimal Context Manager Yourself
&lt;/h2&gt;

&lt;p&gt;If you are implementing a small agent from scratch, do not start with a giant vector database or a complex multi-layer memory design.&lt;/p&gt;

&lt;p&gt;A more stable path is to split the context manager into explicit components:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Responsibility&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Visibility Filter&lt;/td&gt;
&lt;td&gt;decide what may enter model context and what must remain runtime-only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Authority Resolver&lt;/td&gt;
&lt;td&gt;resolve conflicts and priority&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Temperature Manager&lt;/td&gt;
&lt;td&gt;manage hot / warm / cold / frozen / long-term layers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Retrieval Router&lt;/td&gt;
&lt;td&gt;choose whether to recall from rules, keywords, vectors, tools, artifacts, memory, or graphs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compression Engine&lt;/td&gt;
&lt;td&gt;handle offloading, extraction, summarization, structuring, and rehydration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Boundary Controller&lt;/td&gt;
&lt;td&gt;manage thread, task, subagent, tenant, permission, and sandbox boundaries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context Budgeter&lt;/td&gt;
&lt;td&gt;manage token budget, selection reasoning, and the resulting context plan&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Once split that way, a context manager is no longer "the code that assembles a prompt." It becomes a debuggable working-set planner.&lt;/p&gt;

&lt;p&gt;An MVP loop can be quite simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Preserve all messages and tool results in the transcript.
2. Before each model request, collect candidate context from transcript, state, memory, and tools.
3. Tag each candidate with source, kind, temperature, authority, token estimate, and visibility.
4. Select the most relevant subset for the current task.
5. Trim or summarize large tool outputs.
6. Keep the most recent N raw turns.
7. Compress older history into a task handoff summary.
8. Force the summary to retain: goal, constraints, completed work, failed work, and next step.
9. Keep the pre-compression original in the transcript for recovery and audit.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
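
&lt;p&gt;Steps 2 through 7 of that loop reduce to a single selection pass: keep the recent raw turns unconditionally, then fill the remaining budget with the most relevant candidates. All names below are illustrative:&lt;/p&gt;

```typescript
// Illustrative selection pass for the MVP loop above; names are made up.
interface Candidate {
  content: string;
  tokens: number;
  relevance: number;    // 0..1, however you choose to score it
  isRecentTurn: boolean;
}

// Recent raw turns are kept unconditionally; everything else competes
// for the remaining token budget by relevance.
function selectContext(candidates: Candidate[], budget: number, keepRecent: number): Candidate[] {
  const recent = candidates.filter(function (c) { return c.isRecentTurn; }).slice(-keepRecent);
  let used = recent.reduce(function (sum, c) { return sum + c.tokens; }, 0);
  const rest = candidates
    .filter(function (c) { return !c.isRecentTurn; })
    .sort(function (a, b) { return b.relevance - a.relevance; });
  const picked: Candidate[] = [];
  for (const c of rest) {
    if (used + c.tokens <= budget) {
      picked.push(c);
      used += c.tokens;
    }
  }
  return recent.concat(picked);
}
```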



&lt;p&gt;A minimal data structure might look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;ContextItem&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
  &lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;instruction&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user_goal&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;tool_result&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;file&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;summary&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;memory&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;state&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
  &lt;span class="na"&gt;visibility&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;llm_visible&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;runtime_only&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;artifact_ref&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="na"&gt;authority&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;system&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;org&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;project&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;task_state&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;retrieval&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;memory&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;summary&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="na"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;hot&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;warm&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;cold&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;frozen&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;long_term&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="na"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;raw&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;extract&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;summary&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;reference&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;diff&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;structured&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;graph&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="na"&gt;boundary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;thread&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;task&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;subagent&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;tool&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;tenant&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;sandbox&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="na"&gt;tokenEstimate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;
  &lt;span class="nx"&gt;freshnessTs&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
  &lt;span class="nx"&gt;conflictKey&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
  &lt;span class="nx"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;
  &lt;span class="nx"&gt;ttl&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
  &lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
  &lt;span class="nx"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And each turn can produce a &lt;code&gt;ContextPlan&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;ContextPlan&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;selected&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ContextItem&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;
  &lt;span class="na"&gt;compressed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;to&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;extract&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;summarize&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;distill&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;archive_ref&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="na"&gt;dropped&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="na"&gt;conflicts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;winner&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;losers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[];&lt;/span&gt; &lt;span class="nl"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="na"&gt;budget&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;total&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;
    &lt;span class="na"&gt;used&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;
    &lt;span class="na"&gt;buckets&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The value of &lt;code&gt;ContextPlan&lt;/code&gt; is explainability.&lt;/p&gt;

&lt;p&gt;When the agent makes a mistake, you can ask:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;What context was actually selected this turn?
Which rules were dropped?
Which tool result was compressed?
Why was long-term memory injected?
Why did a user constraint fail to make it into the prompt?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without a plan like that, context behavior stays a black box.&lt;/p&gt;
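&lt;p&gt;Those audit questions become answerable with a small helper over a recorded plan. The sketch below works against a subset of the &lt;code&gt;ContextPlan&lt;/code&gt; shape above; &lt;code&gt;explainItem&lt;/code&gt; and the sample plan are hypothetical names for illustration, not Claude Code internals:&lt;/p&gt;

```typescript
// Hypothetical helper: answer "what happened to context item X this turn?"
// from a recorded plan. Uses a subset of the ContextPlan shape above.
type Plan = {
  selected: { id: string }[]
  compressed: { from: string; to: string; method: string }[]
  dropped: { id: string; reason: string }[]
}

function explainItem(plan: Plan, id: string): string {
  if (plan.selected.some((s) => s.id === id)) return "selected: sent to the model as-is"
  const c = plan.compressed.find((e) => e.from === id)
  if (c) return "compressed via " + c.method + " into " + c.to
  const d = plan.dropped.find((e) => e.id === id)
  if (d) return "dropped: " + d.reason
  return "never collected as a candidate this turn"
}

// Sample plan for one turn of the failing-tests task.
const samplePlan: Plan = {
  selected: [{ id: "user_goal" }],
  compressed: [{ from: "test_log", to: "test_log_summary", method: "summarize" }],
  dropped: [{ id: "old_file_read", reason: "stale after edit" }],
}
```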

&lt;p&gt;One useful mental model for the per-turn build process is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;collect candidates
-&amp;gt; remove runtime-only items
-&amp;gt; resolve authority conflicts
-&amp;gt; drop stale tool results
-&amp;gt; prefer hot context
-&amp;gt; compress large items
-&amp;gt; preserve recent tail
-&amp;gt; inject final context
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In pseudocode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;buildContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;transcript&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;budget&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;candidates&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;collect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;transcript&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;visible&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;applyVisibilityFilter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;resolved&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;resolveAuthorityConflicts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;visible&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fresh&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;updateTemperatureAndFreshness&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;resolved&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;retrieved&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;routeRetrievalIfNeeded&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fresh&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;task&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;shaped&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;transformShape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;retrieved&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;compressed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;compressToBudget&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;shaped&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;budget&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;selected&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;enforceBoundaries&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;compressed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;task&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="nf"&gt;stableInstructions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;selected&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nf"&gt;projectRules&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;selected&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nf"&gt;taskSummary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;selected&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nf"&gt;recentTail&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;transcript&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nf"&gt;toolResults&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;selected&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nf"&gt;currentUserInput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;task&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key point is not the exact code. It is the mindset:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context should be built deliberately. It should not just grow by accident.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You should also avoid budgeting by total token count alone. Bucketed budgeting is usually more stable:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Budget bucket&lt;/th&gt;
&lt;th&gt;Suggested share&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;System / policy / project rules&lt;/td&gt;
&lt;td&gt;10%-20%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Current user input + task state&lt;/td&gt;
&lt;td&gt;10%-20%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Recent tail&lt;/td&gt;
&lt;td&gt;15%-25%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Retrieved context&lt;/td&gt;
&lt;td&gt;20%-35%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool results / artifact preview&lt;/td&gt;
&lt;td&gt;10%-20%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Long-term memory&lt;/td&gt;
&lt;td&gt;5%-10%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
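&lt;p&gt;The table translates directly into code. The sketch below allocates per-bucket caps from a total window budget; the bucket names follow the table, and the shares are one illustrative choice within the suggested ranges:&lt;/p&gt;

```typescript
// Illustrative bucket shares drawn from the ranges in the table above.
const shares: { [bucket: string]: number } = {
  rules: 0.15,          // system / policy / project rules
  task: 0.15,           // current user input plus task state
  recentTail: 0.20,
  retrieval: 0.25,
  toolResults: 0.15,
  longTermMemory: 0.10,
}

// Turn a total token budget into hard per-bucket caps.
function allocateBudget(totalTokens: number): { [bucket: string]: number } {
  const caps: { [bucket: string]: number } = {}
  for (const bucket of Object.keys(shares)) {
    caps[bucket] = Math.floor(totalTokens * shares[bucket])
  }
  return caps
}
```

&lt;p&gt;With per-bucket caps, a runaway retrieval step can only starve its own bucket, not the recent tail or the system rules.&lt;/p&gt;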

&lt;p&gt;When you exceed the budget, do not reach for the recent tail first.&lt;/p&gt;

&lt;p&gt;A safer order is often:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;drop low-confidence retrieval first
-&amp;gt; drop expired memory
-&amp;gt; convert oversized tool results into artifact references
-&amp;gt; compress older history
-&amp;gt; shrink the recent tail only at the end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The recent tail often carries the system's sense of "where we are right now." Cut it too early and the model loses proximity to the live scene.&lt;/p&gt;
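&lt;p&gt;That ordering can be encoded as an eviction priority. The sketch below walks the steps above in order and only touches the recent tail last; the field names are assumptions layered on the earlier &lt;code&gt;ContextItem&lt;/code&gt; sketch:&lt;/p&gt;

```typescript
// Sketch of the eviction order above: items are evicted step by step,
// and the recent tail is only ever shrunk at the end.
type Evictable = {
  id: string
  tokenEstimate: number
  step:
    | "low_confidence_retrieval"
    | "expired_memory"
    | "oversized_tool_result"
    | "older_history"
    | "recent_tail"
}

const evictionOrder: Evictable["step"][] = [
  "low_confidence_retrieval",
  "expired_memory",
  "oversized_tool_result",
  "older_history",
  "recent_tail", // touched last, and only if nothing else is left
]

function evictToBudget(items: Evictable[], budget: number): Evictable[] {
  const kept = [...items]
  let used = kept.reduce((sum, i) => sum + i.tokenEstimate, 0)
  for (const step of evictionOrder) {
    while (used > budget) {
      const idx = kept.findIndex((i) => i.step === step)
      if (idx === -1) break // nothing left at this priority; move on
      used -= kept[idx].tokenEstimate
      kept.splice(idx, 1)
    }
  }
  return kept
}
```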

&lt;h2&gt;
  
  
  10. How to Write a Good Compression Summary
&lt;/h2&gt;

&lt;p&gt;Many systems have unstable compaction because they aim the summary at the wrong target.&lt;/p&gt;

&lt;p&gt;They write a recap of the past instead of a handoff for the next turn.&lt;/p&gt;

&lt;p&gt;For agents, a better compact template looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;User Goal:
[What the user originally wanted]

Hard Constraints:
[Rules that must not be violated, explicit user requirements, permission boundaries]

Current State:
[Where the task is actually stuck right now, not a vague recap]

Key Facts:
[Facts confirmed from files, logs, or tool results, ideally with sources]

Files Read:
[Path + key takeaways + whether the content may now be stale]

Files Modified:
[Path + what changed + why]

Approaches Tried But Failed:
[So the next turn does not repeat the same mistakes]

Latest Verification Results:
[Command, result, failure message, or proof of success]

Next Step:
[What should happen first after decompression]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The point of this template is resumability.&lt;/p&gt;

&lt;p&gt;History alone is not enough. The agent must know where to pick the task up again.&lt;/p&gt;
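&lt;p&gt;One way to keep that template stable is to treat the handoff as structured data and render it only at the end, so no section can be silently dropped. The sketch below covers a few of the sections above; the field names are illustrative:&lt;/p&gt;

```typescript
// Sketch: the handoff template above as structured data plus a renderer.
// Field names are assumptions, not Claude Code internals.
type Handoff = {
  userGoal: string
  hardConstraints: string[]
  currentState: string
  failedApproaches: string[]
  nextStep: string
}

function renderHandoff(h: Handoff): string {
  return [
    "User Goal:\n" + h.userGoal,
    "Hard Constraints:\n" + h.hardConstraints.map((c) => "- " + c).join("\n"),
    "Current State:\n" + h.currentState,
    "Approaches Tried But Failed:\n" + h.failedApproaches.map((a) => "- " + a).join("\n"),
    "Next Step:\n" + h.nextStep,
  ].join("\n\n")
}
```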

&lt;h2&gt;
  
  
  11. What Questions Should Drive Your Architecture Choice?
&lt;/h2&gt;

&lt;p&gt;If you are designing your own agent system, do not start with "which framework is strongest?"&lt;/p&gt;

&lt;p&gt;Start with questions like these:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Is my agent for low-latency completion or long-running task execution?
Can the task state be structured?
Will tool results be very large?
Do I need cross-session memory?
Is there multi-agent collaboration?
Do I need enterprise permissions and audit?
Should the model be allowed to see internal runtime state?
Do failures need to be recoverable?
What is the one thing I can least afford to lose after compression?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Different answers lead to different design priorities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;IDE completion systems should prioritize local code context and low-latency indexing.&lt;/li&gt;
&lt;li&gt;Workflow systems should prioritize state, checkpoints, and resumable execution.&lt;/li&gt;
&lt;li&gt;Application SDKs should prioritize separating local context from model-visible context.&lt;/li&gt;
&lt;li&gt;Coding CLI agents should prioritize tool-result governance, compaction, and recent-tail continuity.&lt;/li&gt;
&lt;li&gt;Multi-agent systems should prioritize boundaries, roles, handoffs, and structured artifacts.&lt;/li&gt;
&lt;li&gt;Long-lived personal assistants should prioritize layered memory, skill accumulation, and expiration policy.&lt;/li&gt;
&lt;li&gt;Enterprise systems should build permissions, approvals, secrets, and audit directly into the context architecture.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This gets much closer to engineering reality than comparing models in the abstract.&lt;/p&gt;

&lt;h2&gt;
  
  
  12. One-Sentence Summary
&lt;/h2&gt;

&lt;p&gt;If you compress this whole chapter into one sentence, it becomes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context management is not about stuffing more content into the model. It is about continuously deciding, within a finite window, what the model should see, in what form, at what time, how to compress it when space runs out, and how to recover when the task is interrupted.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Compressed even further, it becomes six verbs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Select
Inject
Recall
Compress
Isolate
Recover
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claude Code, LangGraph, OpenAI Agents SDK, AutoGen, Cursor, Hermes, and OpenClaw all look very different on the surface. But underneath, they are all answering the same question:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;When the model has no real memory,
and the task still has to move forward continuously,
how does the outer system manage the world the model gets to see on this turn?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is the real value of context management.&lt;/p&gt;

&lt;p&gt;It is not a side feature of an agent. It is one of the core capabilities of the agent harness.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>llm</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>Claude Code Source Analysis Series, Chapter 4: Context Management</title>
      <dc:creator>LienJack</dc:creator>
      <pubDate>Sun, 10 May 2026 08:05:41 +0000</pubDate>
      <link>https://dev.to/lien_jp_db54b8b7fd9fa0118/claude-code-source-analysis-series-chapter-4-context-management-blm</link>
      <guid>https://dev.to/lien_jp_db54b8b7fd9fa0118/claude-code-source-analysis-series-chapter-4-context-management-blm</guid>
      <description>&lt;h1&gt;
  
  
  Chapter 4 of the &lt;em&gt;Claude Code Source Analysis Series&lt;/em&gt; | Context Management
&lt;/h1&gt;

&lt;p&gt;In the previous article, we looked at Claude Code's prompt runtime: on every turn, the outer system rebuilds the model request from system rules, project memory, dynamic context, tool descriptions, message history, and prior tool results.&lt;/p&gt;

&lt;p&gt;This chapter asks the next question:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Once all of that keeps accumulating, how does Claude Code decide what to keep, what to compress, and what to leave out?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;People often reduce context management to a single idea:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Just stuff more history into the model.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is only half true.&lt;/p&gt;

&lt;p&gt;In a normal chat app, context does look a lot like chat history. But Claude Code is a coding agent, so its context is closer to a dynamic workbench. What the model sees on any given turn is not just "what the user said before." It may include system rules, project conventions, tool descriptions, file reads, shell output, error logs, the file that was just edited, task progress, and compressed summaries of earlier work.&lt;/p&gt;

&lt;p&gt;So the real question is not whether context should exist. The real question is this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does an agent that keeps reading files, running commands, and editing code decide what the model should see on this turn? How does it preserve older information? How does it compress when space gets tight?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If the previous article was about assembling an operating manual for the model, this one is about something even more fundamental:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The model's workbench has limited space.
Claude Code has to keep reorganizing that workbench while the task is still in motion.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We will keep using the same running example as the rest of this series:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The user says: the tests in this project are failing. Find the cause and fix them.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That sounds short. For Claude Code, it quickly unfolds into a much longer chain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Inspect the project structure
-&amp;gt; Read package.json
-&amp;gt; Run the test command
-&amp;gt; Analyze the failure
-&amp;gt; Search the relevant code
-&amp;gt; Read the target file
-&amp;gt; Edit the code
-&amp;gt; Run the tests again
-&amp;gt; Summarize the result
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each step creates more context. Context management exists to make sure that, over a long task, this information does not break continuity and does not drown the model.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Context Is Not Just Text History. It Is a Workbench Rebuilt Every Turn
&lt;/h2&gt;

&lt;p&gt;The most important fact to carry over from the previous chapter is that the model itself is stateless from one call to the next.&lt;/p&gt;

&lt;p&gt;Claude Code only appears continuous because the outer harness rebuilds the right working scene on every turn and sends that reconstructed scene back to the model.&lt;/p&gt;

&lt;p&gt;So a model request is not really:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;user question -&amp;gt; model
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It is much closer to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;system rules
+ project rules
+ user preferences
+ current tool descriptions
+ message history
+ tool results
+ compressed summaries
+ current user input
=&amp;gt; this turn's model request
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
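&lt;p&gt;As a sketch, that assembly is an ordered concatenation of labeled parts. The names below are illustrative; in a real harness each part would be produced by its own subsystem:&lt;/p&gt;

```typescript
// Sketch of the per-turn assembly above. Each part is just a labeled string
// here; the "##" section convention is an assumption for illustration.
type Part = { label: string; text: string }

function assembleTurnRequest(parts: Part[], currentUserInput: string): string {
  const ordered = parts
    .map((p) => "## " + p.label + "\n" + p.text)
    .join("\n\n")
  // The current user input always goes last, closest to the model's attention.
  return ordered + "\n\n## current user input\n" + currentUserInput
}
```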



&lt;p&gt;That is why context management should not be understood as "saving the chat log." A better name would be:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;context orchestration&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It has to answer a series of concrete questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which information must survive every turn?&lt;/li&gt;
&lt;li&gt;Which information should stay in the runtime and never be exposed to the model?&lt;/li&gt;
&lt;li&gt;Which pieces can be cached?&lt;/li&gt;
&lt;li&gt;Which pieces should only be fetched again on demand?&lt;/li&gt;
&lt;li&gt;Which old tool results are already stale?&lt;/li&gt;
&lt;li&gt;Which parts of history must be compressed into summaries?&lt;/li&gt;
&lt;li&gt;After compression, how does the model still know what it is currently doing?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without this layer, the ReAct-style main loop quickly runs into two opposite failures:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Too little context: the model loses track of what already happened.
Too much context: token usage, cost, latency, and attention all spiral out of control.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Context management is the balancing act between those two failures.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcl8hv4z0n4llrz50ecx2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcl8hv4z0n4llrz50ecx2.png" alt="04.1 core mechanism - Context management sketch 1: the workbench rebuilt on every turn" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Why Does a Coding Agent's Token Usage Spike So Quickly?
&lt;/h2&gt;

&lt;p&gt;A normal chat turn might cost a few hundred tokens.&lt;/p&gt;

&lt;p&gt;A coding agent is different. Every action it takes pulls real environment output back into the conversation. A 500-line source file can cost thousands of tokens. One failed test can return a long stack trace. A global search can produce dozens of matches.&lt;/p&gt;

&lt;p&gt;Worse, you usually cannot discard that information immediately. On the next turn, the model still needs to know things like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Which file was just read?
Which line triggered the error?
What fixes have already been tried?
Which command failed?
Did the user warn us not to touch a certain kind of file?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So an agent's context does not grow like "a few more messages." Tool calls keep injecting environment state into the message stream.&lt;/p&gt;

&lt;p&gt;That creates three classic failure modes.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Token Explosion
&lt;/h3&gt;

&lt;p&gt;Tool outputs pile up until the next request exceeds the model's context window. At that point the problem is not just degraded answer quality: the run can stall outright.&lt;/p&gt;
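&lt;p&gt;A common mitigation is to cap each tool result before it enters history and keep a reference to the full artifact. The sketch below uses the rough heuristic of about four characters per token; both the heuristic and the &lt;code&gt;artifactId&lt;/code&gt; convention are assumptions for illustration:&lt;/p&gt;

```typescript
// Sketch: cap a tool result at a token budget (roughly 4 chars per token)
// and point at the stored artifact instead of inlining the rest.
function capToolResult(output: string, maxTokens: number, artifactId: string): string {
  const approxTokens = Math.ceil(output.length / 4)
  if (approxTokens > maxTokens) {
    const keepChars = maxTokens * 4
    return output.slice(0, keepChars) + "\n[truncated; full output stored as " + artifactId + "]"
  }
  return output
}
```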

&lt;h3&gt;
  
  
  2. Context Pollution
&lt;/h3&gt;

&lt;p&gt;Old file contents, outdated command output, and stale error logs remain in history. The model may treat obsolete information as current truth. A file has already been changed, but the context still contains the old version, so the model keeps reasoning from stale code.&lt;/p&gt;
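&lt;p&gt;One simple defense is version tracking: stamp every file read with the file's version at read time, bump the version on every edit, and treat any read with an older stamp as stale. The sketch below is illustrative, not Claude Code's actual implementation:&lt;/p&gt;

```typescript
// Sketch: per-file version counter used to invalidate earlier reads.
const fileVersion = new Map() // path -> current version number

type FileRead = { path: string; version: number; content: string }

function recordRead(path: string, content: string): FileRead {
  const v = fileVersion.get(path) || 0
  fileVersion.set(path, v)
  return { path, version: v, content }
}

function recordEdit(path: string): void {
  // Every edit bumps the version, so older reads of this file go stale.
  const v = fileVersion.get(path) || 0
  fileVersion.set(path, v + 1)
}

function isStale(read: FileRead): boolean {
  return (fileVersion.get(read.path) || 0) > read.version
}
```

&lt;p&gt;A stale read does not have to be dropped outright; it can be collapsed into a one-line note that the file has changed since it was last read.&lt;/p&gt;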

&lt;h3&gt;
  
  
  3. Compression Amnesia
&lt;/h3&gt;

&lt;p&gt;Compress too aggressively, and the model forgets the user's original goal, the current stage of the task, or something that just happened a moment ago. This is the most frustrating failure mode: the system still looks active, but its direction has quietly drifted off course.&lt;/p&gt;

&lt;p&gt;That is why Claude Code's context management is not "summarize when full." It is continuous capacity governance running inside the main loop.&lt;/p&gt;

&lt;p&gt;In engineering terms, this behaves more like a resident GC worker than a full GC that only runs after the heap is exhausted.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg7a46wyjqmihviuydu28.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg7a46wyjqmihviuydu28.png" alt="04.1 core mechanism - Context management sketch 2: three failure modes caused by token growth" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Put Context Back Inside the QueryEngine Main Loop
&lt;/h2&gt;

&lt;p&gt;Claude Code's main execution loop is not just:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;request model
-&amp;gt; call tool
-&amp;gt; request model again
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each turn actually passes through a context-governance layer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;prefetch project and session information
-&amp;gt; assemble this turn's context
-&amp;gt; request the model
-&amp;gt; decide whether the model is answering or asking for a tool
-&amp;gt; execute the tool and write the result back into message history
-&amp;gt; check token pressure
-&amp;gt; trim, collapse, or compress when needed
-&amp;gt; carry the new state into the next turn
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Visually, it looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzodil2x1a6fddm6jfj6d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzodil2x1a6fddm6jfj6d.png" alt="04.1 core mechanism - Context management figure 1" width="800" height="1826"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The most important segment is &lt;code&gt;H -&amp;gt; I -&amp;gt; J -&amp;gt; B&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Tool results are not merely UI logs. They are raw material for the next round of reasoning. Every file read, shell command, and code search writes a result back into message history, and that history is then re-evaluated to decide what can still fit into the next model request.&lt;/p&gt;

&lt;p&gt;Context management is not a helper living off to the side of &lt;code&gt;QueryEngine&lt;/code&gt;. It is part of what makes the loop runnable at all.&lt;/p&gt;
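&lt;p&gt;As a rough sketch of that per-turn governance pass (all names below, such as &lt;code&gt;estimateTokens&lt;/code&gt; and &lt;code&gt;governContext&lt;/code&gt;, are hypothetical, and the real pipeline is far more layered):&lt;/p&gt;

```typescript
// Minimal sketch of a per-turn context-governance pass.
// All identifiers are illustrative, not Claude Code's real API.

interface Message {
  role: "user" | "assistant" | "tool";
  content: string;
}

// Crude token estimate: roughly 4 characters per token for English text.
function estimateTokens(messages: Message[]): number {
  const chars = messages.reduce((sum, m) => sum + m.content.length, 0);
  return Math.ceil(chars / 4);
}

// Governance pass: if estimated usage crosses the budget, evict the
// oldest messages until the history fits again, always keeping the
// most recent tail intact.
function governContext(
  messages: Message[],
  maxTokens: number,
  keepTail: number
): Message[] {
  const governed = [...messages]; // never mutate the raw transcript
  while (
    estimateTokens(governed) > maxTokens &&
    governed.length > keepTail
  ) {
    governed.shift(); // evict the oldest message first
  }
  return governed;
}
```

&lt;p&gt;Even this naive version captures the key property: the check runs before every request, not as a rescue step after a failure.&lt;/p&gt;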

&lt;h2&gt;
  
  
  4. Governance Comes Before Compression
&lt;/h2&gt;

&lt;p&gt;When people hear "context management," they often jump straight to compression. But a mature agent cannot just compress.&lt;/p&gt;

&lt;p&gt;It first has to answer at least four classes of information-governance questions.&lt;/p&gt;

&lt;p&gt;First, &lt;strong&gt;visibility&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Should this information go to the model, or should it stay inside the runtime?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;API keys, permission objects, and internal traces should not enter the prompt. Large files and giant logs do not always need to be passed through verbatim either. Sometimes a reference or a summary is enough.&lt;/p&gt;

&lt;p&gt;Second, &lt;strong&gt;authority&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;If system rules, project rules, user instructions, and long-term memory conflict, which one wins?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the project rules say "do not edit generated files" but the user asks for exactly that, the system cannot leave the decision to the model's intuition alone.&lt;/p&gt;

&lt;p&gt;Third, &lt;strong&gt;hot / warm / cold tiering&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;What is hot context and needed right now?
What is warm context and might be needed soon?
What should stay outside the prompt until it is recalled on demand?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The log from the currently failing test is hot. An old error from two hours ago that has already been resolved is warm. The full transcript is cold. You cannot push all of it into every turn.&lt;/p&gt;

&lt;p&gt;Fourth, &lt;strong&gt;shape transformation&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Should this information exist as raw text, a summary, structured state, a diff, or a reference?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A failing test log can remain in raw form, or it can be normalized into something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Command: pnpm test auth
Status: failed
Key error: TypeError: user.id should be string
Relevant file: src/auth/session.ts
Next step: inspect the mock user construction logic
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Those two forms consume very different numbers of tokens, and they help the model in different ways.&lt;/p&gt;
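&lt;p&gt;A toy normalizer makes the transformation concrete. It assumes a very simple log shape, and &lt;code&gt;normalizeTestRun&lt;/code&gt; is a hypothetical helper, not Claude Code's actual logic:&lt;/p&gt;

```typescript
// Toy shape transformation: raw test output -> structured state.
// Assumes the first line containing "Error" is the key error;
// real normalization would need to be far more careful.

interface ToolRun {
  command: string;
  output: string;
  exitCode: number;
}

function normalizeTestRun(run: ToolRun): string {
  const lines = run.output.split("\n");
  const keyError =
    lines.find((l) => l.includes("Error")) ?? "(no error line found)";
  return [
    `Command: ${run.command}`,
    `Status: ${run.exitCode === 0 ? "passed" : "failed"}`,
    `Key error: ${keyError.trim()}`,
  ].join("\n");
}
```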

&lt;p&gt;&lt;strong&gt;Context management is information governance. Compression is only one action inside that broader system.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Claude Code's Compression Is a Layered Defense, Not a Blunt Instrument
&lt;/h2&gt;

&lt;p&gt;If you read the source, Claude Code's compression pipeline is easiest to understand as a staged defense that escalates from light to heavy.&lt;/p&gt;

&lt;p&gt;It does not begin by turning all old history into one paragraph. It starts with low-risk, low-loss local cleanup. If that is not enough, it escalates toward folded views and only later toward full summarization.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhsjycg6kdt6tapum0kh7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhsjycg6kdt6tapum0kh7.png" alt="04.1 core mechanism - Context management sketch 3: escalating compression defenses from light to heavy" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhxlz8ovjwbvilq6wxecs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhxlz8ovjwbvilq6wxecs.png" alt="04.1 core mechanism - Context management figure 2" width="800" height="2391"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The philosophy is simple:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If local slimming is enough, do not jump to global summarization. If structure can be preserved, do not collapse everything into a paragraph. Lossy folding should be the last resort.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let's look at the layers one by one.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Tool Result Budget: Cap the loudest noise source first
&lt;/h3&gt;

&lt;p&gt;The first thing that usually needs control is not the user's message. It is the tool output.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Bash&lt;/code&gt; can return thousands of log lines&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Read&lt;/code&gt; can return a large file&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Grep&lt;/code&gt; can return dozens of matching blocks&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;WebFetch&lt;/code&gt; can pull in a full webpage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If those are passed into the next turn unchanged, the window fills quickly. The role of &lt;code&gt;applyToolResultBudget&lt;/code&gt; is to cap oversized individual tool results before heavier compression starts.&lt;/p&gt;

&lt;p&gt;In one sentence:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do not let one tool result consume the entire workbench.&lt;/strong&gt;&lt;/p&gt;
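&lt;p&gt;A minimal version of such a cap might look like this. The budget value and truncation marker are illustrative, not Claude Code's real ones:&lt;/p&gt;

```typescript
// Sketch of a per-result budget. The cap and marker text are
// invented for illustration.

const MAX_RESULT_CHARS = 4000; // hypothetical budget, not the real value

function capToolResult(output: string): string {
  if (output.length <= MAX_RESULT_CHARS) return output;
  // Keep the head and the tail: the command echo and first failures
  // sit at the start of a log, while decisive errors cluster at the end.
  const head = output.slice(0, MAX_RESULT_CHARS / 2);
  const tail = output.slice(-MAX_RESULT_CHARS / 2);
  const omitted = output.length - MAX_RESULT_CHARS;
  return `${head}\n... [${omitted} characters truncated] ...\n${tail}`;
}
```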

&lt;h3&gt;
  
  
  2. Snip: Remove low-value bulk without breaking the structure
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;snip&lt;/code&gt; works like local surgery.&lt;/p&gt;

&lt;p&gt;It does not remove entire turns. Instead, it replaces large low-value blocks with markers or shorter representations while preserving the structure of the message chain.&lt;/p&gt;

&lt;p&gt;Why not simply delete them? Because message history carries tool call IDs, &lt;code&gt;tool_result&lt;/code&gt; pairings, and cross-turn references. Deleting a message outright can break continuity. Replacing it with a marker frees space while preserving the fact that "a tool result used to be here."&lt;/p&gt;

&lt;p&gt;In one sentence:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The content gets shorter, but the ledger stays intact.&lt;/strong&gt;&lt;/p&gt;
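&lt;p&gt;A minimal sketch of that idea, assuming a simplified &lt;code&gt;tool_result&lt;/code&gt; shape (the marker text is invented):&lt;/p&gt;

```typescript
// Sketch: replace an oversized tool_result body with a marker while
// keeping the tool_use_id pairing intact, so the tool_use /
// tool_result ledger stays balanced.

interface ToolResult {
  type: "tool_result";
  tool_use_id: string;
  content: string;
}

function snipToolResult(result: ToolResult, threshold: number): ToolResult {
  if (result.content.length <= threshold) return result;
  return {
    ...result,
    // The marker frees space but records that a result existed here.
    content: `[tool result snipped: ${result.content.length} characters]`,
  };
}
```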

&lt;h3&gt;
  
  
  3. MicroCompact: Clean stale tool results without destroying the task's structure
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;MicroCompact&lt;/code&gt; is a more systematic local cleanup pass.&lt;/p&gt;

&lt;p&gt;It mainly targets tool outputs that are large, time-sensitive, and already superseded by later work, such as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;old file read results
old search results
old command output
old webpage or external-query results
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It usually leaves these alone:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;original user messages
key assistant responses
recent tool results
currently active context
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For example, suppose the agent reads &lt;code&gt;src/auth/session.ts&lt;/code&gt;, later edits that file, and then reads the new version. The first read is now stale. Keeping it in full wastes space and can also mislead the model.&lt;/p&gt;

&lt;p&gt;In one sentence:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Take out the trash, but keep the ledger.&lt;/strong&gt;&lt;/p&gt;
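&lt;p&gt;The superseded-read case can be sketched like this; &lt;code&gt;microcompactReads&lt;/code&gt; is a hypothetical helper, not the real implementation:&lt;/p&gt;

```typescript
// Sketch of the microcompact idea: keep only the latest read of each
// file and replace earlier, superseded reads with short markers.

interface FileRead {
  file: string;
  content: string;
}

function microcompactReads(reads: FileRead[]): FileRead[] {
  // Record the index of the last read for each file path.
  const lastIndex = new Map<string, number>();
  reads.forEach((r, i) => lastIndex.set(r.file, i));
  return reads.map((r, i) =>
    lastIndex.get(r.file) === i
      ? r // most recent read of this file: keep in full
      : { ...r, content: `[superseded read of ${r.file} removed]` }
  );
}
```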

&lt;h3&gt;
  
  
  4. Context Collapse: Fold the view before you rush to summarize
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;Context Collapse&lt;/code&gt; is a smarter intermediate layer.&lt;/p&gt;

&lt;p&gt;The goal is not just to delete history. It is to project a more compact view of the context. If that folded view drops the request back below the safety threshold, there is no need to trigger the more expensive &lt;code&gt;AutoCompact&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This reflects an important engineering tradeoff in Claude Code:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If fine-grained structure can be preserved, do not rush to turn the whole history into one large summary.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Full summarization saves space, but it always loses detail. Collapse is more like grouping, folding, and stowing documents on a desk, not burning them all and keeping only a meeting note.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. AutoCompact: At the end of the line, turn history into a handoff note
&lt;/h3&gt;

&lt;p&gt;Only after the lighter local defenses fail does automatic summary compression begin.&lt;/p&gt;

&lt;p&gt;But the summary cannot be something vague like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;We discussed the failing tests, read some files, and changed some code.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is useless if the task has to continue.&lt;/p&gt;

&lt;p&gt;A good compact summary should read like a &lt;strong&gt;task handoff note&lt;/strong&gt;, preserving at least:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;the user's main request
key constraints
files touched
important facts discovered
errors encountered
fixes already attempted
what is currently in progress
what should happen next
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The last two are especially important:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;what is currently in progress
what should happen next
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Many weak summaries record what happened but not where the task currently stands. After compression, the model remembers the rough story but does not know where to resume.&lt;/p&gt;

&lt;p&gt;So the essence of &lt;code&gt;AutoCompact&lt;/code&gt; is not "write a summary."&lt;/p&gt;

&lt;p&gt;It is this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Turn the conversation into a handoff note that the next turn can keep executing from.&lt;/strong&gt;&lt;/p&gt;
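&lt;p&gt;One way to picture the note is as a structured record rendered into the summary message. The field names below are illustrative; the real summary is written by the model from a structured prompt, not by code like this:&lt;/p&gt;

```typescript
// Sketch of the "handoff note" shape a compact summary should keep.

interface HandoffNote {
  userRequest: string;
  constraints: string[];
  filesTouched: string[];
  keyFacts: string[];
  errorsSeen: string[];
  fixesAttempted: string[];
  inProgress: string; // where the task currently stands
  nextStep: string;   // where the next turn should resume
}

function renderHandoff(note: HandoffNote): string {
  return [
    `Request: ${note.userRequest}`,
    `Constraints: ${note.constraints.join("; ")}`,
    `Files touched: ${note.filesTouched.join(", ")}`,
    `Key facts: ${note.keyFacts.join("; ")}`,
    `Errors: ${note.errorsSeen.join("; ")}`,
    `Attempted: ${note.fixesAttempted.join("; ")}`,
    `In progress: ${note.inProgress}`,
    `Next step: ${note.nextStep}`,
  ].join("\n");
}
```

&lt;p&gt;The point of the structure is that the last two fields cannot be omitted: without them, the summary records history but not a resume point.&lt;/p&gt;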

&lt;h3&gt;
  
  
  6. Reactive Compact: The recovery path after the model says it is full
&lt;/h3&gt;

&lt;p&gt;Even with proactive budgeting and automatic compression, reality can still surprise the system.&lt;/p&gt;

&lt;p&gt;The model API may return a context-too-large error. Media may exceed limits. Token estimates may not match the actual encoding exactly. That is where reactive compaction enters. It is not proactive prevention. It is recovery after a budget failure.&lt;/p&gt;

&lt;p&gt;Its existence points to one core lesson:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A long-running agent cannot assume its budget estimate is always perfect. It needs a recovery path for when the estimate is wrong.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the same engineering instinct you see in retry and recovery logic elsewhere: do not assume the system will never fail; make sure it can recover when it does.&lt;/p&gt;
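&lt;p&gt;A stripped-down version of that recovery path, assuming the failure can be recognized from its error message (the &lt;code&gt;context_too_large&lt;/code&gt; string and all function names here are hypothetical):&lt;/p&gt;

```typescript
// Sketch of reactive compaction: if the request fails because the
// context was too large, compact once and retry. The real recovery
// path is more involved.

function requestWithReactiveCompact(
  send: (messages: string[]) => string,
  compact: (messages: string[]) => string[],
  messages: string[]
): string {
  try {
    return send(messages);
  } catch (err) {
    const text = err instanceof Error ? err.message : String(err);
    // Only a context-size failure triggers recovery; anything else
    // propagates as usual.
    if (!text.includes("context_too_large")) throw err;
    return send(compact(messages));
  }
}
```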

&lt;h2&gt;
  
  
  6. Why Keep the Recent Tail After Compression?
&lt;/h2&gt;

&lt;p&gt;The most common compression failure is not total forgetting. It is losing the &lt;em&gt;feel of the live scene&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The last few turns before compression might look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Just edited src/auth/session.ts
Just ran pnpm test auth
Just saw a new TypeError
The user just added: do not change the public API
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These details are closest to the current action, and they are often the most important. If they all get folded into a summary, the next turn feels distant, like reading meeting minutes without having been in the room.&lt;/p&gt;

&lt;p&gt;So the better pattern is not:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;old history -&amp;gt; one summary -&amp;gt; continue
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;old history -&amp;gt; one summary
+ the last few raw turns
+ key recent tool outputs
-&amp;gt; continue
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The underlying idea is simple: keep the recent tail and reconnect the summary to the live scene.&lt;/p&gt;

&lt;p&gt;The summary preserves the long-term storyline. The tail preserves the current feel of the work.&lt;/p&gt;

&lt;p&gt;One of the most important lessons from long-running agents is this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Compression is not only about remembering the past. It is also about staying grounded in the present.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0zhh5fguqev8rl8eknf5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0zhh5fguqev8rl8eknf5.png" alt="04.1 core mechanism - Context management sketch 4: keep the recent tail alongside the summary" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;
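&lt;p&gt;The summary-plus-tail pattern is easy to sketch. &lt;code&gt;compactWithTail&lt;/code&gt; and its trivial &lt;code&gt;summarize&lt;/code&gt; argument are stand-ins for the real, model-driven compaction:&lt;/p&gt;

```typescript
// Sketch of "summary plus recent tail": old history is summarized,
// but the last few turns stay raw so the next turn keeps its footing.

function compactWithTail(
  history: string[],
  tailSize: number,
  summarize: (old: string[]) => string
): string[] {
  if (history.length <= tailSize) return history;
  const head = history.slice(0, history.length - tailSize);
  const tail = history.slice(-tailSize); // kept verbatim
  return [summarize(head), ...tail];
}
```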

&lt;h2&gt;
  
  
  7. Do Not Confuse Context, Memory, and Transcript
&lt;/h2&gt;

&lt;p&gt;At this point it becomes easy to blur three different ideas together: &lt;code&gt;Context&lt;/code&gt;, &lt;code&gt;Memory&lt;/code&gt;, and &lt;code&gt;Transcript&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;They are not the same thing.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concept&lt;/th&gt;
&lt;th&gt;Plain-English Analogy&lt;/th&gt;
&lt;th&gt;Role in Claude Code&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Context&lt;/td&gt;
&lt;td&gt;Active workbench&lt;/td&gt;
&lt;td&gt;What the model can actually see on this turn; rebuilt for each request&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory&lt;/td&gt;
&lt;td&gt;Reusable notes&lt;/td&gt;
&lt;td&gt;Project rules, user preferences, and key session facts that are loaded before entering context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Transcript&lt;/td&gt;
&lt;td&gt;Full archive&lt;/td&gt;
&lt;td&gt;The raw event log used for recovery, audit, and replay; too large to include verbatim on every turn&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The shortest way to remember them is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Context: active workbench
Memory: reusable notes
Transcript: full archive
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claude Code's context-management logic is fundamentally about moving information between these three layers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Preserve the full history in the transcript
Extract key facts into memory
Pack what matters most right now into context
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Treat the transcript like context, and every turn explodes in token usage.&lt;/p&gt;

&lt;p&gt;Treat context like memory, and temporary task details pollute long-term rules.&lt;/p&gt;

&lt;p&gt;Treat memory like transcript, and you lose the detailed record of what really happened.&lt;/p&gt;

&lt;p&gt;Once these boundaries are clear, a lot of agent "amnesia" becomes much easier to explain.&lt;/p&gt;

&lt;h2&gt;
  
  
  8. What Claude Code Achieves Through the Seven-Dimension Lens
&lt;/h2&gt;

&lt;p&gt;One useful way to evaluate Claude Code is a seven-dimension context model: Visibility, Authority, Temperature, Shape, Retrieval, Compression, and Boundary.&lt;/p&gt;

&lt;p&gt;Seen through that lens, the system looks like this:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Claude Code mechanism&lt;/th&gt;
&lt;th&gt;Strength&lt;/th&gt;
&lt;th&gt;Current limitation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Visibility&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;system prompt&lt;/code&gt;, user context, &lt;code&gt;toolUseContext&lt;/code&gt;, tool-result budgeting, &lt;code&gt;snip&lt;/code&gt;, and collapse jointly decide what enters the model and what stays in the runtime&lt;/td&gt;
&lt;td&gt;Strong&lt;/td&gt;
&lt;td&gt;Not all information is abstracted behind one unified &lt;code&gt;ContextItem&lt;/code&gt;; visibility logic is still scattered across modules&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Authority&lt;/td&gt;
&lt;td&gt;system-priority rules, project rules, current user instructions, permission rules, and security policy together form a decision chain&lt;/td&gt;
&lt;td&gt;Strong&lt;/td&gt;
&lt;td&gt;Conflict handling still relies on cooperation between prompts and runtime rules rather than a single explicit authority resolver&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Temperature&lt;/td&gt;
&lt;td&gt;the recent message tail, current tool output, session memory, and transcript / resume state behave like hot, warm, and cold layers&lt;/td&gt;
&lt;td&gt;Moderately strong&lt;/td&gt;
&lt;td&gt;The behavior exists, but the source may not always name the layers explicitly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Shape&lt;/td&gt;
&lt;td&gt;raw tool results, truncation markers, summary messages, boundary messages, diffs, and structured &lt;code&gt;tool_result&lt;/code&gt; payloads coexist&lt;/td&gt;
&lt;td&gt;Strong&lt;/td&gt;
&lt;td&gt;More task state could be lifted into explicit structure instead of living mostly in natural-language history&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Retrieval&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;CLAUDE.md&lt;/code&gt; loading, git status, &lt;code&gt;Read&lt;/code&gt;, &lt;code&gt;Grep&lt;/code&gt;, &lt;code&gt;Glob&lt;/code&gt;, web tools, MCP, and skills pull information in on demand&lt;/td&gt;
&lt;td&gt;Moderately strong&lt;/td&gt;
&lt;td&gt;The design leans more on tools and files than on a unified retrieval substrate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compression&lt;/td&gt;
&lt;td&gt;Tool Result Budget, &lt;code&gt;Snip&lt;/code&gt;, &lt;code&gt;MicroCompact&lt;/code&gt;, &lt;code&gt;Context Collapse&lt;/code&gt;, &lt;code&gt;AutoCompact&lt;/code&gt;, and reactive compaction form a multi-layer defense&lt;/td&gt;
&lt;td&gt;Very strong&lt;/td&gt;
&lt;td&gt;Summary drift is still a real risk, so constraints, source scope, and the recent tail must be preserved carefully&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Boundary&lt;/td&gt;
&lt;td&gt;permission checks, Plan Mode, tool protocols, sub-agent forks, MCP boundaries, hooks, and sandboxing isolate actions and information&lt;/td&gt;
&lt;td&gt;Strong&lt;/td&gt;
&lt;td&gt;Enterprise-grade tenancy and data isolation still depend on deployment environment, not just the context layer itself&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;What stands out most is this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude Code is strongest in compression and boundaries, highly engineered in shape and retrieval, and still has room to keep abstracting visibility, authority, and temperature.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Put differently, this is no longer just "a CLI that compresses chat history." Context governance is woven through &lt;code&gt;QueryEngine&lt;/code&gt;, the prompt runtime, the tool system, the permission system, the compaction system, and the session-resume path. Together they form a full harness.&lt;/p&gt;

&lt;p&gt;But it is also not a textbook standalone &lt;code&gt;ContextManager&lt;/code&gt;. Many of these capabilities are distributed across the main loop and supporting runtime subsystems rather than centralized in one class.&lt;/p&gt;

&lt;p&gt;That is exactly what readers can miss when they first inspect the source:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do not go hunting for a file literally named &lt;code&gt;ContextManager&lt;/code&gt;. Context management is a cross-cutting engineering pipeline running through the loop.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  9. Which Objects Matter Most When You Read the Source?
&lt;/h2&gt;

&lt;p&gt;Do not start by reading files in isolation just because their filenames look relevant. A better method is to trace the context lifecycle:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Where does information enter?
What state does it become?
What rules filter it?
When does compression trigger?
Where is the compressed result written back?
How does the next model turn see it again?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These are good places to start:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Object to inspect&lt;/th&gt;
&lt;th&gt;Main question&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Query loop / &lt;code&gt;query.ts&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Where in the main loop does context governance happen?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context builder / prompt runtime&lt;/td&gt;
&lt;td&gt;What pieces make up the model input for this turn?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;CLAUDE.md&lt;/code&gt; loader&lt;/td&gt;
&lt;td&gt;How do project rules and user memory enter the context?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Message store / messages&lt;/td&gt;
&lt;td&gt;How do user messages, model replies, and tool results accumulate?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Token budget / tracking&lt;/td&gt;
&lt;td&gt;At what point does the system consider the request unsafe in size?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool Result Budget / &lt;code&gt;Snip&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Which tool outputs get trimmed first?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;MicroCompact&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Which stale tool outputs can be cleaned away?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context Collapse&lt;/td&gt;
&lt;td&gt;How does the system fold the view before it jumps to full summarization?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;AutoCompact&lt;/code&gt; / reactive compact&lt;/td&gt;
&lt;td&gt;How is history replaced with structured summaries, and what fallback exists if compression fails?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Transcript / resume&lt;/td&gt;
&lt;td&gt;How is raw history backed up and later restored?&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;When you read these objects, do not only ask what the function does. Ask where it sits inside the loop.&lt;/p&gt;

&lt;p&gt;Viewed alone, &lt;code&gt;TokenBudget&lt;/code&gt; can look like a simple length-calculation helper. Placed back inside &lt;code&gt;QueryEngine&lt;/code&gt;, it becomes the switch that moves the system from normal execution into compression governance.&lt;/p&gt;

&lt;p&gt;Viewed alone, &lt;code&gt;MicroCompact&lt;/code&gt; can look like message cleanup. Placed back inside a long-running task, it becomes what prevents stale tool output from continuously polluting the next round of judgment.&lt;/p&gt;

&lt;p&gt;Viewed alone, &lt;code&gt;AutoCompact&lt;/code&gt; can look like summarization. Placed back inside an agent session, it is writing the handoff note that lets the next model turn keep working.&lt;/p&gt;

&lt;p&gt;One strong habit for reading agent source code is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do not just ask what a function is. Ask what kind of runaway behavior in the loop it is there to prevent.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you trace one request round through &lt;code&gt;query.ts&lt;/code&gt;, the context-management path compresses into something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;getMessagesAfterCompactBoundary()
-&amp;gt; applyToolResultBudget()
-&amp;gt; snipCompactIfNeeded()
-&amp;gt; microcompact()
-&amp;gt; contextCollapse.applyCollapsesIfNeeded()
-&amp;gt; autoCompactIfNeeded()
-&amp;gt; appendSystemContext()
-&amp;gt; queryModelWithStreaming()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That line makes something very clear: context management is not a rescue step after an API error. It is proactive governance that runs before every model request.&lt;/p&gt;

&lt;p&gt;Inside that chain, &lt;code&gt;applyToolResultBudget()&lt;/code&gt; handles the noisiest source first: tool output. A shell log, a large file read, or one MCP response can fill the window faster than the message history itself. So Claude Code first applies a local budget and only then considers global compression.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;microcompact&lt;/code&gt; and &lt;code&gt;contextCollapse&lt;/code&gt; form the middle layer. They try to project, fold, and clean up local history so that the system does not degrade too quickly into "one giant summary." That matters because programming tasks need structure preserved: which tool call produced which result, which file was read, which error is still unresolved.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;autoCompactIfNeeded()&lt;/code&gt; is the heavier step. It reserves space for summary output before the context is completely exhausted. If compression fails, it also needs a circuit breaker so the system does not keep triggering the same unrecoverable compression request on every turn.&lt;/p&gt;

&lt;p&gt;Inside &lt;code&gt;compact.ts&lt;/code&gt;, pay attention to the rebuild logic after compression as well. Compression is not finished when old history becomes a paragraph. The system also has to preserve the compact boundary, summary messages, the recent tail, attachments, hook results, and sometimes even recently accessed key files. Otherwise the next turn sees only the summary and loses its current working scene.&lt;/p&gt;

&lt;p&gt;Another easy detail to miss is that some context management hides inside individual tools. For example, if the file-read tool sees that the same file and the same range were already read and the file has not changed, it can return &lt;code&gt;file_unchanged&lt;/code&gt; instead of shoving the full content into messages again. That small optimization is really a way to prevent duplicate context pollution.&lt;/p&gt;
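&lt;p&gt;The dedup idea reduces to a small cache. The sketch below compares raw content directly; a real implementation would check modification times or hashes, and &lt;code&gt;file_unchanged&lt;/code&gt; here is only modeled as a marker string:&lt;/p&gt;

```typescript
// Sketch: if the same file is re-read and nothing changed, return a
// short marker instead of re-injecting the full content.

function makeReadCache() {
  const seen = new Map<string, string>(); // path -> last content seen
  return (path: string, content: string): string => {
    if (seen.get(path) === content) {
      return `[file_unchanged: ${path}]`; // skip duplicate injection
    }
    seen.set(path, content);
    return content; // new or changed content passes through in full
  };
}
```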

&lt;p&gt;So when you map this chapter back to the code, do not go looking for a single class called &lt;code&gt;ContextManager&lt;/code&gt;. Follow the lifecycle instead: how information enters, gets budgeted, gets collapsed, gets compressed, and then gets restored.&lt;/p&gt;

&lt;h2&gt;
  
  
  10. If You Were Building a Minimal Context Manager Yourself, Where Would You Start?
&lt;/h2&gt;

&lt;p&gt;If you want to build a "mini Claude Code," you do not need to reproduce the full six-layer compaction pipeline on day one.&lt;/p&gt;

&lt;p&gt;A minimal version can start here:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Append user messages, assistant replies, and tool results into one message store
2. Estimate token usage before each model request
3. Trim oversized tool results first when they cross a threshold
4. Keep the last N turns in raw form
5. Compress older history into a structured summary
6. Persist the raw transcript to disk for recovery
7. Force the summary to retain: user goal, constraints, files changed, failed attempts, and next step
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That already solves the problem that causes many demo agents to lose coherence after just a few turns.&lt;/p&gt;
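&lt;p&gt;Those steps can be collapsed into a few dozen lines. The sketch below covers steps 1 to 5 and 7 (persistence, step 6, is omitted); every identifier is hypothetical, and &lt;code&gt;summarize&lt;/code&gt; stands in for a model call:&lt;/p&gt;

```typescript
// Minimal sketch of the numbered plan above.

interface Turn {
  role: "user" | "assistant" | "tool";
  content: string;
}

class MiniContextManager {
  private turns: Turn[] = [];

  constructor(
    private maxTokens: number,                  // step 2: budget threshold
    private keepTail: number,                   // step 4: raw recent turns
    private summarize: (old: Turn[]) => string  // steps 5 and 7
  ) {}

  append(turn: Turn): void {
    this.turns.push(turn); // step 1: single message store
  }

  private estimate(turns: Turn[]): number {
    // Crude estimate: roughly 4 characters per token.
    return Math.ceil(turns.reduce((n, t) => n + t.content.length, 0) / 4);
  }

  // Build the context for the next model request, compressing old
  // history into a summary turn when the budget is exceeded (step 3
  // would trim oversized tool results before this runs).
  nextContext(): Turn[] {
    if (this.estimate(this.turns) <= this.maxTokens) return this.turns;
    const cut = this.turns.length - this.keepTail;
    if (cut <= 0) return this.turns; // nothing old enough to fold
    const summary: Turn = {
      role: "assistant",
      content: this.summarize(this.turns.slice(0, cut)),
    };
    this.turns = [summary, ...this.turns.slice(cut)];
    return this.turns;
  }
}
```

&lt;p&gt;The forced summary shape in step 7 would live inside &lt;code&gt;summarize&lt;/code&gt;: the prompt that produces it must demand the user goal, constraints, files changed, failed attempts, and next step.&lt;/p&gt;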

&lt;p&gt;Once that works, you can gradually add:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;loading project rules and user memory&lt;/li&gt;
&lt;li&gt;different budgets for different tool types&lt;/li&gt;
&lt;li&gt;reference-based file handling plus re-read on demand&lt;/li&gt;
&lt;li&gt;collapsed views instead of immediate full summarization&lt;/li&gt;
&lt;li&gt;sub-agent or fork isolation for long search tasks&lt;/li&gt;
&lt;li&gt;permission-aware context injection&lt;/li&gt;
&lt;li&gt;a context-plan log explaining why this turn included these specific pieces of information&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a more stable evolution path than simply starting with a model that has an enormous context window.&lt;/p&gt;

&lt;p&gt;Larger windows solve a capacity problem. They do not automatically solve an information-discipline problem. The real challenge for an agent is this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;As the information keeps growing,
can the system keep showing the model the small subset that matters most right now?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  11. One-Sentence Summary
&lt;/h2&gt;

&lt;p&gt;If you compress this whole chapter into one sentence, it becomes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude Code's context management is not about feeding the model more and more history. It is about continuously assembling, budgeting, pruning, folding, summarizing, and reconnecting information inside a finite token budget.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you compress it even further, it turns into six verbs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Assemble: decide what the model should see this turn
Budget: detect when tokens are entering the danger zone
Prune: trim oversized tool outputs first
Fold: preserve fine-grained structure where possible
Summarize: produce a handoff note the next turn can continue from
Reconnect: keep the recent tail attached to the live working state
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the end, context management determines something very practical:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;After turn 20,
does the agent still behave like someone who has been working continuously,
or like someone who just woke up and only read the meeting notes?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is one of the key mechanisms that turns Claude Code from "a chat box with tools" into "an engineering agent that can sustain long-running work."&lt;/p&gt;

</description>
      <category>agents</category>
      <category>architecture</category>
      <category>claude</category>
      <category>llm</category>
    </item>
    <item>
      <title>Claude Code Source Analysis Series, Chapter 3: Prompt Construction</title>
      <dc:creator>LienJack</dc:creator>
      <pubDate>Sun, 10 May 2026 07:59:44 +0000</pubDate>
      <link>https://dev.to/lien_jp_db54b8b7fd9fa0118/claude-code-source-analysis-series-chapter-3-prompt-construction-bck</link>
      <guid>https://dev.to/lien_jp_db54b8b7fd9fa0118/claude-code-source-analysis-series-chapter-3-prompt-construction-bck</guid>
      <description>&lt;h1&gt;
  
  
  Chapter 3 of the &lt;em&gt;Claude Code Source Analysis Series&lt;/em&gt; — Prompt Construction
&lt;/h1&gt;

&lt;p&gt;Claude Code does not run on a single static prompt. Before every model call, it rebuilds a working context from system rules, project memory, runtime state, tool descriptions, message history, and the user's latest input.&lt;/p&gt;

&lt;p&gt;That leads straight to a new question:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before each model call, what exactly does Claude Code show the model?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When people first build an Agent, they often start from a very natural assumption:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;All I need is a sufficiently strong system prompt.
I'll tell the model it's a programming assistant,
that it should follow the rules and can call tools.
Wouldn't that basically give me Claude Code?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That assumption isn't wrong, but it only scratches the surface.&lt;/p&gt;

&lt;p&gt;The real Claude Code does not rely on a single fixed prompt at all. Before each model call, it assembles a fresh batch of information from scratch: base identity, system rules, the current mode, project memory, user preferences, Git status, tool descriptions, skill descriptions, MCP capabilities, message history, tool results, compressed summaries, plus the user's latest input.&lt;/p&gt;

&lt;p&gt;So this chapter is not answering:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What does Claude Code's prompt say?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It is answering:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does Claude Code stitch together context from multiple sources at runtime into an input that the model can understand, act on, and stay within bounds?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In one sentence:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude Code's prompt is not a static template. It is a runtime assembly process: it selects system prompts by priority, loads memory by layer, injects dynamic context each turn, and feeds tool results back into the next round's messages.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Take a look at this diagram first to build a mental model:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwa40qq5x3c6z94d5vuho.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwa40qq5x3c6z94d5vuho.png" alt="Prompt Construction Figure 1" width="800" height="454"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What this diagram shows is that what looks like a single prompt is actually split into a stable segment, a dynamic segment, a memory segment, the current user message, and message history. What this runtime manages is not just wording. It manages &lt;strong&gt;which source overrides which, what enters context first, what can be cached, and what must be refreshed every turn&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Why Can't You Just Write One Big Prompt?
&lt;/h2&gt;

&lt;p&gt;Think about a real scenario.&lt;/p&gt;

&lt;p&gt;A user types this from the project root:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Help me figure out why this project's tests are failing and fix them.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With a single generic prompt, the model knows at most:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are a programming assistant.
Help the user fix code.
Keep answers concise.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Not even close.&lt;/p&gt;

&lt;p&gt;What the model actually needs to know:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Which project is this?
What's the current directory?
Does the project have its own development conventions?
Does the user have personal preferences?
Are there uncommitted changes in the working tree right now?
Which tools are available?
Which commands require confirmation?
Which files have already been read and which tests run in previous turns?
If the context gets too long, which history has been compressed into summaries?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This information isn't hardcoded into a template — it's only available at runtime.&lt;/p&gt;

&lt;p&gt;Git status changes. Today's date changes. Tool call results change. User input changes. The &lt;code&gt;CLAUDE.md&lt;/code&gt; in the current directory might be completely different from another project's.&lt;/p&gt;

&lt;p&gt;So the challenge Claude Code faces isn't "how to write a universal prompt" — it's a more engineering-driven question:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before each model call, exactly which information should go in? In what order? If information conflicts, which source wins? If it's too long, what gets dropped first? If something can be cached, where do you draw the boundary?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That's why Prompt Runtime exists.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F20fc42gnzjp7smr5cohr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F20fc42gnzjp7smr5cohr.png" alt="Prompt Construction Sketch 1: The Prompt Runtime Workbench" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  2. A Single Model Turn Contains More Than Just the System Prompt
&lt;/h2&gt;

&lt;p&gt;Let's disentangle a few concepts first, so they don't blur together later.&lt;/p&gt;

&lt;p&gt;We habitually call everything we feed the model a "prompt." But in an agent system like Claude Code, a single model turn breaks down into at least these categories:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;system prompt       System-level behavioral rules
system context      System environment context, e.g. Git status
user context        User and project context, e.g. date, CLAUDE.md
messages            User messages, model responses, tool calls, tool results
toolUseContext      Currently available tools, their schemas, permissions, and execution context
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each of these has a distinct job.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;system prompt&lt;/code&gt; is the model's operating manual — it says "who you are, how you work, what your boundaries are."&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;user context&lt;/code&gt; provides long-term constraints from the user and the project: "this project uses pnpm," "write commit messages in Chinese," "don't directly edit generated files."&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;system context&lt;/code&gt; is runtime intelligence: the current Git branch, working-tree changes, latest commit, current username.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;messages&lt;/code&gt; are the live ledger — a record of everything that has happened so far in this task: what the user said, what tools the model invoked, what results came back.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;toolUseContext&lt;/code&gt; tells the model which "hands and feet" it has available this turn: &lt;code&gt;Read&lt;/code&gt;, &lt;code&gt;Edit&lt;/code&gt;, &lt;code&gt;Bash&lt;/code&gt;, &lt;code&gt;Grep&lt;/code&gt;, &lt;code&gt;Task&lt;/code&gt;, MCP tools, Skill tools, and so on.&lt;/p&gt;

&lt;p&gt;So Claude Code's assembly logic can be simplified to this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Stable system rules
+ current runtime environment
+ user / project memory
+ tool capability descriptions
+ historical messages and tool results
+ current user input
=&amp;gt; this turn's model request
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Fixating solely on "how well-written the prompt text is" misses the point. What really determines whether an agent is reliable is how this information gets organized, overlaid, cached, and compressed. You can craft a gorgeous system prompt, but if the tool results aren't correctly backfilled, the model will still lose its thread.&lt;/p&gt;
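&lt;p&gt;To make the separation concrete, the categories above can be typed out. The field names here are illustrative assumptions, not Claude Code's real schema:&lt;/p&gt;

```typescript
// Illustrative types for the pieces of a single model turn.
// Field names are assumptions based on the categories above.

interface ToolSpec {
  name: string;          // e.g. "Read", "Edit", "Bash"
  description: string;
  inputSchema: object;   // JSON Schema for the tool's arguments
}

interface Message {
  role: "user" | "assistant";
  content: string;       // may also carry tool_use / tool_result blocks
}

interface TurnRequest {
  systemPrompt: string[];  // stable behavioral rules
  systemContext: string;   // runtime environment, e.g. Git status
  userContext: string;     // date, CLAUDE.md, user / project memory
  tools: ToolSpec[];       // toolUseContext: what the model may call this turn
  messages: Message[];     // the live ledger of the task so far
}

// The assembly "equation" from the text, written as a function.
function buildTurnRequest(
  systemPrompt: string[],
  systemContext: string,
  userContext: string,
  tools: ToolSpec[],
  history: Message[],
  userInput: string
): TurnRequest {
  return {
    systemPrompt,
    systemContext,
    userContext,
    tools,
    messages: [...history, { role: "user", content: userInput }],
  };
}
```

&lt;p&gt;The point of the shape: &lt;code&gt;messages&lt;/code&gt; grows turn by turn, while the other fields are regenerated or reselected before every call.&lt;/p&gt;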

&lt;h2&gt;
  
  
  3. First Layer: The System Prompt Is a Priority Decision, Not Simple Concatenation
&lt;/h2&gt;

&lt;p&gt;First, let's look at the system prompt priority. Abstracted, it forms this selection chain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;0. overrideSystemPrompt   Completely replaces all prompts, e.g. loop mode
1. Coordinator prompt     Used when coordinator mode is active
2. Agent prompt           Prompt defined by a custom Agent
3. customSystemPrompt     Specified via --system-prompt
4. defaultSystemPrompt    The standard default Claude Code prompt
5. appendSystemPrompt     Always appended at the end
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's unpack these names individually. They do not all come from the same place; they are the "entry points" Claude Code reserves when assembling the effective system prompt:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;What it means&lt;/th&gt;
&lt;th&gt;Where it comes from&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;overrideSystemPrompt&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The highest-priority internal override. When present, it bypasses the default prompt, custom prompt, and Agent prompt selection logic entirely, directly replacing the main system prompt with this content.&lt;/td&gt;
&lt;td&gt;Not a regular user-facing CLI argument. It typically comes from higher-level internal callers — for example, certain loop / fork / background task modes pass a pre-rendered system prompt when invoking an Agent or QueryEngine. It solves the problem of "the current task must run under a different set of rules."&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Coordinator prompt&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The coordinator's own system prompt. It defines the model as a multi-Agent orchestrator whose core responsibilities are decomposing tasks, assigning workers, collecting results, and synthesizing judgments — rather than editing files or executing all tools directly.&lt;/td&gt;
&lt;td&gt;Originates from Coordinator-mode modules. It is only used when coordinator mode is activated by feature flag and runtime configuration. This mode also changes available tools and worker descriptions accordingly.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Agent prompt&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The exclusive system prompt for a sub-Agent or custom Agent. It defines what this Agent is suited for, which tools it can use, and what form its output should take.&lt;/td&gt;
&lt;td&gt;Comes from the Agent definition. Built-in Agents generate it dynamically via &lt;code&gt;getSystemPrompt()&lt;/code&gt;; custom Agents typically come from user-authored Agent Markdown / JSON definitions, where the body becomes that Agent's system prompt, optionally augmented with Agent memory-related prompts.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;customSystemPrompt&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The user-explicitly-specified "replacement for the default system prompt." It is not supplementary notes — it replaces the default Claude Code prompt with user-provided content.&lt;/td&gt;
&lt;td&gt;Comes from the CLI / SDK entry point, e.g. &lt;code&gt;--system-prompt&lt;/code&gt;, represented in &lt;code&gt;QueryEngineConfig&lt;/code&gt; as &lt;code&gt;customSystemPrompt?: string&lt;/code&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;defaultSystemPrompt&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The standard system prompt for a normal Claude Code session. It defines the default persona, collaboration style, tool-use principles, safety boundaries, code-task handling approach, and so on.&lt;/td&gt;
&lt;td&gt;Comes from Claude Code's built-in prompt construction functions. The main flow calls logic like &lt;code&gt;fetchSystemPromptParts()&lt;/code&gt; / &lt;code&gt;getSystemPrompt()&lt;/code&gt; to obtain the default system prompt fragments.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;appendSystemPrompt&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Supplementary constraints that are always appended at the end of the main prompt. It does not change the model's persona, only adds an extra block of rules after the already-selected main system prompt.&lt;/td&gt;
&lt;td&gt;Comes from the CLI / SDK entry point, e.g. &lt;code&gt;--append-system-prompt&lt;/code&gt;, and may also be auto-appended by certain internal modes with additional notes. Placed at the end of the system prompt array during assembly.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;So these "sources" fall roughly into three categories:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Built-in:       defaultSystemPrompt / Coordinator prompt / built-in Agent prompts
User-configured: customSystemPrompt / appendSystemPrompt
Internal runtime: overrideSystemPrompt / certain auto-appended appendSystemPrompt values
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Looking at the source-level flow makes this even clearer. In the normal main flow, &lt;code&gt;QueryEngine.submitMessage()&lt;/code&gt; first obtains the default prompt, user context, and system context, then assembles them roughly like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;systemPrompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;asSystemPrompt&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
  &lt;span class="p"&gt;...(&lt;/span&gt;&lt;span class="nx"&gt;customPrompt&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="kc"&gt;undefined&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;customPrompt&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;defaultSystemPrompt&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;...(&lt;/span&gt;&lt;span class="nx"&gt;memoryMechanicsPrompt&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;memoryMechanicsPrompt&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[]),&lt;/span&gt;
  &lt;span class="p"&gt;...(&lt;/span&gt;&lt;span class="nx"&gt;appendSystemPrompt&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;appendSystemPrompt&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[]),&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This logic expresses the basic substitution rule: if &lt;code&gt;customSystemPrompt&lt;/code&gt; is present, it replaces &lt;code&gt;defaultSystemPrompt&lt;/code&gt; entirely; otherwise the default prompt is used; in either case, &lt;code&gt;appendSystemPrompt&lt;/code&gt; is appended at the end.&lt;/p&gt;
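&lt;p&gt;The full chain from the table can be reconstructed as a single selection function. This is an illustrative reconstruction: in the real source the logic is spread across &lt;code&gt;QueryEngine&lt;/code&gt; and mode-specific modules, and the option names here simply mirror the table above:&lt;/p&gt;

```typescript
// Illustrative reconstruction of the system-prompt priority chain.
// Option names mirror the table; the real logic spans several modules.

interface PromptSources {
  overrideSystemPrompt?: string;  // 0: internal, replaces everything
  coordinatorPrompt?: string;     // 1: coordinator mode
  agentPrompt?: string;           // 2: sub-agent / custom agent definition
  customSystemPrompt?: string;    // 3: --system-prompt
  defaultSystemPrompt: string[];  // 4: built-in default
  appendSystemPrompt?: string;    // 5: --append-system-prompt
}

function selectSystemPrompt(src: PromptSources): string[] {
  // Priorities 0-4: pick exactly one "main" prompt, highest priority wins.
  let main: string[];
  if (src.overrideSystemPrompt !== undefined) main = [src.overrideSystemPrompt];
  else if (src.coordinatorPrompt !== undefined) main = [src.coordinatorPrompt];
  else if (src.agentPrompt !== undefined) main = [src.agentPrompt];
  else if (src.customSystemPrompt !== undefined) main = [src.customSystemPrompt];
  else main = src.defaultSystemPrompt;

  // Priority 5: "always appended at the end" per the table.
  if (src.appendSystemPrompt !== undefined) return [...main, src.appendSystemPrompt];
  return main;
}
```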

&lt;h2&gt;
  
  
  4. Second Layer: &lt;code&gt;CLAUDE.md&lt;/code&gt; Is Project Memory, Not an Ordinary README
&lt;/h2&gt;

&lt;p&gt;The system prompt answers "how should the model behave by default," but it doesn't yet know the rules of the current project.&lt;/p&gt;

&lt;p&gt;That's where &lt;code&gt;CLAUDE.md&lt;/code&gt; comes in.&lt;/p&gt;

&lt;p&gt;Think of &lt;code&gt;CLAUDE.md&lt;/code&gt; as Claude Code's &lt;strong&gt;project work specification&lt;/strong&gt;. It's not a README written for humans — it's an operating manual written for an Agent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;How do you start this project?
What's the test command?
What code style does it follow?
Which directories should never be touched?
How should PR descriptions be written?
What should you watch out for when touching database migrations?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;CLAUDE.md&lt;/code&gt; loading order can be drawn as a memory hierarchy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Managed memory  /etc/claude-code/CLAUDE.md
2. User memory     ~/.claude/CLAUDE.md
3. Project memory  CLAUDE.md, .claude/CLAUDE.md, .claude/rules/*.md
4. Local memory    CLAUDE.local.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fatfz1xyskmc5mgrj8d68.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fatfz1xyskmc5mgrj8d68.png" alt="Prompt Construction Figure 3" width="800" height="213"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Each tier has a distinct purpose.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Managed memory&lt;/code&gt; holds organization- or admin-level rules. This is where company-wide constraints go — security policies, code review requirements, production environment prohibitions.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;User memory&lt;/code&gt; stores the user's own long-term preferences. Maybe you prefer explanations in Chinese, have a preferred testing style, or want commit messages to follow a particular format.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Project memory&lt;/code&gt; contains project-level rules, usually maintained alongside the repository. It tells the Agent how this specific project is built, its directory conventions, and the boundaries of its tech stack.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Local memory&lt;/code&gt; holds private, local rules that typically aren't checked into version control. This is for information that only applies to your machine — local service ports, private paths, temporary debugging habits.&lt;/p&gt;

&lt;p&gt;This hierarchy isn't complexity for complexity's sake. It solves a very practical problem:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Agent must obey organization rules, respect user preferences, adapt to the current project, and never leak local private configuration to the team.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That's also why the memory layers need to be separate. With a single flat &lt;code&gt;CLAUDE.md&lt;/code&gt;, organization rules, personal habits, project conventions, and local ad-hoc settings would all get dumped into one file, and when they conflict there is no clear priority.&lt;/p&gt;

&lt;p&gt;By layering the memory system, Claude Code turns it into a governable stack of rules.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feke906spkvwum78ilhor.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feke906spkvwum78ilhor.png" alt="03-core-mechanism-prompt-engineering-sketch-3-claude-md-memory-layers" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Why does &lt;code&gt;CLAUDE.md&lt;/code&gt; end up in the prompt?
&lt;/h3&gt;

&lt;p&gt;Because the model simply doesn't know your project's conventions.&lt;/p&gt;

&lt;p&gt;Say the project contains:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="p"&gt;-&lt;/span&gt; Use pnpm, never npm.
&lt;span class="p"&gt;-&lt;/span&gt; After editing TypeScript files, always run pnpm typecheck.
&lt;span class="p"&gt;-&lt;/span&gt; Never manually edit the generated/ directory.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If these rules don't make it into context, the model will very naturally run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;test&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;or directly modify generated files.&lt;/p&gt;

&lt;p&gt;This isn't the model being "stupid" — it just hasn't seen the project rules.&lt;/p&gt;

&lt;p&gt;The value of &lt;code&gt;CLAUDE.md&lt;/code&gt; is that it loads project knowledge into the model's visible working memory, so it operates the way the current repo expects from the very start.&lt;/p&gt;

&lt;p&gt;There's a boundary to keep in mind, though: more &lt;code&gt;CLAUDE.md&lt;/code&gt; is not always better.&lt;/p&gt;

&lt;p&gt;If you stuff tens of thousands of words of historical explanations into it, the model gets drowned in noise and costs go up. That's why Claude Code applies size limits, caching, and selective loading to memory content. More advanced sub-agents may also choose &lt;code&gt;omitClaudeMd&lt;/code&gt;, skipping project memory in certain read-only search or planning scenarios to reduce token cost and attention noise.&lt;/p&gt;

&lt;p&gt;This reveals the core idea — &lt;code&gt;CLAUDE.md&lt;/code&gt; isn't about "always shove the whole thing into the prompt no matter what." It's about:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Injecting the right tier of project memory into the model for the right task.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Third Layer: Dynamic Context Is Re-evaluated Every Turn
&lt;/h2&gt;

&lt;p&gt;By this point, we have the system prompt and project memory. But Claude Code still needs runtime intelligence.&lt;/p&gt;

&lt;p&gt;Typical dynamic context includes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Current date
Current working directory
Current Git branch
git status
Most recent commit
Current username
Tools available this turn
Tools exposed by MCP servers
Discovered Skills
Permission mode
Compression summary
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;None of this belongs in &lt;code&gt;CLAUDE.md&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Git status changes constantly, the date advances daily, the tool list shifts with MCP connections, and Skills can be hot-reloaded at runtime.&lt;/p&gt;

&lt;p&gt;So Claude Code generates context at runtime through paths like &lt;code&gt;getSystemContext()&lt;/code&gt; and &lt;code&gt;getUserContext()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;A rough sketch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;getSystemContext()
-&amp;gt; reads Git status, branch, recent commits, environment info
-&amp;gt; produces system-level context

getUserContext()
-&amp;gt; reads date, CLAUDE.md, user / project memory
-&amp;gt; produces user-level context
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This has two benefits.&lt;/p&gt;

&lt;p&gt;First, dynamic information does not pollute the static prompt.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;defaultSystemPrompt&lt;/code&gt; stays stable; high-churn information like Git status is injected separately. That makes caching easier and makes it simpler to detect what actually changed.&lt;/p&gt;

&lt;p&gt;Second, the model sees the most relevant current information on every turn.&lt;/p&gt;

&lt;p&gt;On the first turn, the model hasn't read any files yet and needs more global rules and tool descriptions. By the fifth turn, it already has test errors and the relevant source code — at that point the message history and tool results are what matter. If context compression occurs, old history gets replaced by a summary, and the model sees the compressed state on the next turn.&lt;/p&gt;

&lt;p&gt;So Claude Code's context assembly is not a one-time event — it's a continuous action inside the loop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User input
-&amp;gt; build context for this turn
-&amp;gt; call the model
-&amp;gt; model requests tools
-&amp;gt; tool results written back into messages
-&amp;gt; check whether compression is needed
-&amp;gt; rebuild context for the next turn
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This connects directly to the ReAct loop: prompt assembly does not sit outside the agent runtime. It sits at the entrance to every turn.&lt;/p&gt;
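&lt;p&gt;The loop above can be sketched with the model and tools stubbed out; only the shape matters: context is rebuilt fresh on every iteration, and tool results are written back into &lt;code&gt;messages&lt;/code&gt; for the next turn. All names are illustrative:&lt;/p&gt;

```typescript
// Illustrative shape of the per-turn assembly loop. The model and tools
// are stubbed; the structure is the point.

interface Turn {
  systemContext: string;  // rebuilt each turn: git status, environment
  userContext: string;    // rebuilt each turn: date, memory
  messages: string[];
}

interface ModelReply {
  toolRequest?: string;   // e.g. "Bash: pnpm test"
  finalAnswer?: string;
}

function runLoop(
  userInput: string,
  getSystemContext: () => string,
  getUserContext: () => string,
  callModel: (turn: Turn) => ModelReply,
  runTool: (req: string) => string,
  maybeCompress: (msgs: string[]) => string[]
): string {
  let messages = [userInput];
  for (let i = 0; i !== 10; i++) {        // hard cap on turns for the sketch
    const turn: Turn = {
      systemContext: getSystemContext(),  // fresh every turn
      userContext: getUserContext(),
      messages,
    };
    const reply = callModel(turn);
    if (reply.finalAnswer !== undefined) return reply.finalAnswer;
    if (reply.toolRequest !== undefined) {
      // Tool results are written back into messages for the next turn.
      messages.push(reply.toolRequest, runTool(reply.toolRequest));
    }
    messages = maybeCompress(messages);   // check whether compression is needed
  }
  return "turn limit reached";
}
```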

&lt;h2&gt;
  
  
  6. Caching: Why Separate Stable and Dynamic Segments?
&lt;/h2&gt;

&lt;p&gt;This comes down to a very practical concern: an agent calls the model frequently, and if every turn has to recompute, re-bill, and re-process the same massive block of system prompts, the cost and latency both spike.&lt;/p&gt;

&lt;p&gt;So Claude Code tries to keep stable content at the front, making it easier to hit the prompt cache.&lt;/p&gt;

&lt;p&gt;Stable segments typically include:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Identity introduction
System rules
Task execution guidelines
Operational safety guidelines
Tool usage guidelines
Tone and style
Output efficiency requirements
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These generally stay the same across a session and are well-suited for caching.&lt;/p&gt;

&lt;p&gt;Dynamic segments include:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Agent tool context
Skills context
CLAUDE.md loading results
MCP server directives
Git status
Current date
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These change more frequently and need to be separated from the stable segments.&lt;/p&gt;

&lt;p&gt;The cache boundary can be drawn like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkhptzauhevqm7ot027q7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkhptzauhevqm7ot027q7.png" alt="03. Core Mechanism - Prompt Authoring, Figure 4" width="800" height="48"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The source mentions the &lt;code&gt;SYSTEM_PROMPT_DYNAMIC_BOUNDARY&lt;/code&gt; design. In plain terms:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Keep everything above this line as stable as possible for caching;
everything below this line may change and is refreshed per turn.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This boundary is critical.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frlcpb4htj7ijz6gjt3rs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frlcpb4htj7ijz6gjt3rs.png" alt="03. Core Mechanism - Prompt Authoring, Sketch 4: Cache boundary between stable and dynamic segments" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you mix highly volatile information into the stable segment — say, stuffing a dynamic skill list directly into tool descriptions — every change to that skill list invalidates the entire system prompt cache. What looks like a small list update can actually force every turn's request to re-process thousands or even tens of thousands of tokens.&lt;/p&gt;
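&lt;p&gt;One way to see why the boundary matters is to model the cache key: only the prefix above the boundary participates. This is a simplified illustration; real prompt caching is handled by the model API's cache-control breakpoints, not a local hash:&lt;/p&gt;

```typescript
// Simplified illustration of a stable/dynamic cache boundary.
// Real prompt caching uses the API's cache-control breakpoints, not a local hash.

import { createHash } from "node:crypto";

interface PromptSegments {
  stable: string[];   // identity, rules, tool guidelines: above the boundary
  dynamic: string[];  // git status, date, skills: below the boundary
}

function cacheKey(segments: PromptSegments): string {
  // Only the stable prefix participates in the cache key, so churn in the
  // dynamic segment never invalidates the cached prefix.
  return createHash("sha256").update(segments.stable.join("\n")).digest("hex");
}
```

&lt;p&gt;With this split, a new Git status changes only the dynamic segment and leaves the key untouched, while any edit to the stable rules forces a full re-process. Moving a volatile item above the boundary has exactly the cost the paragraph above describes.&lt;/p&gt;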

&lt;p&gt;That's why the prompt runtime isn't just about "giving the model more to know" — it also has to control costs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Keep stable content as stable as possible
Isolate dynamic content separately
Have a clear boundary for cache invalidation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is also what separates Claude Code from a toy agent. A toy agent only cares about "does it run?" A mature agent also cares about "after 20 turns, 50 turns, 100 turns — are the cost and latency still acceptable?"&lt;/p&gt;

&lt;p&gt;In complex tasks, the gap between cache hits and cache misses gets amplified across multiple turns. What the user perceives isn't a minor optimization — it's whether the entire task feels smooth or not.&lt;/p&gt;
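&lt;p&gt;The boundary idea can be made concrete with a small sketch. Everything below is illustrative (the marker name and the toy hash are assumptions, not Claude Code's actual implementation): only the text above the boundary feeds the cache key, so refreshing the dynamic suffix every turn never invalidates the cached prefix.&lt;br&gt;
&lt;/p&gt;

```typescript
// Illustrative sketch (marker name and toy hash are made up, not the real
// source): only text above the boundary feeds the cache key, so per-turn
// changes below it never invalidate the stable prefix.
const BOUNDARY = "---DYNAMIC-BOUNDARY---";

function assemblePrompt(stable: string, dynamic: string): string {
  return stable + "\n" + BOUNDARY + "\n" + dynamic;
}

// A toy cache key: hash only the stable segment above the boundary.
function stableCacheKey(prompt: string): number {
  const stable = prompt.split(BOUNDARY)[0];
  let hash = 0;
  for (let i = 0; i !== stable.length; i++) {
    hash = (hash * 31 + stable.charCodeAt(i)) % 1000000007;
  }
  return hash;
}

const p1 = assemblePrompt("You are Claude Code.", "Git status: clean");
const p2 = assemblePrompt("You are Claude Code.", "Git status: 3 dirty files");
// Same stable prefix, different dynamic suffix: the key does not change.
console.log(stableCacheKey(p1) === stableCacheKey(p2)); // true
```

&lt;p&gt;Mixing volatile data above the boundary would change the key on every turn, which is exactly the failure mode described above.&lt;/p&gt;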

&lt;h2&gt;
  
  
  7. User Prompt: The Current Question Is Just the Final Puzzle Piece
&lt;/h2&gt;

&lt;p&gt;We've covered the system prompt, CLAUDE.md, and dynamic context. Now let's look at user input.&lt;/p&gt;

&lt;p&gt;User input certainly matters, but it doesn't reach the model in isolation.&lt;/p&gt;

&lt;p&gt;Say the user types:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Fix the tests.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On its own, this sentence carries almost no actionable information.&lt;/p&gt;

&lt;p&gt;But inside Claude Code, it arrives alongside all the context that came before it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;System rules: You are Claude Code. Modify files with care.
Project memory: This project uses pnpm; the test command is pnpm test.
Git status: The current branch has 3 uncommitted files.
Tool descriptions: Read, Grep, Bash, Edit are available.
Message history: The user just mentioned a failing login test.
Current input: Fix the tests.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What the model understands is no longer the bare phrase "Fix the tests." It becomes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;On top of the current repo, the current rules, the current tool boundaries,
and the current task history, carry forward the engineering action
of "fixing the tests."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The user prompt is more like the final trigger. It tells the Agent &lt;em&gt;what to do now&lt;/em&gt;, but whether the Agent can do it correctly depends on whether the context assembly that came before it is complete.&lt;/p&gt;
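&lt;p&gt;The assembly described above can be sketched as a function that wraps the short user input with the accumulated context. All field names here are illustrative assumptions, not Claude Code's real message format:&lt;br&gt;
&lt;/p&gt;

```typescript
// Hypothetical sketch: a short user input is combined with system rules,
// memory, dynamic context, and history before it reaches the model.
interface Part { role: string; content: string; }

function assembleTurn(
  systemRules: string,
  projectMemory: string[],
  dynamicContext: string[],
  history: Part[],
  userInput: string,
): Part[] {
  const system: Part = {
    role: "system",
    content: [systemRules].concat(projectMemory, dynamicContext).join("\n"),
  };
  // The user input is just the last piece appended to everything else.
  return [system].concat(history, [{ role: "user", content: userInput }]);
}

const turn = assembleTurn(
  "You are Claude Code. Modify files with care.",
  ["This project uses pnpm; the test command is pnpm test."],
  ["Git status: 3 uncommitted files."],
  [{ role: "assistant", content: "The login test is failing." }],
  "Fix the tests.",
);
console.log(turn.length); // 3: system, one history message, the user input
```

&lt;p&gt;"Fix the tests." is only the final element of the array; everything before it is what makes the instruction actionable.&lt;/p&gt;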

&lt;h2&gt;
  
  
  8. Why Do Tool Results Count as Part of Prompt Assembly?
&lt;/h2&gt;

&lt;p&gt;This is easy to overlook.&lt;/p&gt;

&lt;p&gt;We usually think of a prompt as "the content written before sending it to the model." But in an agent loop, tool results become part of the next round's model input.&lt;/p&gt;

&lt;p&gt;For example, in the first round the model decides:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;I want to read package.json.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claude Code executes the &lt;code&gt;Read&lt;/code&gt; tool and gets back the file contents.&lt;/p&gt;

&lt;p&gt;If the tool result isn't written back into &lt;code&gt;messages&lt;/code&gt;, the model in the next round still has no idea what's inside &lt;code&gt;package.json&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;So tool-result write-back is a critical step in prompt construction:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The model expresses an intention to act
-&amp;gt; The tool system executes it
-&amp;gt; The tool result becomes a message
-&amp;gt; Carried into the model during the next round of context assembly
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This step translates facts from the external world back into context the model can read.&lt;/p&gt;

&lt;p&gt;In other words, Claude Code does not assemble prompt context once at input time. It keeps growing, trimming, and rebuilding it after every new observation.&lt;/p&gt;

&lt;p&gt;That's why the Agent can work across multiple rounds.&lt;/p&gt;
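&lt;p&gt;The write-back step can be sketched in a few lines (the message shape is an illustrative assumption, not the actual source types):&lt;br&gt;
&lt;/p&gt;

```typescript
// Minimal sketch of tool-result write-back: after a tool runs, its output is
// appended to messages so the next model request can see it. Without this
// step the model would have no memory of what the tool observed.
interface Msg { role: string; content: string; }

const messages: Msg[] = [
  { role: "user", content: "What does package.json contain?" },
  { role: "assistant", content: "tool_use: Read package.json" },
];

function writeBackToolResult(history: Msg[], toolName: string, result: string): Msg[] {
  // The result becomes an ordinary message; the next round of context
  // assembly includes it like any other history entry.
  return history.concat([{ role: "tool", content: toolName + " result: " + result }]);
}

const next = writeBackToolResult(messages, "Read", '{ "name": "demo" }');
console.log(next.length); // 3: the tool result now travels with the history
```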

&lt;h2&gt;
  
  
  9. Connecting the Entire Chain
&lt;/h2&gt;

&lt;p&gt;Now we can distill Claude Code's prompt assembly into a single chain:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwe9f550pux4z5yrnmdu4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwe9f550pux4z5yrnmdu4.png" alt="Prompt Construction Figure 5" width="800" height="2064"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are three key takeaways from this diagram.&lt;/p&gt;

&lt;p&gt;First, system prompts have priority — not all sources are simply concatenated flat.&lt;/p&gt;

&lt;p&gt;Second, information like &lt;code&gt;CLAUDE.md&lt;/code&gt;, Git status, the current date, and tool context is not static templating; it's runtime context.&lt;/p&gt;

&lt;p&gt;Third, tool results and compressed summaries feed back into the next turn's input — prompt construction spans the entire agent loop.&lt;/p&gt;

&lt;h2&gt;
  
  
  10. Tying It All Together with a Complete Example
&lt;/h2&gt;

&lt;p&gt;Suppose the user types:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Fix the login test for me.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What Claude Code actually feeds the model is not that single line — it's an entire operating snapshot.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Select the system prompt.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A normal conversation uses the default Claude Code behavior rules; a sub-agent gets an agent-specific prompt; coordinator mode uses the Coordinator prompt; if an override is present, it replaces everything.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Load memory.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The system might read:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User-level: Reply in Chinese.
Project-level: This project uses pnpm.
Project-level: The test command is pnpm test -- --runInBand.
Project-level: Do not modify generated/ directly.
Local-level: The local backend service port is 4000.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 3: Inject dynamic context.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The system might supplement:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Current branch: feature/login-test
Git status: src/auth/login.ts has unstaged changes
Recent commit: fix auth redirect
Current date: 2026-05-02
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 4: Prepare tool context.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The model sees what it can use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Read / Grep / Bash / Edit / Task ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It also learns which tools require permission and which operations it cannot perform directly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 5: Merge message history with the current user input.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If the user already pasted an error log earlier, or if the model just ran a test in the previous turn, those all enter the current round of context as part of &lt;code&gt;messages&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 6: The model starts acting.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The model might first call &lt;code&gt;Grep&lt;/code&gt; to search for the login test, then &lt;code&gt;Read&lt;/code&gt; to inspect the test file, then &lt;code&gt;Bash&lt;/code&gt; to run the specific test, and after getting the error back, use &lt;code&gt;Edit&lt;/code&gt; to modify the source.&lt;/p&gt;

&lt;p&gt;Every tool result is fed back into &lt;code&gt;messages&lt;/code&gt;, and Claude Code reassembles the context for the next round.&lt;/p&gt;

&lt;p&gt;This is why Claude Code appears to "understand your project." It does not understand it out of thin air. Every round, it puts project rules, runtime state, and tool observations back in front of the model.&lt;/p&gt;

&lt;h2&gt;
  
  
  11. What Problem Does This Mechanism Actually Solve?
&lt;/h2&gt;

&lt;p&gt;The prompt assembly mechanism isn't about "writing longer prompts" — it addresses four engineering problems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First, behavioral consistency.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;System prompt tiering ensures the model maintains a clear identity across different modes. &lt;code&gt;CLAUDE.md&lt;/code&gt; layering ensures that organizational rules, user preferences, and project conventions reliably enter the context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Second, task relevance.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Dynamic context gives the model awareness of the current directory, Git status, date, tool set, and recent execution results — rather than rigidly applying generic rules to every project.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Third, continuity across long tasks.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Tool results are fed back into &lt;code&gt;messages&lt;/code&gt;, and compressed summaries carry forward into subsequent rounds. The Agent doesn't suffer amnesia after each action.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fourth, cost and performance.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Stable-segment caching, dynamic-segment isolation, and &lt;code&gt;CacheSafeParams&lt;/code&gt; reuse prevent multi-turn calls and child Agent forks from re-processing the entire prefix every time.&lt;/p&gt;

&lt;p&gt;These four together — that's the real Prompt Engineering behind Claude Code.&lt;/p&gt;

&lt;p&gt;More precisely, this is no longer prompt engineering in the narrow, traditional sense. It is &lt;strong&gt;context engineering&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The distinction is worth remembering: Prompt Engineering asks "how do I phrase this for better results?" Context Engineering asks "what information does the model see, in what order, and how is it updated?" The latter is the core competency of production-grade Agents.&lt;/p&gt;

&lt;h2&gt;
  
  
  12. The Final Word in One Sentence
&lt;/h2&gt;

&lt;p&gt;Claude Code's prompt is not some "magic incantation."&lt;/p&gt;

&lt;p&gt;It's more like a workbench that gets reorganized every single turn:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;System rules on the left,
project memory on the right,
the current task in the center,
tools laid out alongside,
compacted and tidied when the desk gets too cluttered,
then laid out fresh again when it's time to get back to work.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This dynamic assembly mechanism is the key piece that transforms Claude Code from an ordinary chat window into an engineering agent.&lt;/p&gt;

&lt;p&gt;If the ReAct chapter was about how an agent acts turn by turn, this chapter is about:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before each turn of action, how Claude Code rearranges the world the model needs to see.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>claude</category>
      <category>llm</category>
    </item>
    <item>
      <title>Claude Code Source Analysis Series, Chapter 2: The ReAct Main Loop</title>
      <dc:creator>LienJack</dc:creator>
      <pubDate>Sun, 10 May 2026 07:59:43 +0000</pubDate>
      <link>https://dev.to/lien_jp_db54b8b7fd9fa0118/claude-code-source-analysis-series-chapter-2-the-react-main-loop-ad5</link>
      <guid>https://dev.to/lien_jp_db54b8b7fd9fa0118/claude-code-source-analysis-series-chapter-2-the-react-main-loop-ad5</guid>
      <description>&lt;h1&gt;
  
  
  Chapter 2 of the &lt;em&gt;Claude Code Source Analysis Series&lt;/em&gt; — The ReAct Main Loop
&lt;/h1&gt;

&lt;p&gt;Claude Code is not just a model wrapper. It is a runtime where &lt;code&gt;Model API&lt;/code&gt; handles reasoning, &lt;code&gt;QueryEngine&lt;/code&gt; carries the session forward, &lt;code&gt;Tools&lt;/code&gt; interface with the real engineering environment, and &lt;code&gt;Context / State&lt;/code&gt; keeps multi-step work coherent across turns.&lt;/p&gt;

&lt;p&gt;This chapter drills into the innermost control loop: how &lt;code&gt;query.ts&lt;/code&gt; turns a single model call into an agent run that can keep gathering evidence, invoking tools, and advancing the task.&lt;/p&gt;

&lt;p&gt;We'll use a simple debugging scenario throughout:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Take a look at why the tests are failing in this project and fix them.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A model on its own cannot natively read files, run commands, or maintain task state. So the question becomes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does Claude Code get the model to operate inside a controlled loop — reasoning, acting, absorbing results — until the task genuinely moves forward?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is exactly what &lt;code&gt;ReAct&lt;/code&gt; solves.&lt;/p&gt;

&lt;p&gt;You do not need to memorize the acronym. Just keep this minimal feedback loop in mind:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Assess the current situation
Decide what to do next
Actually carry it out
Get the result
Reassess based on the new result
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As a flowchart, it looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmhfvnkore6ggm91nk9at.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmhfvnkore6ggm91nk9at.png" alt="ReAct Figure 1" width="800" height="1874"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What &lt;code&gt;query.ts&lt;/code&gt; does in Claude Code is engineer this feedback loop into a working system. The model reasons over the current context, then decides whether to act; the results of that action are written back into context, and the model proceeds to the next round of reasoning.&lt;/p&gt;

&lt;p&gt;The part of the diagram that really matters isn't the fancy terminology — it's the straightforward state machine on the right side:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Build Query
-&amp;gt; Request Model API
-&amp;gt; Parse the response
-&amp;gt; Check if there are tool calls
-&amp;gt; If none, return the result
-&amp;gt; If yes, invoke the tools
-&amp;gt; Append tool results to messages
-&amp;gt; Check if compression is needed
-&amp;gt; Loop back to the next Query
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This loop is what turns the architecture into a working runtime instead of a static component list.&lt;/p&gt;
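&lt;p&gt;The state machine above can be compressed into a minimal loop sketch. All names are hypothetical, and a real implementation also handles streaming, permissions, and compaction:&lt;br&gt;
&lt;/p&gt;

```typescript
// Stripped-down sketch of the loop: request the model, execute any tool
// calls, write results back, and repeat until a turn produces no tool calls
// or the turn budget runs out.
interface ModelReply { text: string; toolCalls: string[]; }

function runLoop(
  callModel: (history: string[]) => ModelReply,
  runTool: (call: string) => string,
  maxTurns: number,
): string {
  const history: string[] = [];
  for (let turn = 0; turn !== maxTurns; turn++) {
    const reply = callModel(history);       // Request Model API
    if (reply.toolCalls.length === 0) {     // No tool calls: return the result
      return reply.text;
    }
    for (const call of reply.toolCalls) {   // Invoke the tools
      history.push(runTool(call));          // Append tool results to messages
    }
    // (A real loop would check here whether compression is needed.)
  }
  return "turn budget exhausted";
}

// A fake model that asks for one file, then answers once it sees the result.
const answer = runLoop(
  (history) => history.length === 0
    ? { text: "", toolCalls: ["Read package.json"] }
    : { text: "done", toolCalls: [] },
  (call) => "result of " + call,
  10,
);
console.log(answer); // "done"
```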

&lt;p&gt;But if you think of it as nothing more than a &lt;code&gt;while&lt;/code&gt; loop, you're still missing a layer. A better framing is to see &lt;code&gt;QueryEngine&lt;/code&gt; not as a "single-request handler" but as a "session-level task orchestrator." It doesn't just spin up for one incoming message and then disappear — it holds long-lived state across an entire conversation, threading together the model, tools, permissions, context, and compression.&lt;/p&gt;

&lt;p&gt;So in this chapter we need to grasp both layers at once:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The query.ts layer: how each round of ReAct state transitions happens.
The QueryEngine layer: how state, tools, permissions, and resources are continuously orchestrated across an entire session.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The former explains "how the loop runs"; the latter explains "why this loop can persist stably across many rounds of a task."&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Why the Main Loop Can't Just Call the Model API Once
&lt;/h2&gt;

&lt;p&gt;Let's start with the simplest case.&lt;/p&gt;

&lt;p&gt;A user asks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Explain what useEffect does.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The program sends the question to the model, the model produces an answer, done.&lt;/p&gt;

&lt;p&gt;But what if the user asks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;This React project won't start. Help me fix it.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model almost certainly doesn't know the answer on the first round. It needs more facts at minimum:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;What's the project structure?
What scripts are in package.json?
What error does the start command throw?
Where's the relevant source code?
After I make changes, do the tests pass?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These facts don't live in the model's parameters, and they're not in the user's one-liner. They exist in the real engineering environment: the file system, the shell, Git, test frameworks, logs, dependency configs.&lt;/p&gt;

&lt;p&gt;So the agent needs an extra layer of mechanism:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The model assesses what it's missing
→ initiates a tool call
→ the program executes the tool
→ feeds the result back to the model
→ the model reassesses based on new facts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's why the ReAct loop exists.&lt;/p&gt;

&lt;p&gt;It's not about making the pipeline complex. Real tasks simply can't be resolved in a single response; they require a continuous correction process.&lt;/p&gt;

&lt;p&gt;Anyone who's debugged a production outage will recognize the pattern: start with a hypothesis, gather evidence on the ground, then revise the next step based on what the evidence tells you.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ix0b1pluqfbrdz5hr3o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ix0b1pluqfbrdz5hr3o.png" alt="ReAct Sketch 1: The Minimal ReAct Loop" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  2. ReAct Is Not the Model Acting on Its Own — It Is the Model Expressing Intent
&lt;/h2&gt;

&lt;p&gt;This is an easy point to get wrong:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The model is not literally reading files, running commands, or editing code by itself.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;What the model can produce is an &lt;em&gt;intent to act&lt;/em&gt;. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;I need to read package.json.
I need to search for handleEnter.
I need to run npm test.
I need to edit a file.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The part that actually acts is Claude Code's host runtime: the outer layer made up of &lt;code&gt;QueryEngine&lt;/code&gt;, the tools system, and the permissions system.&lt;/p&gt;

&lt;p&gt;So the more accurate division of labor is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The model decides what should happen next.
Claude Code decides whether it is allowed, how it is executed, and how the result is recorded.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is exactly why Claude Code is much more than simply "connecting the model to a shell."&lt;/p&gt;

&lt;p&gt;If you let the model emit raw shell commands and execute them directly, the system has no structured understanding of what the action means. Permissions, auditing, error recovery, and context write-back all become difficult to control.&lt;/p&gt;

&lt;p&gt;The tools system turns an action into a structured event:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Tool: Read
Arguments: a file path
Permission mode: read-only
Result: file contents or an error
Write-back: appended to messages as a tool result
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That way, the model still does the reasoning, but the action itself is placed inside a controlled engineering framework.&lt;/p&gt;
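&lt;p&gt;A structured tool event of that kind might look like the following. This is a hypothetical shape for illustration, not the actual source types:&lt;br&gt;
&lt;/p&gt;

```typescript
// Hypothetical structured tool event: instead of a raw shell string, the
// action carries a name, typed arguments, and a permission mode the runtime
// can check before anything executes.
interface ToolEvent {
  tool: string;
  args: { [key: string]: string };
  permissionMode: "read-only" | "requires-approval";
}

function isAllowed(event: ToolEvent, approved: boolean): boolean {
  if (event.permissionMode === "read-only") {
    return true; // reads never need approval in this sketch
  }
  return approved; // writes and commands wait for an explicit grant
}

const readEvent: ToolEvent = {
  tool: "Read",
  args: { path: "package.json" },
  permissionMode: "read-only",
};
console.log(isAllowed(readEvent, false)); // true
```

&lt;p&gt;Because the event is structured, the runtime can audit it, gate it, and write its result back as a typed message rather than an opaque string.&lt;/p&gt;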

&lt;p&gt;In one sentence: the model decides, the tools touch the real world, and &lt;code&gt;QueryEngine&lt;/code&gt; organizes judgment and action into a sustainable loop.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. The &lt;code&gt;query.ts&lt;/code&gt; State Machine: The Core Is Not a Function — It's the &lt;code&gt;State&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F85lpicvpq4oswglck657.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F85lpicvpq4oswglck657.png" alt="ReAct Figure 2" width="800" height="1430"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The diagram on the left lists the &lt;code&gt;State&lt;/code&gt; structure in &lt;code&gt;query.ts&lt;/code&gt;. It highlights something important:&lt;/p&gt;

&lt;p&gt;The Claude Code main loop doesn't chug along on scattered global variables — it revolves around a unified state object.&lt;/p&gt;

&lt;p&gt;In simplified form, it looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;State&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;MessageParam&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;
  &lt;span class="nx"&gt;toolUseContext&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ToolUseContext&lt;/span&gt;
  &lt;span class="nx"&gt;turnCount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;
  &lt;span class="nx"&gt;shouldAutoCompact&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;
  &lt;span class="nx"&gt;autoCompactTracking&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nl"&gt;consecutiveFailures&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;
    &lt;span class="nx"&gt;totalMessages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="nl"&gt;aborted&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fywezbtlbz9ug1vmlwzao.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fywezbtlbz9ug1vmlwzao.png" alt="ReAct Sketch 2: State Drives the Main Loop" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These few fields are the keys to understanding the ReAct loop.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;messages&lt;/code&gt;: The Agent's Short-Term Working Memory
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;messages&lt;/code&gt; is not an ordinary chat log.&lt;/p&gt;

&lt;p&gt;Inside the agent loop, it acts more like a running ledger:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;What the user just said
What the model decided in the last turn
What tool calls the model initiated
What results the tools returned
What summary the system retained after compaction
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model does not automatically remember everything that has happened before. Every time Claude Code calls the Model API, it repackages the relevant history and includes it in the next model request.&lt;/p&gt;

&lt;p&gt;So the point of &lt;code&gt;messages&lt;/code&gt; is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Turning multi-turn actions into context the model can see in the next turn.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Without &lt;code&gt;messages&lt;/code&gt;, every model invocation would start from scratch, as if suffering from amnesia.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;toolUseContext&lt;/code&gt;: What Tools Are Available This Turn
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;toolUseContext&lt;/code&gt; is the tool environment.&lt;/p&gt;

&lt;p&gt;It's not just a list of tools — it tells the main loop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Which tools are available right now?
What is the input schema for each tool?
What context does tool execution need?
How should results be converted into messages?
Which operations require permission checks?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;Act&lt;/code&gt; in ReAct is not an abstract action — it's a concrete action constrained by the tool system.&lt;/p&gt;

&lt;p&gt;"Read a file" via the &lt;code&gt;Read&lt;/code&gt; tool and "read a file" by running &lt;code&gt;cat&lt;/code&gt; directly are two entirely different things in engineering terms. The former is traceable, constrainable, and can be written back into context as a structured tool result; the latter is just a string — and you may never know what it actually did when something goes wrong.&lt;/p&gt;

&lt;p&gt;In other words, tools don't just need to run — they need to be traceable, constrainable, and structured so their results can flow cleanly back into the loop.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;turnCount&lt;/code&gt;: This Is a Multi-Turn System, Not a Single Request
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;turnCount&lt;/code&gt; tracks how many iterations the loop has already completed.&lt;/p&gt;

&lt;p&gt;The field itself looks mundane, but it exposes a fundamental design truth:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude Code was designed from the start with the assumption that tasks will span multiple turns.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It is not "ask the model once and hope it gets the answer right." It allows the model to gather information incrementally across turns, invoke tools, and course-correct its judgments.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;turnCount&lt;/code&gt; also serves as a guard against infinite loops, enables logging statistics, and triggers degradation strategies. A mature agent must know how long it has been spinning, which is why it needs turn counts, budgets, and exit conditions. Without these boundaries, a multi-turn loop easily gets stuck circling a failure path.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;shouldAutoCompact&lt;/code&gt;: Context Swells — Compaction Must Be Part of the Main Loop
&lt;/h3&gt;

&lt;p&gt;Once an agent starts invoking tools, &lt;code&gt;messages&lt;/code&gt; grows rapidly.&lt;/p&gt;

&lt;p&gt;Reading a large file, running a test, searching for a batch of results — all of these dump huge amounts of information back into message history. Short tasks are fine, but long tasks will slam into the context window very quickly.&lt;/p&gt;

&lt;p&gt;So &lt;code&gt;shouldAutoCompact&lt;/code&gt; is not a nice-to-have optimization — it is a mandatory capacity-governance signal for any long-running agent.&lt;/p&gt;

&lt;p&gt;It answers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Is the current message history too long?
Should older content be compressed into a summary?
Has compaction been failing consecutively?
How has the message volume changed before and after compaction?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice in the reference diagram why "check if compaction is needed" comes immediately after "append tool results to messages."&lt;/p&gt;

&lt;p&gt;Because tool results are precisely what causes context to swell.&lt;/p&gt;
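&lt;p&gt;A minimal version of that check might look like this. The 80% threshold is an assumption chosen for illustration, not a value taken from the source:&lt;br&gt;
&lt;/p&gt;

```typescript
// Illustrative threshold check: compaction is evaluated right after tool
// results land, because that is exactly when the history grows fastest.
function shouldAutoCompact(totalTokens: number, contextWindow: number): boolean {
  // Assumed policy: trigger once history uses more than 80% of the window.
  return totalTokens > contextWindow * 0.8;
}

console.log(shouldAutoCompact(90_000, 100_000)); // true
console.log(shouldAutoCompact(50_000, 100_000)); // false
```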

&lt;h3&gt;
  
  
  &lt;code&gt;aborted&lt;/code&gt;: An Agent Must Also Be Safely Interruptible
&lt;/h3&gt;

&lt;p&gt;Real engineering tasks don't always end gracefully.&lt;/p&gt;

&lt;p&gt;A user might cancel, a command might get stuck, a tool might time out, a permission might be denied.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;aborted&lt;/code&gt; signals that this loop can be interrupted externally. It's a reminder that an agent's main loop must account not only for "how to start" and "how to succeed," but also for "how to stop."&lt;/p&gt;

&lt;p&gt;The more capable an agent becomes, the more dangerous it is without a clean way to halt it.&lt;/p&gt;
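&lt;p&gt;The interruption path can be sketched with a standard &lt;code&gt;AbortController&lt;/code&gt;. This is a simplified assumption about the mechanism; the real loop checks the signal at many more points:&lt;br&gt;
&lt;/p&gt;

```typescript
// Sketch: the session holds an AbortController, and the loop checks its
// signal before each step, so an external cancel stops the agent between
// actions instead of mid-action.
function runSteps(steps: string[], signal: AbortSignal): string[] {
  const done: string[] = [];
  for (const step of steps) {
    if (signal.aborted) {
      break; // stop cleanly; no step past this point is half-executed
    }
    done.push(step);
  }
  return done;
}

const controller = new AbortController();
controller.abort(); // e.g. the user pressed Esc
console.log(runSteps(["read", "edit", "test"], controller.signal)); // []
```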

&lt;h2&gt;
  
  
  4. The QueryEngine Perspective: It Manages a Session, Not a Single Request
&lt;/h2&gt;

&lt;p&gt;At this point, we've seen how one round of the ReAct state machine works inside &lt;code&gt;query.ts&lt;/code&gt;. But reading the source code requires stepping one layer further out: who holds the long-lived state that this loop depends on?&lt;/p&gt;

&lt;p&gt;The answer is &lt;code&gt;QueryEngine&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;One useful way to read the source is to treat &lt;code&gt;QueryEngine&lt;/code&gt; as a conversation-level object. That framing matters because &lt;code&gt;QueryEngine&lt;/code&gt; is not a one-shot request handler; it is a session object.&lt;/p&gt;

&lt;p&gt;A single-request handler typically cares about:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;What's the input?
What should I return?
Is this call finished?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A session-level orchestrator, however, cares about:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;How do I keep appending to the message history?
Which permissions were previously denied?
Which files have already been read?
What's the current-round and cumulative usage?
Which skills have been discovered?
Which memory paths have been loaded?
Is the current task interrupted?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's why &lt;code&gt;QueryEngine&lt;/code&gt; surfaces a lot of cross-round state, for example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;ConversationRuntimeState&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;
  &lt;span class="na"&gt;abortController&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;AbortController&lt;/span&gt;
  &lt;span class="na"&gt;permissionDenials&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;PermissionDenial&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;
  &lt;span class="na"&gt;totalUsage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Usage&lt;/span&gt;
  &lt;span class="na"&gt;readFileCache&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;FileStateCache&lt;/span&gt;
  &lt;span class="na"&gt;discoveredSkills&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Set&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="na"&gt;loadedMemoryPaths&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Set&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These fields show it's not a thin wrapper that "forwards the prompt to the model." It's maintaining the live context of a conversation.&lt;/p&gt;
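&lt;p&gt;To make the session/round split concrete, here is a hypothetical bookkeeping class whose fields loosely mirror the shape above. The class and method names are invented for illustration:&lt;/p&gt;

```typescript
// Illustrative sketch of session-level bookkeeping; the field names mirror
// the ConversationRuntimeState shape above, but the class is hypothetical.
type Usage = { inputTokens: number; outputTokens: number };

class SessionState {
  readonly messages: string[] = [];
  readonly readFiles = new Set<string>();
  totalUsage: Usage = { inputTokens: 0, outputTokens: 0 };

  // Called once per loop round: the per-round snapshot is folded into
  // session-level totals, which outlive any single round.
  recordRound(message: string, usage: Usage, filesRead: string[]): void {
    this.messages.push(message);
    this.totalUsage.inputTokens += usage.inputTokens;
    this.totalUsage.outputTokens += usage.outputTokens;
    for (const f of filesRead) this.readFiles.add(f);
  }
}

const session = new SessionState();
session.recordRound("read package.json", { inputTokens: 120, outputTokens: 40 }, ["package.json"]);
session.recordRound("read it again", { inputTokens: 80, outputTokens: 20 }, ["package.json"]);
console.log(session.totalUsage.outputTokens); // 60
console.log(session.readFiles.size); // 1 — the cache dedupes repeat reads
```

&lt;p&gt;The point of the sketch: usage and caches accumulate monotonically across rounds, which is exactly the kind of state a per-request handler would throw away.&lt;/p&gt;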

&lt;p&gt;The relationship between the two can be understood like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;QueryEngine: session-level runtime, responsible for holding long-lived resources and state.
query.ts loop: task execution engine, responsible for building a Query round by round,
               calling the model, running tools, and appending messages.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;State&lt;/code&gt; is more like a working snapshot of a single loop iteration; &lt;code&gt;QueryEngine&lt;/code&gt; is more like the scheduling center behind the session.&lt;/p&gt;

&lt;p&gt;With this perspective in place, ReAct is no longer just a small loop of "should the model keep calling tools." It's part of a complete task lifecycle.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. &lt;code&gt;submitMessage()&lt;/code&gt;: The Real Entry Point That Starts an Agent Run
&lt;/h2&gt;

&lt;p&gt;Following the trail from a user action, whenever a user submits a message, the real entry point typically lands on a method like &lt;code&gt;submitMessage()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Unlike a typical backend endpoint that receives nothing more than a &lt;code&gt;prompt&lt;/code&gt;, this method reads and prepares an entire set of runtime resources at once:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Current cwd
Available tools
Slash commands
MCP clients
Thinking configuration (whether extended reasoning mode is enabled)
Max turns
Budget limits
Session persistence state
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So &lt;code&gt;submitMessage()&lt;/code&gt; is not fundamentally "fire off a chat request." It is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Launch an agent run.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Over the course of that run, it has to handle roughly the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Read current config and session state
Set up the working directory and session environment
Wrap the tool permission determination logic
Prepare the system prompt and context
Invoke the underlying query loop
Handle tool calls as the model produces output
Write tool results back into session history
Track usage, cost, and boundary conditions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The ReAct loop inside &lt;code&gt;query.ts&lt;/code&gt; is just the inner kernel of "how the task moves forward"; &lt;code&gt;submitMessage()&lt;/code&gt; and &lt;code&gt;QueryEngine&lt;/code&gt; are what put that kernel into a real Claude Code session and actually run it.&lt;/p&gt;

&lt;p&gt;This is also where Claude Code is more engineered than a minimal agent demo. A demo usually only proves that "the model can call tools," but &lt;code&gt;QueryEngine&lt;/code&gt; has to guarantee:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Is this tool call actually allowed?
Can the result feed back into the next round of model input?
Can it recover on failure?
Will state get corrupted across a long-running session?
Will context or budget spiral out of control?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Real agent engineering lives in these places that don't look flashy.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fue2jwrrazxhjmfkmgvhx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fue2jwrrazxhjmfkmgvhx.png" alt="ReAct Sketch 3: submitMessage Launches an Agent Run" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Translating the Right-Hand Flow into Code: What Actually Happens Inside the While Loop?
&lt;/h2&gt;

&lt;p&gt;The right side of the diagram translates into a simplified pseudocode snippet:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aborted&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;buildQuery&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;requestModelAPI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;parsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parseModelResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;hasToolUse&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;finalAnswer&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;toolResults&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;runTools&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;toolUses&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;toolUseContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="nx"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;appendToolResultsToMessages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;toolResults&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="nx"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;maybeAutoCompact&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="nx"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;nextTurn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;state&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are three key points in this pseudocode.&lt;/p&gt;

&lt;p&gt;First, &lt;code&gt;buildQuery(state)&lt;/code&gt; is not simply concatenating the user's question. It constructs the model input for the current turn based on the current &lt;code&gt;State&lt;/code&gt;, including message history, system prompt, available tools, context summaries, and so on.&lt;/p&gt;

&lt;p&gt;Second, the result returned by &lt;code&gt;requestModelAPI(query)&lt;/code&gt; is not necessarily the final answer. It could be text, or it could contain a tool invocation request.&lt;/p&gt;

&lt;p&gt;Third, the loop only ends when the model no longer requests tools. As long as the model still needs tools, Claude Code will keep executing them, feeding results back, and advancing to the next turn.&lt;/p&gt;

&lt;p&gt;So this &lt;code&gt;while&lt;/code&gt; loop isn't a mindless infinite loop.&lt;/p&gt;

&lt;p&gt;The real exit conditions are:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The model no longer requests tools
or the task is interrupted
or an engineering limit, error, or permission block is triggered
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the heartbeat of the agent loop.&lt;/p&gt;
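&lt;p&gt;The heartbeat can be reduced to a runnable toy: a fake model that requests one tool call and then answers, driven by the same build, call, parse, run-tools, append cycle. Every name here is illustrative, not the real &lt;code&gt;query.ts&lt;/code&gt; internals:&lt;/p&gt;

```typescript
// Toy, self-contained version of the loop above. The "model" is a stub
// that asks for one tool call, then produces a final answer.
type ModelTurn =
  | { kind: "tool_use"; tool: string; input: string }
  | { kind: "final"; answer: string };

function fakeModel(messages: string[]): ModelTurn {
  // Reason: if we have not yet observed the file, act; otherwise answer.
  const seen = messages.some((m) => m.startsWith("tool_result:"));
  return seen
    ? { kind: "final", answer: "name is demo-app" }
    : { kind: "tool_use", tool: "read_file", input: "package.json" };
}

function runTool(tool: string, input: string): string {
  return tool === "read_file" ? `{"name":"demo-app"}` : "unknown tool";
}

function agentLoop(userPrompt: string): { answer: string; turns: number } {
  const messages = [`user: ${userPrompt}`];
  for (let turn = 1; ; turn++) {
    const parsed = fakeModel(messages); // stands in for request + parse
    if (parsed.kind === "final") return { answer: parsed.answer, turns: turn };
    // Observe: the tool result is written back into messages,
    // so the next round of "Reason" can actually see it.
    messages.push(`tool_result: ${runTool(parsed.tool, parsed.input)}`);
  }
}

console.log(agentLoop("what is the package name?"));
// { answer: 'name is demo-app', turns: 2 }
```

&lt;p&gt;Even in this toy, the exit condition is the same as in the real loop: the run ends only when the model stops requesting tools.&lt;/p&gt;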

&lt;p&gt;(When reading the source, set breakpoints on these three functions: &lt;code&gt;buildQuery&lt;/code&gt;, &lt;code&gt;parseModelResponse&lt;/code&gt;, &lt;code&gt;maybeAutoCompact&lt;/code&gt;. They map to three core questions: how input is assembled, how output is interpreted, and how state is governed. If those three are clear, the rest of the file becomes much easier to follow.)&lt;/p&gt;

&lt;h2&gt;
  
  
  7. "Has Tool Calls?": The Critical Fork in the Whole Machine
&lt;/h2&gt;

&lt;p&gt;Refer back to the diamond in the diagram:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Has tool calls?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Simple as it looks, this step defines the semantics of the current turn.&lt;/p&gt;

&lt;p&gt;No tool calls means the model considers the available information sufficient and can deliver a final answer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;no -&amp;gt; break -&amp;gt; return result
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Has tool calls means the model believes the information is still incomplete and it needs to go gather evidence from the outside world:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;yes -&amp;gt; invoke tools -&amp;gt; write back to messages -&amp;gt; another round
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tool calls are not an add-on feature. They are the switch that moves Claude Code from answer mode into action mode.&lt;/p&gt;

&lt;p&gt;A conventional chatbot usually stops at the first case: generate text and done.&lt;/p&gt;

&lt;p&gt;An agent, on the other hand, must support the second case: the model admits it doesn't yet know and fills in the gaps through tools.&lt;/p&gt;

&lt;p&gt;This is also the core of ReAct:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Reason: the model judges the next step based on current context
Act:    the model issues a tool-call intent
Observe: the tool result is written back into messages
Reason: the model continues judging based on the new observation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Round after round, it spins through this cycle — and only then does the system come across as something that "gets things done."&lt;/p&gt;

&lt;h2&gt;
  
  
  8. Why Must Tool Results Be Appended to Messages?
&lt;/h2&gt;

&lt;p&gt;After a tool executes, the most critical step isn't "getting the result" — it's:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Writing the result back into the message stream.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For example, the model requests reading &lt;code&gt;package.json&lt;/code&gt;. The tool does read the file contents, but if that result isn't appended to &lt;code&gt;messages&lt;/code&gt;, the model in the next turn has no way to see it.&lt;/p&gt;

&lt;p&gt;This creates a bizarre disconnect:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Model: "I need to read package.json"
System reads package.json
Model (next turn): "I still don't know what's in package.json"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Appending tool results to &lt;code&gt;messages&lt;/code&gt; is fundamentally about completing the &lt;code&gt;Observation&lt;/code&gt; step in ReAct.&lt;/p&gt;

&lt;p&gt;It translates a fact from the external world back into context the model can consume.&lt;/p&gt;

&lt;p&gt;Another way to think about it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Tool calls let the model touch the real world.
Appending to messages lets the model remember what it just touched.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without the former, the model can only guess.&lt;/p&gt;

&lt;p&gt;Without the latter, the model suffers amnesia after every action.&lt;/p&gt;

&lt;p&gt;Plenty of minimal agent demos appear to invoke tools, yet they fail on longer tasks for exactly this reason: they have &lt;code&gt;Act&lt;/code&gt;, but they do not have a reliable &lt;code&gt;Observe -&amp;gt; write-back -&amp;gt; next-round Reason&lt;/code&gt; loop.&lt;/p&gt;
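&lt;p&gt;As a sketch, here is what that write-back step can look like using Anthropic-style &lt;code&gt;tool_use&lt;/code&gt; / &lt;code&gt;tool_result&lt;/code&gt; content blocks. The shapes are simplified (real blocks carry more fields) and the helper name is invented:&lt;/p&gt;

```typescript
// Simplified Anthropic-style content blocks: an assistant turn carries
// tool_use blocks, and the following user turn carries tool_result blocks
// whose tool_use_id points back at the request.
type Block =
  | { type: "tool_use"; id: string; name: string; input: unknown }
  | { type: "tool_result"; tool_use_id: string; content: string };
type Message = { role: "user" | "assistant"; content: Block[] };

// Append each tool's output as a tool_result block — this is the
// Observe step that makes the result visible to the next turn.
function appendToolResults(
  messages: Message[],
  assistant: Message,
  results: Map<string, string>,
): Message[] {
  const resultBlocks: Block[] = assistant.content
    .filter((b): b is Extract<Block, { type: "tool_use" }> => b.type === "tool_use")
    .map((b) => ({
      type: "tool_result",
      tool_use_id: b.id,
      content: results.get(b.id) ?? "(no result)",
    }));
  return [...messages, assistant, { role: "user", content: resultBlocks }];
}

const assistant: Message = {
  role: "assistant",
  content: [{ type: "tool_use", id: "tu_1", name: "read_file", input: { path: "package.json" } }],
};
const next = appendToolResults([], assistant, new Map([["tu_1", `{"name":"demo-app"}`]]));
console.log(next.length); // 2: the assistant turn plus the tool_result turn
```

&lt;p&gt;The &lt;code&gt;tool_use_id&lt;/code&gt; pairing is what lets the model match each observation to the request it made, even when several tools ran in one turn.&lt;/p&gt;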

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fduppf6rj63en0vih8er3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fduppf6rj63en0vih8er3.png" alt="ReAct Sketch 4: Observation Write-Back and Compaction Check" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  9. Why Is Compaction Placed After Tool Write-Back?
&lt;/h2&gt;

&lt;p&gt;The final step in the reference diagram is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Check if compression is needed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And it sits right after "Append tool results to messages."&lt;/p&gt;

&lt;p&gt;This ordering matters a great deal.&lt;/p&gt;

&lt;p&gt;Tool results are often the primary source of context bloat:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Read a file     → may return hundreds of lines of code
Run a test      → may return a long log dump
Search code     → may return dozens of hit locations
Call an external service → may return a large structured JSON
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If every one of these results gets fed verbatim into the next round of model input, long tasks quickly become expensive, slow, and prone to losing focus.&lt;/p&gt;

&lt;p&gt;So Claude Code has to keep asking one question inside the main loop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Can the current messages still be carried forward as-is?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If not, compression kicks in.&lt;/p&gt;

&lt;p&gt;Compression doesn't mean casually discarding content — it means preserving the information that remains useful for downstream tasks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;What is the user's goal?
What has already been tried?
Which files have been read?
Which commands have been run?
Which errors are still unresolved?
What should the next step focus on?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Auto-compression is not a "token-saving trick" — it is the infrastructure that makes long-running Agents possible in the first place.&lt;/p&gt;

&lt;p&gt;Without compression, the harder the ReAct loop works, the more the message history spirals out of control.&lt;/p&gt;

&lt;p&gt;The compaction strategy is a strong signal of agent-engineering maturity: crude truncation risks dropping critical information, while over-compression can make the model "forget" what it has already done. We'll unpack this in more detail when we get to context management.&lt;/p&gt;
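&lt;p&gt;A deliberately crude sketch of such a check, to fix the shape of the idea: a real implementation would ask the model for a summary rather than slicing, and the threshold, function name, and summary marker are all hypothetical:&lt;/p&gt;

```typescript
// Hypothetical compaction check. Real compaction summarizes with the
// model; this "keep goal + recent tail" slice is only a placeholder.
const MAX_CHARS = 200; // stand-in for a token budget

function maybeCompact(messages: string[]): string[] {
  const size = messages.reduce((n, m) => n + m.length, 0);
  if (size <= MAX_CHARS) return messages; // still fits: carry forward as-is
  // Preserve the goal (first message) and the recent tail; replace the
  // middle with a summary marker standing in for a model-written summary.
  const summary = `[summary of ${messages.length - 3} earlier messages]`;
  return [messages[0], summary, ...messages.slice(-2)];
}

const history = [
  "goal: fix the failing test",
  ...Array.from({ length: 20 }, (_, i) => `tool_result ${i}: long log output...`),
];
const compacted = maybeCompact(history);
console.log(compacted.length); // 4
```

&lt;p&gt;Even this placeholder shows why the check sits after write-back: the freshly appended tool result is exactly what might have pushed the history over budget.&lt;/p&gt;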

&lt;h2&gt;
  
  
  10. From a Source-Reading Perspective, How Do You Trace This Main Thread?
&lt;/h2&gt;

&lt;p&gt;Read &lt;code&gt;query.ts&lt;/code&gt;, but don't jump straight into the branches.&lt;/p&gt;

&lt;p&gt;A better approach is to first get a handle on these 8 questions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Where is the QueryEngine created?
2. How does submitMessage kick off an agent run?
3. Where is State created?
4. What does buildQuery pull from State?
5. After the Model API returns, how does the code detect tool use?
6. Where are tool calls actually executed?
7. How are tool results appended back into messages?
8. When does compaction trigger?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once these 8 questions connect, the main relationships between &lt;code&gt;query.ts&lt;/code&gt; and &lt;code&gt;QueryEngine&lt;/code&gt; become clear.&lt;/p&gt;

&lt;p&gt;If you want to go deeper, you can pin these questions to a few more concrete source anchors:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;QueryEngine.ts
-&amp;gt; Find submitMessage: how user input enters a turn

query.ts
-&amp;gt; Find QueryParams: what inputs a query round needs
-&amp;gt; Find State: what state is preserved between loops
-&amp;gt; Find queryLoop: where messagesForQuery is assembled each round
-&amp;gt; Find tool_use collection: how model output becomes a list of tool calls
-&amp;gt; Find the tool execution entry: how runTools / StreamingToolExecutor is chosen
-&amp;gt; Find tool_result write-back: how tool results merge into the next round's messages

services/tools/StreamingToolExecutor.ts
-&amp;gt; See how streaming tool execution and concurrency safety work together

services/tools/toolOrchestration.ts
-&amp;gt; See how batched tool calls are grouped by isConcurrencySafe
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Behind these anchors lies the same engineering pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;messagesForQuery
-&amp;gt; model stream
-&amp;gt; assistantMessages + toolUseBlocks
-&amp;gt; toolResults
-&amp;gt; next State.messages
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The many branches in the source still trace back to this same pipeline. Prompt-too-long recovery, max-output-tokens recovery, stop-hook blocking, compaction, memory prefetch, and skill discovery are all, at bottom, answering the same question: if this round does not complete cleanly, how should the next round's &lt;code&gt;State&lt;/code&gt; be constructed?&lt;/p&gt;

&lt;p&gt;You'll find that what this file really wants to convey isn't "some particular function is incredibly complex," but rather a very stable engineering pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;State
-&amp;gt; Query
-&amp;gt; Model Response
-&amp;gt; Tool Use?
-&amp;gt; Tool Result
-&amp;gt; Updated State
-&amp;gt; Next Query
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once you understand this pipeline, everything else — Tools, Context, Prompt, Memory, Permission — can be mapped back onto it.&lt;/p&gt;

&lt;p&gt;Tools are the action layer.&lt;/p&gt;

&lt;p&gt;Context is the material organization for each round's Query.&lt;/p&gt;

&lt;p&gt;Prompt is the set of rules telling the model how to decide and act.&lt;/p&gt;

&lt;p&gt;Permission is the brake that sits before every action.&lt;/p&gt;

&lt;p&gt;Compact is capacity governance for long-running tasks.&lt;/p&gt;

&lt;p&gt;And &lt;code&gt;query.ts&lt;/code&gt;'s ReAct state machine is the backbone that threads all of these capabilities together.&lt;/p&gt;

&lt;h2&gt;
  
  
  11. Redraw the Reference Diagram as a Mermaid Flow
&lt;/h2&gt;

&lt;p&gt;You can compress the entire diagram into the following flow:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fts0xpcdz3u1pzyq7ksqk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fts0xpcdz3u1pzyq7ksqk.png" alt="ReAct Figure 3" width="800" height="1700"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The two loops in this diagram are the most important thing to take away.&lt;/p&gt;

&lt;p&gt;The first is the ReAct loop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Reason -&amp;gt; Act -&amp;gt; Observe -&amp;gt; Reason
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The second is the engineering state loop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;QueryEngine -&amp;gt; State -&amp;gt; Query -&amp;gt; Response -&amp;gt; Tool Result -&amp;gt; State -&amp;gt; QueryEngine
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The former explains why an agent seems to think while doing.&lt;/p&gt;

&lt;p&gt;The latter explains why the source code must include &lt;code&gt;QueryEngine&lt;/code&gt;, &lt;code&gt;State&lt;/code&gt;, &lt;code&gt;messages&lt;/code&gt;, &lt;code&gt;toolUseContext&lt;/code&gt;, &lt;code&gt;turnCount&lt;/code&gt;, &lt;code&gt;autoCompactTracking&lt;/code&gt;, &lt;code&gt;permissionDenials&lt;/code&gt;, and &lt;code&gt;totalUsage&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  12. One-Sentence Summary
&lt;/h2&gt;

&lt;p&gt;The ReAct mechanism in &lt;code&gt;query.ts&lt;/code&gt; is, at its core, maintaining a continuously evolving &lt;code&gt;State&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;In each cycle, Claude Code builds a Query from the current &lt;code&gt;State&lt;/code&gt;, calls the model API, and parses whether the model wants to invoke a tool. If the model no longer needs a tool, it returns the final result. If the model needs a tool, the system executes it, appends the result to &lt;code&gt;messages&lt;/code&gt;, checks whether compression is necessary, and then enters the next cycle with the updated &lt;code&gt;State&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Surrounding this loop, &lt;code&gt;QueryEngine&lt;/code&gt; holds session-level state and organizes tools, permissions, context, budget, caching, and interrupt control into a complete task runtime.&lt;/p&gt;

&lt;p&gt;So Claude Code is not a "model answers once" program. It is a state-driven agent runtime:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The model decides the next step.
Tools touch the real world.
messages bring the real world back to the model.
Compression keeps long tasks running.
State organizes all of this into a sustainable loop.
QueryEngine places this loop inside a session-level runtime.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once you understand this ReAct loop, Prompts, Tools, Context management, and multi-agent collaboration stop looking like scattered modules.&lt;/p&gt;

&lt;p&gt;They all serve the same purpose:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enabling the model not just to speak, but to get things done step by step in the engineering world.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>claude</category>
    </item>
    <item>
      <title>Claude Code Source Analysis Series, Chapter 1: Architecture</title>
      <dc:creator>LienJack</dc:creator>
      <pubDate>Sun, 10 May 2026 05:53:11 +0000</pubDate>
      <link>https://dev.to/lien_jp_db54b8b7fd9fa0118/claude-code-source-analysis-series-chapter-1-architecture-48d7</link>
      <guid>https://dev.to/lien_jp_db54b8b7fd9fa0118/claude-code-source-analysis-series-chapter-1-architecture-48d7</guid>
      <description>&lt;h1&gt;
  
  
  Claude Code Source Analysis Series, Chapter 1: Engineering Architecture
&lt;/h1&gt;

&lt;p&gt;When most people first encounter Claude Code, they mentally file it as "a chat box that can write code."&lt;/p&gt;

&lt;p&gt;That's not wrong, but it misses the point. What makes Claude Code truly powerful isn't just that the model can answer coding questions. Wrapped around that model is an entire engineering system: it reads your project, invokes tools, maintains context, manages state, connects to MCP, dispatches sub-agents, and enforces permission and security boundaries.&lt;/p&gt;

&lt;p&gt;So rather than diving straight into a specific function in the source, this chapter starts by answering a bigger question:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What kind of engineering architecture is Claude Code, exactly?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here it is in one sentence:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude Code = Model API + QueryEngine main loop + Tools system + Context/State management + Security governance + Agent collaboration.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The model provides the core reasoning capability. What turns it into a "programming agent that gets things done" is the entire runtime layer wrapped around it.&lt;/p&gt;

&lt;p&gt;To make this concrete, we'll approach it through three questions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Functional architecture: What capability layers does it have?&lt;/li&gt;
&lt;li&gt;Runtime architecture: How does a user's prompt flow through the system?&lt;/li&gt;
&lt;li&gt;Code architecture: How is the source roughly organized by module?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These three questions build on each other: first understand what capabilities Claude Code has, then see how the QueryEngine orchestrates them, and finally map them back to the modules in the source code.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Why You Can't Just Hook Up a Model API
&lt;/h2&gt;

&lt;p&gt;Suppose you build the simplest possible AI coding assistant. The flow would look something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User asks a question
-&amp;gt; Backend forwards the question to the LLM
-&amp;gt; LLM returns an answer
-&amp;gt; Display the answer to the user
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's barely adequate for "explain this piece of code." But the moment the user says:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Look at this project and figure out why the tests are failing, then fix them.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Things get complicated fast.&lt;/p&gt;

&lt;p&gt;The model needs to understand the project structure, know what files exist, know how to run the test command, know where the error logs live, and know which file to change. After making changes, it needs to re-run the tests to verify. If it hits a permission error, a failed command, an overflowing context window, or an oversized tool output along the way, it needs to recover.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Models can think — but they can't touch a real engineering environment on their own.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A model doesn't natively read files. It doesn't natively execute shell commands. It doesn't natively maintain long-running task state. And it doesn't natively know which operations are dangerous. So Claude Code has to wrap a layer around the Model API — an "engineering shell."&lt;/p&gt;

&lt;p&gt;That engineering shell is the core value of Claude Code.&lt;/p&gt;

&lt;p&gt;This is exactly where many open-source agent projects stall: the model-calling layer looks great, but the engineering shell leaks the moment anything pushes back.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdfp4dkl3j9281uva8kuj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdfp4dkl3j9281uva8kuj.png" alt="Engineering Architecture Sketch 1: The Engineering Shell Around the Model API" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Functional Architecture: What Capability Layers Does Claude Code Have?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr405ok2jzpjpmwxmc51z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr405ok2jzpjpmwxmc51z.png" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Viewed through its functional architecture, Claude Code resembles a layered Agent Runtime — an agent execution environment built around the model, responsible for dispatching tools, managing state, and advancing tasks.&lt;/p&gt;

&lt;p&gt;At the innermost layer sits the &lt;code&gt;Model API&lt;/code&gt;. This is the reasoning core, responsible for understanding tasks, generating responses, and deciding whether the next step requires calling a tool. But it is only the "brain," not the complete system.&lt;/p&gt;

&lt;p&gt;Wrapped around the model is the first runtime layer: the &lt;code&gt;QueryEngine&lt;/code&gt; — the query engine that turns a single user input into a continuously running agent main loop. Without the QueryEngine, Claude Code would be nothing more than a plain API wrapper. With it, it becomes a runtime capable of driving tasks forward on its own.&lt;/p&gt;

&lt;p&gt;The next layer outward is the &lt;code&gt;Tools&lt;/code&gt; system. This layer gives the model "hands and feet": file reads and writes, shell commands, search, web access, MCP, LSP, Agent tools, and Skills all belong here.&lt;/p&gt;

&lt;p&gt;Beyond that lies &lt;code&gt;Context / Memory / State&lt;/code&gt;. This layer answers the question: "What exactly should the model know for this turn?" It dynamically assembles the system prompt, user input, project rules, message history, tool results, file caches, compression summaries, and current application state.&lt;/p&gt;

&lt;p&gt;Farther out still is &lt;code&gt;Agent Collaboration&lt;/code&gt;. When tasks become complex, Claude Code does more than converse with the model in a single thread — it can decompose subtasks to subordinate Agents or Tasks. The main Agent handles the overall judgment; child Agents handle code search, solution exploration, or hypothesis validation.&lt;/p&gt;

&lt;p&gt;At the outermost layer, underpinning every capability, is security governance. Because Claude Code operates in real engineering environments, it can read proprietary code, execute commands, modify files, and invoke external services. Without permissions, policies, sandboxing, prompt injection defenses, and audit logging, a more powerful agent only means greater risk.&lt;/p&gt;

&lt;p&gt;The following diagram captures this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frlq0lzogx3magb4fnnb0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frlq0lzogx3magb4fnnb0.png" alt="Engineering Architecture Figure 1" width="800" height="383"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The core message this diagram aims to convey is not "Claude Code has many modules," but rather:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude Code's capabilities do not grow directly out of the model. They grow out of the engineering systems layered one by one around the model.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvldpr0o6rhw6polfk3bf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvldpr0o6rhw6polfk3bf.png" alt="Engineering Architecture Sketch 2: Claude Code's Capability Layers" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Model API: Responsible for Judgment, Not Execution
&lt;/h3&gt;

&lt;p&gt;Let's clarify the most easily confused point first: the &lt;code&gt;Model API&lt;/code&gt; does not directly execute any tools.&lt;/p&gt;

&lt;p&gt;What the model actually produces is intent, roughly like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;I need to read a certain file.
I need to search for a certain keyword.
I need to run a test command.
I need to modify a certain piece of code.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But the actions — "read the file," "execute the command," "modify the code" — are all carried out by the Claude Code host program.&lt;/p&gt;

&lt;p&gt;The division of labor is clear:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The model handles understanding, planning, and choosing.
The program handles execution, constraints, and recording.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you imagine every capability as the model's own magic, you'll miss where Claude Code's real value lies. The parts actually worth studying are precisely those that are "not intelligent but deeply engineered": tool contracts, permission systems, state management, context compression, error recovery, UI rendering, and session recording.&lt;/p&gt;

&lt;h3&gt;
  
  
  QueryEngine: The Heartbeat of the Entire System
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;QueryEngine&lt;/code&gt; is Claude Code's main loop.&lt;/p&gt;

&lt;p&gt;Its role is not simply "sending requests to the model." It manages an entire session lifecycle. A session contains multiple rounds of user input, multiple rounds of model responses, multiple tool calls, and multiple state changes. The QueryEngine must string all of these together.&lt;/p&gt;

&lt;p&gt;The state it maintains includes at minimum:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Current message history&lt;/li&gt;
&lt;li&gt;Current working directory&lt;/li&gt;
&lt;li&gt;Currently available tool set&lt;/li&gt;
&lt;li&gt;Current model and budget&lt;/li&gt;
&lt;li&gt;File read cache&lt;/li&gt;
&lt;li&gt;Permission denial records&lt;/li&gt;
&lt;li&gt;Skill discovery records&lt;/li&gt;
&lt;li&gt;Token usage count&lt;/li&gt;
&lt;li&gt;Session transcript&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Together, this state determines what Claude Code should do next.&lt;/p&gt;

&lt;p&gt;(The implementation details of the QueryEngine will be covered in the next article, but it is fundamentally a state machine: each turn decides what to do based on the current state, executes it, then updates the state.)&lt;/p&gt;
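
&lt;p&gt;That state-machine framing can be made concrete with a minimal TypeScript sketch. The &lt;code&gt;SessionState&lt;/code&gt; shape and &lt;code&gt;nextTurn&lt;/code&gt; function are hypothetical names for illustration, not the actual source:&lt;/p&gt;

```typescript
// Hypothetical sketch of the per-session state a QueryEngine-style
// orchestrator carries across turns. Field names are illustrative.
interface SessionState {
  messages: string[];                    // message history
  cwd: string;                           // current working directory
  tools: string[];                       // currently available tool names
  model: string;                         // current model id
  fileCache: { [path: string]: string }; // file read cache
  deniedPermissions: string[];           // permission denial records
  tokensUsed: number;                    // running token usage
}

// Each turn reads the state, acts on it, and returns the updated state.
// A real loop would call the model and execute tools where the ellipsis sits.
function nextTurn(state: SessionState, userInput: string): SessionState {
  const messages = state.messages.concat(["user: " + userInput]);
  // ...model call and tool execution elided...
  return { ...state, messages, tokensUsed: state.tokensUsed + userInput.length };
}
```

&lt;p&gt;The point is not the particular fields but the shape of the loop: state in, action, updated state out, every single turn.&lt;/p&gt;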

&lt;h3&gt;
  
  
  Tools System: The Model's Hands and Feet — But Always Under Control
&lt;/h3&gt;

&lt;p&gt;Claude Code's tool system can be understood as a unified capability marketplace.&lt;/p&gt;

&lt;p&gt;It includes basic tools:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Read / Write / Edit / Grep / Glob / Bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As well as extended capabilities:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MCP / LSP / Web / Agent / Skill
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The most important thing about the tool system is not "many tools," but that they are all placed inside a unified tool contract. Every tool must answer a set of questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What is this tool called?&lt;/li&gt;
&lt;li&gt;What are its input parameters?&lt;/li&gt;
&lt;li&gt;How are inputs validated?&lt;/li&gt;
&lt;li&gt;How is it executed?&lt;/li&gt;
&lt;li&gt;How is output converted back into a message?&lt;/li&gt;
&lt;li&gt;Is it read-only?&lt;/li&gt;
&lt;li&gt;Is it destructive?&lt;/li&gt;
&lt;li&gt;Is concurrent execution allowed?&lt;/li&gt;
&lt;li&gt;Does it require user confirmation?&lt;/li&gt;
&lt;/ul&gt;
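
&lt;p&gt;The questions above map naturally onto an interface. The following is a hypothetical sketch of such a contract, with made-up field names rather than the real &lt;code&gt;Tool.ts&lt;/code&gt; definitions:&lt;/p&gt;

```typescript
// Illustrative tool contract: every capability answers the same questions
// up front, so the host can reason about it before anything runs.
interface ToolContract {
  name: string;
  inputSchema: { [param: string]: string };          // param name to type tag
  validateInput(input: { [k: string]: unknown }): boolean;
  execute(input: { [k: string]: unknown }): string;  // result rendered back into a message
  isReadOnly: boolean;
  isDestructive: boolean;
  isConcurrencySafe: boolean;
  needsUserConfirmation: boolean;
}

// A minimal Read-style tool declared under that contract.
const readTool: ToolContract = {
  name: "Read",
  inputSchema: { file_path: "string" },
  validateInput: (input) => typeof input.file_path === "string",
  execute: (input) => "contents of " + String(input.file_path),
  isReadOnly: true,
  isDestructive: false,
  isConcurrencySafe: true,
  needsUserConfirmation: false,
};
```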

&lt;p&gt;This is what makes Claude Code more engineered than "letting the model write shell commands on its own."&lt;/p&gt;

&lt;p&gt;Take viewing a file, for example. Letting the model directly run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat &lt;/span&gt;src/main.ts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;would certainly work, but the system would have no way of knowing the real semantics of that action. It's just a shell string.&lt;/p&gt;

&lt;p&gt;By going through the &lt;code&gt;Read&lt;/code&gt; tool instead, Claude Code can know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;This is a read operation, not an opaque shell string.&lt;/li&gt;
&lt;li&gt;What is the target path?&lt;/li&gt;
&lt;li&gt;Is access authorized?&lt;/li&gt;
&lt;li&gt;Is the output too large, and should it be truncated?&lt;/li&gt;
&lt;li&gt;Should the result enter the file cache?&lt;/li&gt;
&lt;li&gt;Will a subsequent Edit be based on the latest version of the file?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the value of tool abstraction:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tools exist not to let the model "do more," but to make the model's actions understandable, constrainable, and auditable.&lt;/strong&gt;&lt;/p&gt;
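
&lt;p&gt;The checks just described can be sketched as code. Everything here is hypothetical (the request shape, the limit, the cache), but it shows what a host program gains once "view a file" is a structured request instead of a shell string:&lt;/p&gt;

```typescript
// Illustrative Read handler: path authorization, truncation, and cache
// population become explicit, checkable steps. All names are made up.
interface ReadRequest { tool: "Read"; path: string }

const MAX_BYTES = 1024; // hypothetical output limit

function handleRead(
  req: ReadRequest,
  allowedRoots: string[],
  files: { [path: string]: string },
  cache: { [path: string]: string },
): { ok: boolean; text?: string; error?: string } {
  // Is access authorized?
  const allowed = allowedRoots.some((root) => req.path.startsWith(root));
  if (!allowed) return { ok: false, error: "path not authorized" };

  const content = files[req.path];
  if (content === undefined) return { ok: false, error: "file not found" };

  // Is the output too large? Truncate rather than flood the context.
  const text = content.length > MAX_BYTES ? content.slice(0, MAX_BYTES) : content;

  // Record what was read, so a later Edit can check freshness.
  cache[req.path] = text;
  return { ok: true, text };
}
```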
&lt;h3&gt;
  
  
  Context / Memory / State: Giving the Model What It Needs to Know
&lt;/h3&gt;

&lt;p&gt;Another easily underestimated capability of Claude Code is context engineering.&lt;/p&gt;

&lt;p&gt;Many people hear "context" and assume it means "writing longer prompts." But in Claude Code, context is not a static block of text — it is a runtime input dynamically assembled anew for every turn.&lt;/p&gt;

&lt;p&gt;It may include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The base system prompt&lt;/li&gt;
&lt;li&gt;The current user input&lt;/li&gt;
&lt;li&gt;Conversation history&lt;/li&gt;
&lt;li&gt;Project-level rules&lt;/li&gt;
&lt;li&gt;User-level rules&lt;/li&gt;
&lt;li&gt;The current working directory&lt;/li&gt;
&lt;li&gt;Available tool descriptions&lt;/li&gt;
&lt;li&gt;External capabilities exposed via MCP / LSP&lt;/li&gt;
&lt;li&gt;Skill instructions&lt;/li&gt;
&lt;li&gt;File read results&lt;/li&gt;
&lt;li&gt;The result of the last tool execution&lt;/li&gt;
&lt;li&gt;A compressed summary of history&lt;/li&gt;
&lt;li&gt;The current AppState&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Two problems here are genuinely hard.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The first is "what to give."&lt;/strong&gt; Give too little, and the model lacks context; give too much, and the context explodes, sending cost and latency out of control.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The second is "when to give it."&lt;/strong&gt; Some information should enter the system prompt right from the start; some should be fetched via tools only when the model actually needs it; some tool schemas can also be discovered lazily rather than crammed in all at once.&lt;/p&gt;

&lt;p&gt;Put bluntly:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context Engineering is not prompt writing — it is context scheduling.&lt;/strong&gt;&lt;/p&gt;
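
&lt;p&gt;A toy version of that scheduling decision looks like this. The input shapes, the four-turn window, and the character budget are all assumptions for illustration, not Claude Code's actual policy:&lt;/p&gt;

```typescript
// Sketch of per-turn context assembly: always-on pieces first, then a
// compressed summary plus only the most recent turns, under a hard budget.
interface TurnInputs {
  systemPrompt: string;
  projectRules: string;
  history: string[];
  lastToolResult: string;
  summaryOfOlderHistory: string;
}

function assembleContext(inputs: TurnInputs, budgetChars: number): string {
  const parts = [inputs.systemPrompt, inputs.projectRules];
  parts.push(inputs.summaryOfOlderHistory); // stands in for older turns
  parts.push(...inputs.history.slice(-4));  // only the most recent turns
  parts.push(inputs.lastToolResult);
  const joined = parts.filter((p) => p.length > 0).join("\n");
  // Enforce the budget rather than letting the context grow unbounded.
  return joined.length > budgetChars ? joined.slice(0, budgetChars) : joined;
}
```

&lt;p&gt;The interesting decisions are exactly the ones this sketch hard-codes: how many recent turns to keep verbatim, what gets summarized, and where the budget sits.&lt;/p&gt;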

&lt;p&gt;&lt;code&gt;Memory / Compression&lt;/code&gt; addresses the long-task problem. Real engineering tasks routinely cycle through searching code, reading files, running tests, analyzing errors, modifying code, and running tests again. Every step produces messages and tool results. If you stuff all of them back into the model verbatim, the context quickly becomes long and noisy.&lt;/p&gt;

&lt;p&gt;The value of the compression mechanism is not simply saving tokens — it is keeping the Agent on track through long-running tasks.&lt;/p&gt;
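
&lt;p&gt;The trigger logic behind such a mechanism can be sketched in a few lines. In a real system the summary would be produced by the model itself; here a placeholder string stands in, and the threshold is arbitrary:&lt;/p&gt;

```typescript
// Hypothetical auto-compaction: once history crosses a threshold, fold
// older messages into a single summary entry and keep only recent ones.
function compactIfNeeded(messages: string[], maxMessages: number): string[] {
  if (maxMessages >= messages.length) return messages;
  const keep = messages.slice(-maxMessages);
  const older = messages.slice(0, messages.length - maxMessages);
  // Placeholder: a real implementation asks the model to summarize `older`.
  const summary = "[summary of " + older.length + " earlier messages]";
  return [summary].concat(keep);
}
```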

&lt;p&gt;&lt;code&gt;AppStateStore&lt;/code&gt;, meanwhile, unifies CLI UI state, session state, tool state, and Agent state. Is the current session in Plan Mode? What MCP tools are currently available? What is the current permission mode? Is there a background task running right now? None of these can be resolved by model messages alone — they require an application state system.&lt;/p&gt;
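
&lt;p&gt;The essential shape of such a store is a shared state plus change notification. This is a generic subscribe-and-update sketch with hypothetical state fields, not the real &lt;code&gt;AppStateStore&lt;/code&gt; API:&lt;/p&gt;

```typescript
// Minimal subscribable store: UI, tool, and agent code read one shared
// state object and react when it changes. All field names are illustrative.
interface AppState {
  planMode: boolean;
  permissionMode: string;
  mcpToolNames: string[];
  backgroundTaskRunning: boolean;
}

function createStore(initial: AppState) {
  let state = initial;
  const listeners: ((s: AppState) => void)[] = [];
  return {
    get: () => state,
    // Updates go through a pure function, then every subscriber is notified.
    update(fn: (s: AppState) => AppState) {
      state = fn(state);
      listeners.forEach((l) => l(state));
    },
    subscribe(l: (s: AppState) => void) {
      listeners.push(l);
    },
  };
}
```

&lt;p&gt;With this in place, "is the session in Plan Mode?" is a single &lt;code&gt;get()&lt;/code&gt;, and the terminal UI can redraw whenever any subsystem flips a flag.&lt;/p&gt;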
&lt;h3&gt;
  
  
  MCP / LSP / Skills: The Capability Integration Layer
&lt;/h3&gt;

&lt;p&gt;Claude Code cannot bake every capability directly into the main program, so it needs an extension mechanism.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;MCP&lt;/code&gt; (Model Context Protocol — a protocol that lets external tools and resources be called by the Agent in a standardized way) functions more like an external tool protocol. It lets Claude Code discover and invoke tools and resources provided by external services. Databases, browsers, design tools, internal systems — all can become Agent-callable capabilities through MCP.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;LSP&lt;/code&gt; (Language Server Protocol — provides code-semantic capabilities like symbols, definitions, and references) leans more toward code intelligence. It helps Claude Code better understand the programming language itself.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Skills&lt;/code&gt; are closer to reusable task-method bundles. They are typically not a single API but a set of instructions, scripts, templates, and trigger rules that tell the Agent how to handle a certain class of task.&lt;/p&gt;

&lt;p&gt;These three solve different problems:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MCP: How to standardize integration of external capabilities.
LSP: How to integrate code-semantic capabilities.
Skills: How to load reusable working methods.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Together they form Claude Code's extension layer.&lt;/p&gt;

&lt;p&gt;(A practical pitfall during integration: if an MCP tool's schema is too large, it will directly blow up the context. Claude Code's approach is lazy discovery — not stuffing everything in all at once.)&lt;/p&gt;
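
&lt;p&gt;The lazy-discovery idea is simple to sketch: advertise cheap tool names up front, and fetch a tool's full schema only on first use. The server interface below is a stand-in for illustration, not the actual MCP protocol types:&lt;/p&gt;

```typescript
// Sketch of lazy tool discovery with a schema cache. Only names enter the
// context eagerly; large schemas are loaded on demand, once.
interface McpServerStub {
  listToolNames(): string[];
  getToolSchema(name: string): string; // potentially very large
}

function createLazyToolIndex(server: McpServerStub) {
  const schemaCache: { [name: string]: string } = {};
  let schemaLoads = 0;
  return {
    names: () => server.listToolNames(), // cheap, safe to show every turn
    schema(name: string): string {
      // Deferred until the model actually asks for this tool, then cached.
      if (schemaCache[name] === undefined) {
        schemaCache[name] = server.getToolSchema(name);
        schemaLoads += 1;
      }
      return schemaCache[name];
    },
    loads: () => schemaLoads, // for observing the laziness
  };
}
```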

&lt;h3&gt;
  
  
  Security Governance: The More Capable the Agent, the More It Needs Boundaries
&lt;/h3&gt;

&lt;p&gt;The security layer is not decoration — it is the prerequisite for Claude Code to exist as an engineering tool at all.&lt;/p&gt;

&lt;p&gt;The security layer must address roughly four categories of problems:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Permissions &amp;amp; Policy: Which tools are available, which paths are accessible, which commands require confirmation.
Sandboxing: Confining dangerous actions to a controlled environment.
Prompt Injection Prevention: Preventing project files or external content from inducing the model to act without authorization.
Audit Logs: Recording what the model did, what tools executed, and what the user approved.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The most critical design principle here is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The model can suggest actions, but it cannot bypass system boundaries to act directly.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When the model outputs a &lt;code&gt;tool_use&lt;/code&gt; (the behavior where the model requests to invoke a tool via a specific format), it is only initiating a request. Before actual execution, it must still pass through the tool system and the permission system.&lt;/p&gt;
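
&lt;p&gt;The gate between request and execution can be sketched as a small policy function. The decision values and policy inputs here are illustrative assumptions, not the real permission system:&lt;/p&gt;

```typescript
// Sketch: a tool_use block is only a request. The host decides whether it
// runs, asks the user, or is refused outright.
interface ToolUseRequest {
  toolName: string;
  isDestructive: boolean;
}

type Decision = "allow" | "ask_user" | "deny";

function decide(
  req: ToolUseRequest,
  allowedTools: string[],
  autoApproveDestructive: boolean,
): Decision {
  // Tools outside the configured set are never executed.
  if (!allowedTools.includes(req.toolName)) return "deny";
  // Destructive actions escalate to the user unless explicitly pre-approved.
  if (req.isDestructive) {
    if (!autoApproveDestructive) return "ask_user";
  }
  return "allow";
}
```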

&lt;p&gt;This is also the watershed between Claude Code and many toy Agents: toy Agents pursue "getting it to run"; production-grade Agents must pursue "getting it to run within constraints."&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Runtime Architecture: How Does a Single User Sentence Flow Through the System?
&lt;/h2&gt;

&lt;p&gt;After understanding the functional layer, let's look at the runtime architecture.&lt;/p&gt;

&lt;p&gt;From the user's perspective, the process boils down to one sentence:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User: Help me fix this bug
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But inside Claude Code, that sentence isn't sent directly to the model. It first enters a runtime orchestrated by the QueryEngine.&lt;/p&gt;

&lt;p&gt;A simplified runtime flow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User input
-&amp;gt; Claude Code session
-&amp;gt; QueryEngine.submitMessage()
-&amp;gt; Process user input and slash commands
-&amp;gt; Build context and system prompt
-&amp;gt; Call Model API
-&amp;gt; Model returns text or tool_use
-&amp;gt; Tool system checks permissions and executes tools
-&amp;gt; Tool results written back to message history
-&amp;gt; QueryEngine continues to the next round
-&amp;gt; Until the task is complete or user decision is needed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvby0kpc1kx5tt6wljo8c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvby0kpc1kx5tt6wljo8c.png" alt="Engineering Architecture Sketch 3: User Input Flow in the Runtime" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Drawn as a sequence diagram:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftzbfii7zr6l0xjnob4yh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftzbfii7zr6l0xjnob4yh.png" alt="Engineering Architecture Figure 2" width="800" height="481"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are two feedback loops in this diagram.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The first loop is the cycle between the model and tools:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Model determines the next step
-&amp;gt; Tool executes a real action
-&amp;gt; Tool result goes back to the model
-&amp;gt; Model continues reasoning
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is what enables Claude Code to push tasks forward continuously.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The second loop is the cycle between context and state:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Each round changes the message history, tool results, permission state, task state
-&amp;gt; The next round, QueryEngine reassembles context based on those changes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is why Claude Code isn't just "one question, one answer." It behaves more like a continuously running state machine.&lt;/p&gt;

&lt;h3&gt;
  
  
  Slash Commands Don't Always Hit the Model
&lt;/h3&gt;

&lt;p&gt;There's another detail in the runtime architecture: not all user input triggers a Model API call.&lt;/p&gt;

&lt;p&gt;Some inputs are local commands — configuration, cleanup, compression, status viewing. If these were forced through the model, they'd waste tokens and introduce instability.&lt;/p&gt;

&lt;p&gt;So when QueryEngine processes user input, it first determines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Is this a task that requires model reasoning?
Or is this a command that can be executed locally?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If it's a local command, the system returns the result directly and ends the round early.&lt;/p&gt;

&lt;p&gt;(This is a pragmatic design choice. If typing &lt;code&gt;/clear&lt;/code&gt; to clear the screen still required a round trip to the model, the experience would be terrible.)&lt;/p&gt;
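
&lt;p&gt;The early-return branch is easy to picture as a dispatch table. The command names and handlers below are placeholders for illustration:&lt;/p&gt;

```typescript
// Sketch of local command routing: inputs matching a local handler are
// answered immediately; everything else becomes a model turn.
const localCommands: { [name: string]: () => string } = {
  "/clear": () => "screen cleared",
  "/status": () => "session ok",
};

function routeInput(input: string): { handledLocally: boolean; output: string } {
  const name = input.trim().split(" ")[0];
  const handler = localCommands[name];
  if (handler) {
    // Local path: no tokens spent, no model round trip.
    return { handledLocally: true, output: handler() };
  }
  // Model path: the QueryEngine takes over from here.
  return { handledLocally: false, output: "" };
}
```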

&lt;h3&gt;
  
  
  Plan Mode: Slowing Down Execution First
&lt;/h3&gt;

&lt;p&gt;The runtime architecture also includes an important mode: &lt;code&gt;Plan Mode&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;For a typical chat product, the model can just respond directly. But for a coding agent, "acting immediately" carries risk — it might modify files, run commands, and affect the project state.&lt;/p&gt;

&lt;p&gt;The point of Plan Mode is to split a task into two phases:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;First: understand and plan
Then: execute and modify
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What this reflects is Claude Code's design philosophy around control:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Not every task should be executed immediately.&lt;/li&gt;
&lt;li&gt;Not every tool should be open by default.&lt;/li&gt;
&lt;li&gt;The user should be able to see the plan at key decision points and decide whether to proceed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A mature agent system doesn't blindly pursue "maximum automation." The real challenge is finding the balance between automation and controllability.&lt;/p&gt;
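
&lt;p&gt;One plausible way to implement the planning phase is as a filter over the tool set: while planning, only read-only tools are offered to the model. This is a sketch of that idea, not necessarily how Claude Code enforces it internally:&lt;/p&gt;

```typescript
// Sketch: Plan Mode as a tool filter. During planning, mutating tools are
// simply not on the menu; they unlock once the user approves the plan.
interface ToolInfo {
  name: string;
  isReadOnly: boolean;
}

function availableTools(all: ToolInfo[], planMode: boolean): string[] {
  return all
    .filter((t) => !planMode || t.isReadOnly)
    .map((t) => t.name);
}
```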

&lt;h2&gt;
  
  
  4. Code Architecture: How Is the Source Code Organized?
&lt;/h2&gt;

&lt;p&gt;Let's return to code architecture at the end.&lt;/p&gt;

&lt;p&gt;If functional architecture answers "what capabilities does Claude Code have," and runtime architecture answers "how do those capabilities run together," then code architecture answers:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When you open the source code, what mental map should you build first?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here's a trick for reading the source: don't scan the directory tree horizontally. Claude Code has many source directories — &lt;code&gt;components&lt;/code&gt;, &lt;code&gt;services&lt;/code&gt;, &lt;code&gt;tools&lt;/code&gt;, &lt;code&gt;hooks&lt;/code&gt;, &lt;code&gt;utils&lt;/code&gt; — and it's easy to get lost. A more reliable approach is to first identify the &lt;em&gt;load-bearing chain&lt;/em&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Entry point hands user input to a session
→ QueryEngine manages a conversation
→ query.ts drives rounds of ReAct
→ The Tool protocol turns model intent into executable requests
→ Context / Prompt determine what the model sees each round
→ Permission / Hooks / State determine whether an action can land
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvlo7nqmfk1xzemrsdxfq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvlo7nqmfk1xzemrsdxfq.png" alt="Engineering Architecture Sketch 4: The Load-Bearing Chain for Reading the Source" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In other words, this section isn't listing directories — it's locating the source-code coordinates for the articles that follow.&lt;/p&gt;

&lt;p&gt;Start by building an overall mental picture with this diagram:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn4bjmyxthi13vrcq911w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn4bjmyxthi13vrcq911w.png" alt="Engineering Architecture Figure 3" width="800" height="249"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This code architecture diagram can be read in layers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Entry Layer: cli.tsx and main.tsx
&lt;/h3&gt;

&lt;p&gt;At the top are &lt;code&gt;cli.tsx&lt;/code&gt; and &lt;code&gt;main.tsx&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;cli.tsx&lt;/code&gt; is the actual binary entry point. It handles fast-path options like &lt;code&gt;--version&lt;/code&gt; that don't require loading the full application. The goal is to make the CLI tool start as quickly as possible.&lt;/p&gt;
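
&lt;p&gt;The fast-path idea can be shown in miniature. The flag names mirror common CLI conventions and the loader is a stand-in; this is the pattern, not the actual &lt;code&gt;cli.tsx&lt;/code&gt; code:&lt;/p&gt;

```typescript
// Sketch: answer trivial flags before paying the cost of loading the full
// application (config, plugins, UI). `loadFullApp` is a hypothetical loader.
function runCli(argv: string[], loadFullApp: () => string): string {
  // Fast path: nothing heavy is initialized.
  if (argv.includes("--version")) return "1.0.0 (example)";
  if (argv.includes("--help")) return "usage: claude [options]";
  // Slow path: full startup, then mode dispatch (REPL, SDK, MCP service...).
  return loadFullApp();
}
```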

&lt;p&gt;&lt;code&gt;main.tsx&lt;/code&gt; enters the full startup flow, responsible for command-line arguments, configuration, environment variables, preloading, and mode dispatch. It routes the program into different runtime paths:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Interactive REPL
Headless / SDK
MCP service
Other command paths
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claude Code isn't just the "terminal chat" form factor. The REPL, SDK, and MCP service can all share the same underlying capabilities.&lt;/p&gt;

&lt;h3&gt;
  
  
  Interaction Layer: REPL and Terminal UI
&lt;/h3&gt;

&lt;p&gt;Claude Code is a CLI product. The terminal UI is not an afterthought.&lt;/p&gt;

&lt;p&gt;It has to handle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;User input&lt;/li&gt;
&lt;li&gt;Streaming output&lt;/li&gt;
&lt;li&gt;Tool execution progress&lt;/li&gt;
&lt;li&gt;Permission confirmations&lt;/li&gt;
&lt;li&gt;Error prompts&lt;/li&gt;
&lt;li&gt;Task status display&lt;/li&gt;
&lt;li&gt;Plan Mode interaction&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is also why the source code contains a substantial amount of UI rendering and state subscription logic. The agent doesn't just run in the background — the user needs to understand what it's doing, right in the terminal.&lt;/p&gt;

&lt;h3&gt;
  
  
  Orchestration Layer: QueryEngine and the query Main Loop
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;QueryEngine&lt;/code&gt; is the session-level orchestration layer.&lt;/p&gt;

&lt;p&gt;It connects upward to the REPL / SDK and downward to the query main loop, context system, state system, tool system, and command system.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;query&lt;/code&gt; main loop is more focused on model invocation and the ReAct Loop — an action-loop pattern where the model alternates between reasoning and executing actions. It's responsible for sending messages to the model, receiving model responses, identifying &lt;code&gt;tool_use&lt;/code&gt;, and placing tool execution results back into the message stream.&lt;/p&gt;

&lt;p&gt;A simple distinction:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;QueryEngine: manages an entire session.
query main loop: manages one or more model-tool cycles.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Capability Layer: Tools / Commands / Services
&lt;/h3&gt;

&lt;p&gt;The capability layer breaks down into three categories.&lt;/p&gt;

&lt;p&gt;The first is &lt;code&gt;Commands&lt;/code&gt;. It handles slash commands and local commands. Some user inputs don't require model reasoning — executing them locally is more reliable.&lt;/p&gt;

&lt;p&gt;The second is &lt;code&gt;Tools&lt;/code&gt;. It handles the tool capabilities the model can invoke: reading files, editing files, running shell commands, searching code, calling sub-agents.&lt;/p&gt;

&lt;p&gt;The third is &lt;code&gt;Services&lt;/code&gt;. It carries external integrations and extension capabilities: MCP, LSP, plugins, remote sessions.&lt;/p&gt;

&lt;p&gt;These three categories together form Claude Code's execution layer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Context Layer: context, memory, compression
&lt;/h3&gt;

&lt;p&gt;The context layer answers one question:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;What exactly should be sent to the model this round?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's not just concatenating strings. It synthesizes the current task, conversation history, project rules, user rules, tool descriptions, MCP capabilities, skill descriptions, file read results, and compressed summaries.&lt;/p&gt;

&lt;p&gt;This is also why Claude Code seems to "understand your project": the model doesn't innately understand it — the context layer continuously assembles project-relevant information and feeds it to the model.&lt;/p&gt;

&lt;h3&gt;
  
  
  State Layer: AppStateStore
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;AppStateStore&lt;/code&gt; is responsible for global state.&lt;/p&gt;

&lt;p&gt;It manages more than just UI state; it also covers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Current model configuration&lt;/li&gt;
&lt;li&gt;Tool permission context&lt;/li&gt;
&lt;li&gt;MCP clients and tools&lt;/li&gt;
&lt;li&gt;Plugin state&lt;/li&gt;
&lt;li&gt;Sub-agent / Task state&lt;/li&gt;
&lt;li&gt;Remote session state&lt;/li&gt;
&lt;li&gt;User settings&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without the state layer, it would be difficult for Claude Code to unify the "terminal application" and the "agent runtime."&lt;/p&gt;

&lt;h3&gt;
  
  
  Security Layer: permissions, sandbox, audit
&lt;/h3&gt;

&lt;p&gt;The security layer doesn't map to a single file in the code. Instead, it runs through tool execution, permission decisions, command classification, MCP calls, user confirmations, and session recording.&lt;/p&gt;

&lt;p&gt;Its essence is turning the model's free-form intent into governed execution requests.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The model says "here's what I want to do"
The system decides "are you allowed to do it"
The tool enforces "do it according to the rules"
The log records "here's what you did"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the difference between a production-grade agent and a demo agent.&lt;/p&gt;

&lt;h3&gt;
  
  
  Load-Bearing Files in the Source: Read a Few Beams First
&lt;/h3&gt;

&lt;p&gt;When it comes to reading source code, don't start by surveying every directory. Start with a handful of load-bearing files.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;QueryEngine.ts&lt;/code&gt;&lt;/strong&gt; is the session layer. Its job isn't that it "does everything directly" — it's that it holds everything one conversation needs to persist across turns: message history, permission denial records, file read caches, model configuration, tool sets, the MCP client, Agent definitions, and the AppState read/write entry point. Each call to &lt;code&gt;submitMessage()&lt;/code&gt; is just a new turn within the same conversation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;query.ts&lt;/code&gt;&lt;/strong&gt; is the loop layer. It maintains a per-iteration &lt;code&gt;State&lt;/code&gt;, carrying messages, &lt;code&gt;toolUseContext&lt;/code&gt;, &lt;code&gt;autoCompactTracking&lt;/code&gt;, &lt;code&gt;turnCount&lt;/code&gt;, &lt;code&gt;pendingToolUseSummary&lt;/code&gt;, and other state into the next round. As the model streams its response, &lt;code&gt;query.ts&lt;/code&gt; collects any &lt;code&gt;tool_use&lt;/code&gt; blocks in the assistant message. If there are no tool calls, it wraps up. If there are, it executes them, appends the results back to the message list, and continues the loop.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;Tool.ts&lt;/code&gt;&lt;/strong&gt; is the action protocol layer. A tool is not a function — it is a protocol: input schema, invocation mode, whether it's read-only, concurrency-safe, destructive, requires permission, its result size, how it renders in the UI, how errors get backfilled, and more — all declared up front. The model doesn't output "I'll just do whatever." It outputs "I want to initiate a request under this tool protocol."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;tools.ts&lt;/code&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;code&gt;services/tools/toolExecution.ts&lt;/code&gt;&lt;/strong&gt; are the tool menu and execution lifecycle. The former determines which tools the model can see in the current turn; the latter governs how a single tool call goes through schema validation, tool-level input validation, permission checks, hooks, actual execution, and result serialization.&lt;/p&gt;
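
&lt;p&gt;The lifecycle stages just listed can be strung together in a sketch. Every stage implementation here is a placeholder; only the ordering of the pipeline is the point:&lt;/p&gt;

```typescript
// Sketch of a single tool call's lifecycle: schema validation, tool-level
// input validation, permission check, hooks, execution, serialization.
interface ExecTool {
  name: string;
  checkSchema(input: { [k: string]: unknown }): boolean;
  checkInput(input: { [k: string]: unknown }): boolean;
  run(input: { [k: string]: unknown }): string;
}

function executeToolCall(
  tool: ExecTool,
  input: { [k: string]: unknown },
  permitted: (toolName: string) => boolean,
  hooks: { before: (toolName: string) => void },
): string {
  if (!tool.checkSchema(input)) return "error: schema validation failed";
  if (!tool.checkInput(input)) return "error: input validation failed";
  if (!permitted(tool.name)) return "error: permission denied";
  hooks.before(tool.name);          // pre-tool hook fires only after the gates
  const result = tool.run(input);   // actual execution
  return "tool_result: " + result;  // serialized back into the message stream
}
```

&lt;p&gt;Note the ordering: nothing with side effects, not even a hook, runs until validation and permissions have passed.&lt;/p&gt;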

&lt;p&gt;&lt;strong&gt;&lt;code&gt;context.ts&lt;/code&gt;&lt;/strong&gt;, &lt;strong&gt;&lt;code&gt;constants/prompts.ts&lt;/code&gt;&lt;/strong&gt;, and &lt;strong&gt;&lt;code&gt;services/compact&lt;/code&gt;&lt;/strong&gt; are the model's workbench. They determine how system rules, project memory, Git state, tool descriptions, message history, tool result budgets, and compaction summaries are assembled into each model request.&lt;/p&gt;

&lt;p&gt;So source reading can be compressed into one line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;QueryEngine manages sessions, query.ts manages the loop, Tool defines action boundaries, Context/Prompt assemble the model's workbench.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once this backbone is clear in your mind, MCP, Skills, Agents, and Plans won't feel like scattered feature islands. They're all extensions that plug into this main highway.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. The Three-Layer Architecture as a Whole
&lt;/h2&gt;

&lt;p&gt;Now let's bring everything together into a unified understanding.&lt;/p&gt;

&lt;p&gt;The functional architecture tells us that Claude Code is not a model—it is a &lt;em&gt;capability system&lt;/em&gt; built around a model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Model API
-&amp;gt; QueryEngine
-&amp;gt; Tools / Context / Memory / State
-&amp;gt; MCP / LSP / Skills / Agent Collaboration
-&amp;gt; Security &amp;amp; Governance
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The runtime architecture tells us that a user's prompt does not go directly to the model—it enters a continuously running Agent Runtime:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User Input
-&amp;gt; QueryEngine assembles context
-&amp;gt; Model API makes a decision
-&amp;gt; Tools execute
-&amp;gt; Results flow back
-&amp;gt; QueryEngine proceeds to the next round
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The code architecture tells us how to find entry points when reading the source, guided by these layers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Entry layer:       cli.tsx / main.tsx
Interaction layer: REPL / Terminal UI
Orchestration layer: QueryEngine / query
Capability layer:  Tools / Commands / Services
Context layer:     context / memory / compression
State layer:       AppStateStore
Security layer:    permissions / sandbox / audit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So the essence of Claude Code is not "a model plus a handful of tools." It is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;An extensible, governable, continuously running Agent Harness built around a model.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"Harness" is an apt metaphor here: the model supplies intelligence; the Harness supplies the operating environment. Without the model, the system has no reasoning ability; without the Harness, the model has no stable ability to get things done.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. The Main Thread in a Nutshell
&lt;/h2&gt;

&lt;p&gt;If all you want is the backbone of Claude Code's engineering architecture, hold on to these few lines:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Claude Code is not a chat box — it is a CLI-based Agent Runtime.&lt;/li&gt;
&lt;li&gt;The Model API is the reasoning core, but it does not directly execute real-world actions.&lt;/li&gt;
&lt;li&gt;The QueryEngine is the main loop that strings together user input, model responses, tool calls, and state transitions.&lt;/li&gt;
&lt;li&gt;The Tools system is the execution layer, but every tool must pass through contracts, permissions, and result serialization.&lt;/li&gt;
&lt;li&gt;Context Engineering is the dynamic assembly of context — it is not simply writing a long prompt.&lt;/li&gt;
&lt;li&gt;The AppStateStore enables the CLI UI, session state, tool state, and agent state to work in concert.&lt;/li&gt;
&lt;li&gt;MCP, LSP, and Skills form the extension layer, so Claude Code does not have to hard-code every capability internally.&lt;/li&gt;
&lt;li&gt;The security layer determines whether an agent can graduate from demo to real-world engineering environments.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The next piece dives deeper: how exactly the &lt;code&gt;QueryEngine&lt;/code&gt; implements this dialogue main loop, and how it threads model calls, tool execution, and context compression together into a resumable state machine.&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>agents</category>
      <category>sourceanalysis</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
