DEV Community: LienJack

Claude Code Source Analysis Series, Chapter 5: Tools Overview

LienJack — Sun, 10 May 2026 08:11:19 +0000

Chapter 6 of the Claude Code Source Analysis Series | Tools Overview

This article focuses on the layer that turns model intent into real engineering action.

Inside Claude Code, QueryEngine runs the multi-turn agent loop, the prompt runtime assembles what the model sees on each turn, and context management keeps long-running work from collapsing under its own history. The tool system is the next critical layer: once the model decides what it wants to do, how does Claude Code turn that intent into an action that is executable, constrained, recoverable, and auditable?

The model itself does not execute commands or modify files directly. It emits structured action intent, and the runtime tool system decides how that intent is interpreted, gated, executed, and written back into the session.

To keep the discussion concrete, we will use one running example:

User: Help me fix the failing tests in this project.

As we've already discussed, Claude Code can't stop at "guessing." For this task, it actually needs to do the following:

Search the project structure
Read relevant files
Edit code
Run the tests
Adjust based on errors
Ask for confirmation before high-risk operations

Behind these actions is the Tools system.

So the core question this chapter answers is not "what tools does Claude Code have," but rather:

How does Claude Code turn the model's intent to act into engineering actions that are executable, constrainable, recoverable, and auditable?

1. `Tool.ts` Solves the Problem of "Actions Must Become Protocols First"

Tool.ts is not a specific tool, but the contract that all tools must honor.

You can think of it as a "tool ID card." Every tool must declare:

Its name
What parameters it accepts
Whether it is read-only
Whether it can run concurrently
What permissions it needs
What context it receives during execution
How it hands results back to the system after execution

This step is critical. Only once an action has been protocolized can the system govern it.

A mature tool cannot be just a function. It must simultaneously answer "how to invoke it," "whether it can be invoked," "where it can be invoked," and "what happens after invocation."

This is the first core layer of the Claude Code tool system:

A Tool is not a feature button — it is the runtime contract that a model action must sign before entering the real world.
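To make the "tool ID card" idea concrete, here is a minimal sketch of what such a contract could look like. The interface name `ToolSketch`, its fields, and the `EchoPath` tool are all hypothetical and chosen for illustration; they are not copied from Claude Code's actual `Tool.ts`.

```typescript
// Hypothetical contract; field names are illustrative, not Claude Code's real Tool.ts.
type PermissionDecision = "allow" | "ask" | "deny";

interface ToolContext {
  cwd: string; // working directory the session is allowed to operate in
}

interface ToolSketch<Input, Output> {
  name: string;
  description: string;
  isReadOnly: boolean;
  isConcurrencySafe: boolean;
  validateInput(raw: unknown): Input; // throws on malformed input
  checkPermissions(input: Input, ctx: ToolContext): PermissionDecision;
  call(input: Input, ctx: ToolContext): Output;
}

// A toy tool that only resolves a path (no real I/O), so the sketch stays runnable.
const echoPathTool: ToolSketch<{ filePath: string }, string> = {
  name: "EchoPath",
  description: "Returns the path it would read, resolved against cwd.",
  isReadOnly: true,
  isConcurrencySafe: true,
  validateInput(raw) {
    if (typeof raw !== "object" || raw === null) {
      throw new Error("input must be an object");
    }
    const filePath = (raw as Record<string, unknown>)["filePath"];
    if (typeof filePath !== "string" || filePath.length === 0) {
      throw new Error("filePath must be a non-empty string");
    }
    return { filePath };
  },
  checkPermissions(input, ctx) {
    // Absolute paths outside the working directory need confirmation in this sketch.
    const outside = input.filePath.startsWith("/") && !input.filePath.startsWith(`${ctx.cwd}/`);
    return outside ? "ask" : "allow";
  },
  call(input, ctx) {
    return input.filePath.startsWith("/") ? input.filePath : `${ctx.cwd}/${input.filePath}`;
  },
};
```

The point of the shape is that "how to invoke", "whether to invoke", and "what comes back" are separate members, so the runtime can call each one at a different stage.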

2. `inputSchema` Turns Model Output from "Natural Language" into "Structured Intent"

The most easily underestimated part of the tool protocol is inputSchema.

Its purpose isn't to make TypeScript look pretty. It's to constrain model output into parseable data.

Take file reading, for example. If the model just says:

I want to look at src/foo.ts

The host program still has to guess at its intent. But if the model emits a tool call:

{
  "tool": "Read",
  "input": {
    "file_path": "src/foo.ts"
  }
}

The system knows unambiguously:

Which tool to invoke
What the parameters are
Whether the parameters are valid
Whether this action is a read, write, search, or execution
Which permission and execution path to follow next

This is also the key difference between function calling (tool use) and a plain prompt: the model doesn't just "say what it wants to do"; it submits an executable request against the protocol.

So the value of inputSchema goes beyond just "defining parameters."

It turns a model's vague intent into an engineering object the system can act on.
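As a rough illustration of how a schema turns model output into checked data, the sketch below validates a tool call's input against a tiny hand-rolled schema. The `InputSchema` format and `validateToolInput` are assumptions made for this example; real systems typically use a schema library, and nothing here mirrors Claude Code's actual validator.

```typescript
// Hypothetical mini-schema: each field has a primitive type and a required flag.
type FieldType = "string" | "number" | "boolean";
type InputSchema = Record<string, { type: FieldType; required: boolean }>;
type Primitive = string | number | boolean;

type ValidationResult =
  | { ok: true; input: Record<string, Primitive> }
  | { ok: false; errors: string[] };

function validateToolInput(schema: InputSchema, raw: unknown): ValidationResult {
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    return { ok: false, errors: ["input must be a JSON object"] };
  }
  const record = raw as Record<string, unknown>;
  const errors: string[] = [];
  const input: Record<string, Primitive> = {};

  for (const [key, spec] of Object.entries(schema)) {
    const value = record[key];
    if (value === undefined) {
      if (spec.required) errors.push(`missing required field: ${key}`);
      continue;
    }
    if (typeof value !== spec.type) {
      errors.push(`field ${key} must be ${spec.type}, got ${typeof value}`);
      continue;
    }
    input[key] = value as Primitive;
  }
  // Reject fields the schema does not know, so typos surface early.
  for (const key of Object.keys(record)) {
    if (!(key in schema)) errors.push(`unknown field: ${key}`);
  }
  return errors.length === 0 ? { ok: true, input } : { ok: false, errors };
}

// Schema for the Read example above; the optional "limit" field is an assumption.
const readSchema: InputSchema = {
  file_path: { type: "string", required: true },
  limit: { type: "number", required: false },
};
```

A failed validation returns a list of messages rather than throwing, which makes it easy to hand the problems back to the model as a tool result.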

3. `ToolUseContext` — Tools Are Not Isolated Functions

If you look at a single tool in isolation, it's easy to imagine it running like this:

input parameters -> execute function -> return result

(Many demo-level Agent frameworks are designed exactly this way. The cracks don't show until you hit production.)

But Claude Code's tools don't operate that way.

When a tool executes, it receives the full ToolUseContext, the runtime context object passed into tool execution. This context carries a wealth of information the current session needs to function, such as:

The currently active tool set
MCP clients and MCP resources
The current AppState
The message history
The file read cache
The abort controller
Notification capabilities
Task and file history updaters

What this means is that a tool is never an "island." Every action it performs can ripple through the entire session.

Returning to the "fix a failing test" example:

Grep finds files related to the failing test — that affects the model context for the next turn.
Read reads a file — the system tracks the read state.
Edit modifies a file — the UI needs to render the diff.
Bash runs a test and it fails — the error log flows back into the message stream.
If the user interrupts — any long-running command must be cancellable or wind itself down cleanly.

So the tool system is not a simple function-call layer.

It is part of the Claude Code runtime.
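A small sketch can show why a tool needs session context rather than bare parameters. `SessionContext` below is a hypothetical, heavily trimmed stand-in for `ToolUseContext`, and the file system is faked with a `Map` so the example runs without touching disk.

```typescript
// Hypothetical, trimmed-down context; the real ToolUseContext carries far more.
interface SessionContext {
  messages: string[];                 // what the next turn will see
  readFileCache: Map<string, string>; // path -> content already read this session
  abortController: AbortController;   // lets the user interrupt ongoing work
}

// A fake "Read" tool: every branch leaves a trace in the shared session state.
function readWithContext(
  path: string,
  disk: Map<string, string>,
  ctx: SessionContext,
): string {
  if (ctx.abortController.signal.aborted) {
    ctx.messages.push(`Read ${path}: cancelled`);
    return "";
  }
  const cached = ctx.readFileCache.get(path);
  if (cached !== undefined) {
    ctx.messages.push(`Read ${path}: unchanged since last read`);
    return cached;
  }
  const content = disk.get(path);
  if (content === undefined) {
    ctx.messages.push(`Read ${path}: file not found`);
    return "";
  }
  ctx.readFileCache.set(path, content);
  ctx.messages.push(`Read ${path}: ${content.length} chars`);
  return content;
}
```

In a fuller version, an edit tool would also have to invalidate the cache entry for the file it changes; that coupling is exactly why the context is shared.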

4. `tools.ts` Is a Tool Registry, Not the Final Menu

Once you understand what a single Tool looks like, the next step is tools.ts.

It's responsible for registering Claude Code's foundational capabilities into a tool pool. You'll see many categories of tools here:

File tools: Read, Edit, Write, Notebook
Search tools: Glob, Grep
Terminal tools: Bash, PowerShell
Web tools: WebFetch, WebSearch, WebBrowser
Collaboration tools: Agent, SendMessage, AskUserQuestion
Workflow tools: Todo, Task, Plan, Worktree
Extension tools: MCP, LSP, ToolSearch, Skill

But there's one point where people commonly trip up:

getAllBaseTools() produces a candidate pool, not the final tool menu that the model sees.

Many readers make a wrong assumption at this stage, thinking that however many tools are registered is how many the model can use directly. That's not how it works.

Claude Code first assembles a large candidate pool, then filters it down layer by layer based on environment, mode, rules, and runtime state. Only then does it produce the tools visible for the current turn.

This pipeline illustrates a foundational principle for mature agent systems:

More capabilities is not better. Capabilities must be dynamically trimmed by context, permissions, and cost.

5. Why Tools Are Filtered Before They Reach the Model

Here we have a critical security design decision.

Claude Code does not wait until the model invokes a tool to decide whether it can execute. It performs "tool visibility filtering" first.

If a tool is entirely blocked by a deny rule, the model simply never sees it in this round.

Think of it as two gates.

The first gate answers:

Is the model even allowed to see this tool in this round?

The second gate answers:

Can this specific invocation actually execute?

These two concerns must not be conflated.

To put it bluntly: if the model can't see a tool, it won't plan tasks around it. This is far safer than "let it see, then reject."

That is the point of "pushing security upstream."

(In permission design, we often face this temptation: "let the model see everything, then block at execution time." Claude Code makes the opposite choice: what shouldn't be seen simply isn't shown. The cost is that the tool list can change from turn to turn; the benefit is that the model never builds a plan around a capability it is not allowed to use.)
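The two gates can be sketched as two separate functions. Everything below is a simplified assumption: the `ToolEntry` and `DenyRule` shapes, the mode names, and the substring matching are invented for illustration and do not reflect Claude Code's real rule syntax.

```typescript
interface ToolEntry { name: string; modes: string[] } // modes in which the tool is offered
interface DenyRule { tool: string; pattern?: string } // no pattern means a blanket deny

// Gate 1: decide which tools the model may see on this turn.
function visibleTools(pool: ToolEntry[], mode: string, denyRules: DenyRule[]): ToolEntry[] {
  const blanketDenied = new Set(
    denyRules.filter(rule => rule.pattern === undefined).map(rule => rule.tool),
  );
  return pool.filter(tool => tool.modes.includes(mode) && !blanketDenied.has(tool.name));
}

// Gate 2: decide whether one concrete invocation may run.
function canExecute(
  tool: string,
  argument: string,
  visible: ToolEntry[],
  denyRules: DenyRule[],
): boolean {
  if (!visible.some(entry => entry.name === tool)) return false;
  return !denyRules.some(
    rule => rule.tool === tool && rule.pattern !== undefined && argument.includes(rule.pattern),
  );
}
```

With a blanket deny on `WebFetch` and a pattern deny on `Bash` for `rm -rf`, the model never sees `WebFetch` at all, while `Bash` stays visible and only the dangerous invocation is stopped at the second gate.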

6. `ToolPermissionContext` — The Permission Backpack

Both tool filtering and tool execution depend on ToolPermissionContext, the runtime bundle of rules and permission state used to decide visibility and execution behavior.

It's not a simple true / false toggle. It's an entire bundle of permission context, typically containing:

the current permission mode
user-level rules
project-level rules
local rules
policy rules
command-line rules
session-level rules
three behavior categories: allow / deny / ask
whether bypass is permitted
whether dialogs should be suppressed
additional working-directory boundaries

This explains why Claude Code's permission system feels "heavyweight."

Because it's not just answering "can this tool be used?" — it's answering something far more nuanced:

In the current project,
under the current permission mode,
accounting for user settings, project settings, policy settings, CLI arguments, and session-level ad-hoc rules —
should this tool even be visible to the model?
And if the model does invoke it, should that invocation be allowed, asked about, or denied?

The most critical rule of all:

Deny beats allow.

Even if a tool is allowed by one rule source, it must be rejected as soon as another applicable rule explicitly denies it. A security system can't rely on "default trust"; explicit denials must carry higher priority.

(This mirrors firewall rule-matching logic: more specific rules take precedence, and once a deny rule is hit, the chain stops — no further rules are evaluated.)
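A minimal sketch of "deny beats allow" across several rule sources follows. The `Rule` shape and the source names are hypothetical, and placing `ask` between `deny` and `allow` (as well as defaulting to `ask` when nothing matches) is an assumption of this example rather than something the source code is known to do.

```typescript
type Behavior = "allow" | "ask" | "deny";
interface Rule { source: string; tool: string; behavior: Behavior }

// Collects every rule that mentions the tool, then applies a fixed precedence:
// deny first, then ask, then allow. Rule order and source do not matter here.
function resolveBehavior(tool: string, rules: Rule[], fallback: Behavior = "ask"): Behavior {
  const matching = rules.filter(rule => rule.tool === tool).map(rule => rule.behavior);
  if (matching.includes("deny")) return "deny";
  if (matching.includes("ask")) return "ask";
  if (matching.includes("allow")) return "allow";
  return fallback; // no rule mentions this tool
}
```

Because the function looks at all matching rules before deciding, a user-level `allow` for `Bash` cannot override a policy-level `deny` for the same tool.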

7. Tool Execution Is Not Just "Calling a Function" — It's a Lifecycle

When the model actually issues a tool_use block, the structured tool-call record returned by the model, Claude Code still has to run it through an execution pipeline.

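A typical tool lifecycle runs roughly as follows: find the tool, validate the parameters, check permissions, schedule the call, execute it, serialize the result, and write it back into the message stream.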

None of the steps in this pipeline are decorative.

Parameter validation prevents the model from sending malformed structures.

Permission checks block dangerous operations.

Scheduling determines which tools can run in parallel and which must be serialized.

Result serialization ensures the model can understand what just happened in the next round.

Message write-back guarantees the entire session isn't a one-shot action — it's a cycle that can keep advancing.

Strip all of this away, and Claude Code degrades to:

model says something -> program takes a gamble -> command runs unchecked -> result gets stuffed back in

That clearly cannot support real-world engineering projects.

8. Why Tools Are Categorized as Read-Only, Destructive, and Concurrency-Safe

In a simple demo, the only question about a tool tends to be "can I call it or not?"

But in a real development environment like Claude Code, a tool needs to answer at least three questions.

First, is it read-only?

Read, Grep, and Glob are generally low-risk tools because they observe the project without directly modifying it. Edit, Write, and Bash, on the other hand, can change files or the environment and carry higher risk.

Second, is it a destructive operation?

Even within Bash, npm test and rm -rf are not remotely in the same league. The tool system must support finer-grained risk assessment.

Third, can it run concurrently?

Two read tools running in parallel is usually fine. But two write tools modifying the same area at the same time, or one Bash command that depends on the output of another, cannot be parallelized casually.

This is why the Tool protocol includes so much metadata that seems "extra" at first glance.

It's not about making the interface complex. It's about letting the system know: how should this action be treated?
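One way the concurrency metadata could be used is to plan execution batches. The sketch below is an assumed scheduling policy, not Claude Code's actual scheduler: consecutive concurrency-safe calls share a batch, and every unsafe call runs alone, with the original order preserved.

```typescript
interface PendingCall { id: string; tool: string; isConcurrencySafe: boolean }

// Returns batches in execution order. Calls inside one batch may run in parallel;
// batches themselves run one after another.
function planBatches(calls: PendingCall[]): PendingCall[][] {
  const batches: PendingCall[][] = [];
  let current: PendingCall[] = [];
  for (const call of calls) {
    if (call.isConcurrencySafe) {
      current.push(call);
      continue;
    }
    // An unsafe call flushes the pending parallel batch and then runs by itself.
    if (current.length > 0) batches.push(current);
    batches.push([call]);
    current = [];
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

For a sequence such as Grep, Read, Edit, Read, Bash, this yields four batches: the first two reads together, then Edit, then the second Read, then Bash.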

9. Built-in Tools Fall into Five Categories — Not a Pile of Names

If you just list 40+ tools, readers get lost quickly.

A better way to understand them is by grouping them around what problem they solve.

Category	Representative Tools	Problem Solved
Files & Search	Read, Edit, Write, Glob, Grep	Let the agent understand and modify the project
Shell Execution	Bash, PowerShell	Let the agent verify, build, and test
Session Control	AskUserQuestion, Todo, Plan	Let the agent plan, clarify, and maintain task state
Collaboration Tasks	Agent, Task, SendMessage	Let complex work be split, tracked, and results collected
External Extension	MCP, LSP, WebFetch, WebSearch, Skill	Extend capability boundaries to external services and reusable workflows

These categories capture a single insight:

Claude Code isn't just "able to operate on files" — it's decomposing the real software development process into a set of governable action interfaces.

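When fixing tests, the agent might walk a path like this: Grep to locate the failing test and related files, Read to inspect them, Edit to change the code, Bash to rerun the tests, and then another round of reading and editing if the run still fails.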

This isn't "one tool call" — it's a closed loop where a chain of tool invocations and model reasoning push each other forward.

10. Why MCP, LSP, and Skill Can All Plug Into the Same System

A unified Tool protocol has another major benefit: new capabilities can be plugged in without upending the entire architecture.

Whether it's an MCP tool, an LSP tool, or a Skill tool, they all ultimately need to be translated into a tool view that Claude Code understands:

A name
An input schema
A description
Enabling conditions
Permission semantics
An execution result

Without that shared view, every external system you connect would require inventing a new set of rules, and the more you hook up, the messier the system becomes. That is the technical debt the unified protocol eliminates.

With a unified protocol, adding a new capability boils down to answering five questions:

How do you describe yourself?
How do you receive input?
How do you execute?
How do you declare risk?
How do you hand results back to the main loop?
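The five questions can be read as the signature of an adapter. The sketch below wraps a hypothetical external tool descriptor into a unified view; the `ExternalToolDescriptor` fields, the `readOnlyHint` flag, and the `server__name` naming scheme are assumptions for illustration only.

```typescript
// What an external system might advertise about one of its tools (assumed shape).
interface ExternalToolDescriptor {
  server: string;
  name: string;
  description: string;
  readOnlyHint?: boolean;
}

// The single view the host runtime understands.
interface UnifiedTool {
  name: string;
  description: string;
  isReadOnly: boolean;
  call(input: Record<string, unknown>): string;
}

function adaptExternalTool(
  desc: ExternalToolDescriptor,
  invoke: (name: string, input: Record<string, unknown>) => string,
): UnifiedTool {
  return {
    name: `${desc.server}__${desc.name}`,   // namespacing avoids clashes between servers
    description: desc.description,
    isReadOnly: desc.readOnlyHint ?? false, // unknown risk is treated as "may write"
    call: input => invoke(desc.name, input),
  };
}
```

Defaulting `isReadOnly` to `false` is the conservative choice: a tool that does not declare its risk goes through the stricter path.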

11. The Tool System Is Where Claude Code's Engineering Philosophy Really Shows

After reading through the Tools system, the most important takeaway isn't memorizing any particular tool name — it's understanding Claude Code's engineering orientation.

The model is not the executor. The runtime is the executor.

The model decides whether the next step requires action and what the intent behind that action is. The tool system inside the host process is what actually carries out the action.

Tools aren't plugins — they're runtime protocols.

Every tool must pass through the full pipeline: schema registration, context, permissions, dispatch, result backfill, and UI presentation.

Security isn't a final pop-up. It's two-phase governance: tool exposure and tool execution.

What the model can see is itself part of the security boundary. What the model actually gets to invoke forms the second boundary.

Extensibility isn't about maximum surface area — it's about being trimmable, filterable, and auditable.

Claude Code supports MCP, LSP, Skills, and multi-agent workflows not by dumping every capability onto the model, but because every one of those capabilities has to pass through the same tool pipeline.

12. The Whole Chapter in One Diagram

Finally, here is Claude Code's tool system compressed into a single complete diagram:

Use this diagram as a map when reading Tool.ts, tools.ts, toolExecution.ts, and the permission-related code.

13. Which Tool Chain to Follow When Reading the Source

If you really want to understand the tool system by reading the source, don't start from a specific tool. Instead, trace a complete call chain first:

Tool.ts
-> tools.ts
-> query.ts
-> toolExecution.ts
-> permissions.ts
-> tool_result backfilled into messages

Step one: read Tool.ts. The focus isn't the tool names, but the Tool protocol itself: inputSchema, call, validateInput, checkPermissions, isReadOnly, isConcurrencySafe, isDestructive, interruptBehavior, maxResultSizeChars. Together, these fields answer one question: what governance information does the system need to know about a model-initiated action before it enters a real engineering environment.

Step two: read tools.ts. getAllBaseTools() is only a candidate pool, not the model's final menu. Before being exposed to the model, tools pass through mode filtering, permission deny-rule filtering, MCP tool merging, sorting, deduplication, and cache stability handling. A key point here: tool visibility is itself part of permissions. A blanket-denied tool should ideally disappear before the model ever sees it—not after the model calls it and gets rejected.

Step three: go back to query.ts. The tool_use blocks returned by the model are collected and handed to runTools() or StreamingToolExecutor. This is where you see the interface between the tool system and the ReAct main loop: a tool is not a UI button, but a fork point in the next round of the state machine.

Step four: read the single-invocation lifecycle in toolExecution.ts:

Find the tool definition
-> inputSchema validation
-> tool-level validateInput
-> PreToolUse hooks
-> permission check
-> tool.call()
-> result serialization
-> PostToolUse hooks
-> produce tool_result

This lifecycle is what separates a production-grade agent from a simple function map. Errors don't blow up the main loop; instead, they are converted into tool results the model can understand in the next round whenever possible.
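The "errors become tool results" behavior can be sketched in a few lines. This is a simplified, synchronous stand-in with hypothetical names (`ToolDef`, `ToolResult`, `runToolUse`); real tools would be asynchronous, and the PreToolUse and PostToolUse hooks are left out to keep the example short.

```typescript
interface ToolDef {
  name: string;
  validate(raw: unknown): string | null; // error message, or null when valid
  permitted(raw: unknown): boolean;
  call(raw: unknown): string;
}

interface ToolResult { toolName: string; isError: boolean; content: string }

// Every failure path returns a ToolResult, so the main loop always gets
// something it can append to the message history.
function runToolUse(tools: ToolDef[], toolName: string, raw: unknown): ToolResult {
  const fail = (content: string): ToolResult => ({ toolName, isError: true, content });

  const tool = tools.find(candidate => candidate.name === toolName);
  if (!tool) return fail(`unknown tool: ${toolName}`);

  const problem = tool.validate(raw);
  if (problem !== null) return fail(`invalid input: ${problem}`);

  if (!tool.permitted(raw)) return fail("permission denied");

  try {
    return { toolName, isError: false, content: tool.call(raw) };
  } catch (err) {
    // A thrown error becomes data for the next turn instead of crashing the loop.
    return fail(err instanceof Error ? err.message : String(err));
  }
}
```

The model can then read a result such as "permission denied" or a failing exit code on the next turn and adjust its plan.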

Step five: pick one concrete tool to read, such as FileReadTool. It's not just fs.readFile()—it also handles path validation, large-file budgeting, offset/limit, PDF/image processing, duplicate-read deduplication, permission checks, Skill triggering, and UI display. After reading it, you'll better understand why Claude Code builds tools as a "semantic protocol" rather than stuffing every action into Bash.

Once you've traced this chain, the essence of Tools becomes clear:

The model only proposes structured intent.
The tool protocol describes the boundaries of an action.
The executor governs the lifecycle.
The permission system decides whether the action lands.
tool_result brings the real world back to the model.

14. Summary

Claude Code's tool system can be summed up in a single sentence:

Tools are the runtime protocol layer in Claude Code that turns model intent into real engineering actions; they give the model hands and feet, while also fitting those hands and feet with boundaries, permissions, and feedback loops.

Once you grasp Tools, you stop seeing Claude Code as "a chat model plus a few plugins." It's better understood as an Agent Harness: the model handles thinking, tools handle acting, permissions handle boundaries, and state ties each action into a sustainable, forward-moving engineering loop.

Context Governance for Coding Agents

LienJack — Sun, 10 May 2026 08:05:42 +0000


When people first hear the phrase "context management," they often reduce it to two ideas:

Use a larger context window.
Compress history when the window is about to overflow.

That is not wrong, but it is far too narrow.

In ordinary chat systems, context management really is mostly about conversation history. But once a system becomes a coding agent, especially one that reads files, calls tools, runs commands, writes code, and interacts with external systems, context is no longer just a transcript. It becomes the whole working scene the model can see on every turn.

So the real question is this:

During real engineering work, an agent keeps producing new information. How does the system decide what should enter the model, what should stay outside it, what should be compressed, and what must survive over time?

This article uses Claude Code as one concrete case study, but it is not only about Claude Code.

Claude Code is a strong case study because it exposes the context problem in a very direct way: source files are long, tool outputs are long, test logs are long, and tasks regularly stretch across dozens of turns. But the same class of problem appears in many other agent systems too, including LangGraph, the OpenAI Agents SDK, AutoGen, Cursor, Devin, OpenClaw, and Hermes.

The difference is where each project places the weight:

Claude Code is closer to a long-running CLI agent. Its pressure comes from tool output, project rules, compression, and recovery.
LangGraph is closer to a workflow state machine. Its pressure comes from structured state, checkpoints, and resumable execution.
OpenAI Agents SDK is closer to an application SDK. Its pressure comes from separating local runtime context from model-visible context.
AutoGen is closer to a multi-agent conversation framework. Its pressure comes from role separation, memory injection, and collaborative context flow.
Cursor and Copilot are closer to in-IDE real-time assistants. Their pressure comes from low latency, local code snippets, and retrieval precision.
Hermes, OpenClaw, and enterprise harnesses lean more toward long-running runtime governance, entry-point control, and policy enforcement.

So context management is not one feature inside one product. It is a foundational engineering problem that almost every serious agent system eventually hits.

In this article, I use context management for the operational mechanics and context governance for the broader design problem around visibility, authority, recall, compression, and isolation.

The point here is to widen the lens and look at the broader governance model underneath the implementation details:

The model is stateless.
The task is continuous.
Information explodes.
The context window is finite.
The outer system has to rebuild the working scene every turn.

To keep the discussion concrete, we will use one running example:

The user says: post-login redirect is broken in this project. Find the cause and fix it.

A real agent would not stop at "maybe check the route guard." It would do something more like this:

Inspect the project structure
-> Search for login-related code
-> Read the route guard
-> Read auth state management
-> Run tests
-> Analyze error logs
-> Modify code
-> Run tests again
-> Summarize the change and the remaining risks

Each step produces more information. Context governance exists to keep that information alive across a long task without drowning the model in it.

1. Why Context Management Becomes an Engineering Problem

Start with the most basic fact:

Every model call is stateless by default.

The model does not naturally remember which file it read on the previous turn, nor does it automatically know where the last test failed. An agent only appears continuous because the runtime outside the model reconstructs the current working scene on each round and sends it back in.

A simple chat turn looks roughly like this:

user question
-> model answer

An agent turn looks much more like this:

system rules
+ project rules
+ current user goal
+ message history
+ tool descriptions
+ recent tool results
+ current task state
+ compressed summaries
+ available external resources
-> model decides what to do next

At that point, context management is no longer answering "how do I save the chat history?" It is answering questions like these:

What exactly should the model see on this turn?
Which information should be visible on every turn?
Which information should only be fetched on demand?
Which tool results are already stale?
Which content is too large and must be trimmed?
Which parts of history can be summarized?
How do you preserve continuity after compression?
How do you isolate context across multiple agents?
Which internal states must never be exposed to the model?

That is a systems-design problem, not a prompt-wording problem.

Without context governance, an agent quickly runs into several classic failures.

1. Token Explosion

Tool output keeps piling up and requests get longer and longer.

One grep can return dozens of matches. One test run can spill thousands of log lines. One source file can cost thousands of tokens. In long tasks, what fills the window is often not the user's words but the environmental noise coming back from tools.

Many teams get trapped here because they only count conversation turns and think, "We've only had 20 turns, so the window should still be fine." But each tool call has been dumping more material into the context the whole time.

2. Context Pollution

Old information is still present in context even though the real world has already changed.

For example, the agent first reads auth.ts, later edits it, but the old version still sits in history. On the next turn the model may reason carefully from information that is no longer true.

It looks like deliberate analysis,
but the thing being analyzed is no longer the current code.

3. Constraint Loss

The user says early on, "Don't change the public API," and by turn ten the model has forgotten.

The project rules say migration files must not be hand-edited, but after compression that rule may not survive into the summary. The task keeps moving, every step still sounds reasonable, and the system has already crossed a boundary.

4. Compression Amnesia

Compression is not free.

A weak summary may record which files were read and which code was modified, while losing:

the user's actual goal
where the task is currently stuck
which approaches have already failed
which constraints must not be violated
what should happen next

That leaves the model like someone who has read the meeting minutes but never sat in the room.

5. Multi-Agent Pollution

The problem becomes even sharper once sub-agents enter the picture.

A research agent may read a huge amount of material, while the execution agent only needs the final conclusion. If you dump all of the research agent's drafts and dead ends into the executor's context, the downstream agent does not become smarter. It becomes noisier.

In multi-agent systems, the danger is often not a lack of information. It is every agent carrying someone else's intermediate state forward.

2. Context Is Not Prompt, and It Is Not Memory Either

Before going deeper, it helps to separate a few terms that are easy to blur together.

Concept	Plain meaning	The question it answers
Prompt	Task wording	How should I ask the model?
Context	Current workbench	What does the model actually see on this turn?
Memory	Reusable knowledge	Which facts should survive across tasks?
Transcript	Raw archive	How do we audit and recover the full process?
State	Structured task state	What is the current machine state of the task?
Artifact	External output	Where do files, logs, diffs, and reports live?

A practical analogy looks like this:

Prompt is the assignment sheet.
Context is the material spread across your desk.
Memory is the filing cabinet.
Transcript is the audio/video recording.
State is the project kanban board.
Artifacts are the actual documents and code produced.

Many agents become unstable precisely because these layers get mixed together.

Treat the transcript as context, and every turn explodes in tokens.

Treat context as memory, and transient noise pollutes long-term recall.

Treat memory as prompt, and the model misreads "experience" as hard policy.

Store state only in natural-language history, and long tasks lose it the moment compaction happens.

So the first principle of context management is simple:

Do not shove every kind of information into one linear chat history.

A more reliable engineering pattern is to keep different information in different layers, then assemble only the small subset needed for this turn right before each model call.

3. Separate the Action Layer from the Architecture Layer

Many context-management discussions start by listing a set of actions:

Offload: move large objects out of the prompt
Reduce: trim, extract, summarize
Retrieve: bring information back when needed
Isolate: split work into independent contexts
Cache: reuse stable context or computed results

These actions are useful, but they answer only one question:

The context is too large. What operations can I apply?

Real engineering has to answer earlier questions first:

Should the model even see this piece of information?
Which source outranks which?
Is this hot or cold information right now?
Should it appear as raw text, a summary, a citation, or structured state?
Where should it be recalled from?
How do I compress it without distorting the truth?
Inside which boundary should it apply?

That is where the broader seven-dimensional model becomes useful. It upgrades context management from "a list of cleanup operations" into an architectural model.

The action layer is like a toolbox. The architectural model is like a blueprint.

A toolbox tells you that you have a hammer, pliers, and a screwdriver. A blueprint tells you where you are allowed to hammer, which layer gets installed first, and how to trace accountability when the system goes wrong.

4. The Seven-Dimension Model: Turn Context into a Governable Working Set

If context management is treated as a real subsystem, it has to manage information across at least seven dimensions:

Visibility:  what the model is allowed to see
Authority:   which source wins when conflicts arise
Temperature: whether the information is hot, warm, cold, frozen, or long-term
Shape:       what form the information takes
Retrieval:   where to recall information from when it is missing
Compression: how to shrink context without losing the truth
Boundary:    how to isolate across tasks, agents, tenants, and permissions

These are not parallel buzzwords. Together they form a practical engineering pipeline.

1. Visibility: Decide First Whether the Model Should See It

The first gate is not compression. It is visibility.

Context usually falls into three broad categories:

Type	Examples	Handling
`llm_visible`	user goals, project rules, key code snippets, filtered retrieval results	may enter model-visible context
`runtime_only`	API keys, permission objects, sessions, traces, internal dependencies, database handles	available to tools and runtime only
`artifact_ref`	large logs, large files, page snapshots, full diffs	keep the original outside the prompt and provide a reference plus preview

One early mistake many agent systems make is confusing "the tool can access it" with "the model should also see it."

This is why the OpenAI Agents SDK's distinction between local context and LLM context matters so much. Tool functions may need the current user object, the logger, the dependency container, and permission state. The model usually does not.

One-line rule:

If the model does not need to see it, do not show it to the model. If a reference is enough, do not paste the full original.
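The three visibility classes from the table can be applied at render time. The sketch below is an illustration under assumed names (`ContextItem`, `renderForModel`, the preview length); it is not taken from any of the frameworks mentioned.

```typescript
type Visibility = "llm_visible" | "runtime_only" | "artifact_ref";

interface ContextItem {
  key: string;
  value: string;
  visibility: Visibility;
  artifactPath?: string; // where the full original lives, for artifact_ref items
}

// Builds the lines that are allowed to reach the model.
function renderForModel(items: ContextItem[], previewChars = 40): string[] {
  const lines: string[] = [];
  for (const item of items) {
    if (item.visibility === "runtime_only") continue; // never leaves the runtime
    if (item.visibility === "artifact_ref") {
      // Show a pointer plus a short preview instead of the full payload.
      const preview = item.value.slice(0, previewChars);
      lines.push(`${item.key}: [see ${item.artifactPath ?? "artifact"}] ${preview}...`);
      continue;
    }
    lines.push(`${item.key}: ${item.value}`);
  }
  return lines;
}
```

Tools still receive the full `ContextItem` list, so an API key marked `runtime_only` stays usable by the runtime while never appearing in model-visible text.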

2. Authority: Conflicts Need a Resolution Chain

Conflicts show up in context all the time.

The user says, "Edit the generated file directly," while the project rules say, "Do not modify generated files." Long-term memory says the user likes Redis, while the current task says not to introduce Redis. An old summary says tests passed, while the newest tool output says they failed.

If the system has no explicit resolution chain, it is effectively dumping the conflict onto the model and hoping it will improvise correctly.

A sensible default ordering might look like this:

System / safety policy
> Tenant / organization policy
> Project rules
> Current user instruction
> Current task state
> Verified retrieval result
> Long-term memory
> Historical summary
> Raw old conversation

The point is not that every system must use this exact order. The point is:

Authority has to be designed. It cannot be replaced by adding more "please follow these rules" text to the prompt.
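A resolution chain like the one above can be encoded as data rather than prose. In this sketch the source identifiers and the `Claim` shape are hypothetical; the ordering simply mirrors the default list given earlier and would need to be adapted per system.

```typescript
// Highest authority first; mirrors the default ordering sketched above.
const AUTHORITY_ORDER = [
  "system_policy", "org_policy", "project_rules", "user_instruction",
  "task_state", "verified_retrieval", "long_term_memory",
  "historical_summary", "old_conversation",
] as const;

type Source = (typeof AUTHORITY_ORDER)[number];
interface Claim { source: Source; topic: string; value: string }

// For each topic, keeps the claim that comes from the highest-ranked source.
function resolveConflicts(claims: Claim[]): Map<string, Claim> {
  const winners = new Map<string, Claim>();
  for (const claim of claims) {
    const current = winners.get(claim.topic);
    const outranks =
      current === undefined ||
      AUTHORITY_ORDER.indexOf(claim.source) < AUTHORITY_ORDER.indexOf(current.source);
    if (outranks) winners.set(claim.topic, claim);
  }
  return winners;
}
```

With this in place, the runtime can resolve "project rules forbid it, the user asked for it" before the model ever sees two contradictory statements.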

Claude Code's system rules, project rules, permission modes, and tool-safety checks are all forms of authority enforced at different layers. In enterprise agents, RBAC, approvals, and audit systems move authority even further out of the prompt and into the runtime.

3. Temperature: Information Needs Hot and Cold Layers

Context is not just "short-term" versus "long-term."

A more useful breakdown is:

Layer	Meaning	Examples
Hot	Must be used now; included by default	current user goal, latest failure log, file currently being edited
Warm	Probably relevant; often kept as summary or state	ruled-out causes, file summaries, active hypotheses
Cold	Recalled on demand	code index, documentation index, historical sessions
Frozen	Complete raw record; used for audit and recovery	transcripts, full logs, page snapshots
Long-term Memory	Stable facts across sessions	persistent user preferences, project conventions, long-lived rules

This makes a context manager look more like a memory manager:

Hot items cool into Warm after use.
Stable Warm items may move into long-term memory.
Cold items heat up again when retrieved.
Frozen records stay outside the prompt but preserve the truth.

This is also where weak summaries often break down. Many systems compress a hot live scene into a warm summary without preserving the recent tail, so the model loses its feel for the present on the very next turn.
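The cooling and reheating transitions can be sketched as two small functions. The idle-turn thresholds (2 and 6) and the `Tracked` shape are invented for this example, and promotion into long-term memory is deliberately left out.

```typescript
type Temperature = "hot" | "warm" | "cold" | "frozen";
interface Tracked { id: string; temperature: Temperature; lastUsedTurn: number }

// Assumed policy: hot -> warm after 2 idle turns, warm -> cold after 6 idle turns.
// Cold and frozen items are left alone.
function coolDown(items: Tracked[], currentTurn: number): Tracked[] {
  return items.map((item): Tracked => {
    const idleTurns = currentTurn - item.lastUsedTurn;
    if (item.temperature === "hot" && idleTurns >= 2) return { ...item, temperature: "warm" };
    if (item.temperature === "warm" && idleTurns >= 6) return { ...item, temperature: "cold" };
    return item;
  });
}

// Retrieval makes an item hot again; frozen records never enter the prompt.
function heatUp(item: Tracked, currentTurn: number): Tracked {
  if (item.temperature === "frozen") return item;
  return { ...item, temperature: "hot", lastUsedTurn: currentTurn };
}
```

Keeping `lastUsedTurn` on each item is what lets a compaction step preserve the recent tail: anything touched in the last couple of turns is still hot and should be carried over verbatim.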

4. Shape: The Same Information Can Take Different Forms

Not everything should be represented as natural-language prose.

The same test failure can exist in many shapes:

Shape	Best used when
Raw	you need line-by-line inspection
Extract	you only need command, exit code, error type, and key stack frame
Summary	you are reviewing older history
Structured State	you are tracking task status, failed attempts, and next steps
Reference	the original is too large, so you keep only an artifact ID or path
Diff	the code change matters more than the full file
Graph	the task is really about relationships, dependencies, or DAG structure

For example, a failing test log does not always need to enter the model as a raw blob. It can first be reshaped like this:

command: pnpm test auth
status: failed
error: TypeError user.id should be string
file: src/auth/session.ts
test: redirects after login
artifact: logs/test-auth-2026-05-03.txt
next_step: inspect mock user construction

That is the value of shape:

The same information, represented differently, changes token cost, retrievability, and reliability.

LangGraph's State, Claude Code's compact summary, the OpenAI Agents SDK's tool context, and execution context in enterprise systems can all be understood through this lens.
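As a rough sketch of the Raw to Extract step, the function below reduces a log to a handful of fields. It assumes Node-style error lines (`TypeError: ...`) and stack frames (`at fn (path:line:col)`); the sample log is made up for illustration, and `extractFailure` is a hypothetical helper, not part of any tool mentioned here:

```typescript
// Hypothetical extract shape, modeled on the reshaped test failure above.
type FailureExtract = {
  command: string;
  status: "passed" | "failed";
  error: string | null;
  file: string | null;
  artifact: string; // where the full raw log lives
};

function extractFailure(command: string, exitCode: number, rawLog: string, artifact: string): FailureExtract {
  const lines = rawLog.split("\n");
  // First line that looks like "SomeError: message".
  const errorLine = lines.find((line) => /^\s*\w*Error:/.test(line));
  // First stack frame that points into project code rather than node_modules.
  let file: string | null = null;
  for (const line of lines) {
    const match = /\(([^():]+):\d+:\d+\)/.exec(line);
    const path = match?.[1];
    if (path !== undefined && !path.includes("node_modules")) {
      file = path;
      break;
    }
  }
  return {
    command,
    status: exitCode === 0 ? "passed" : "failed",
    error: errorLine === undefined ? null : errorLine.trim(),
    file,
    artifact,
  };
}

// Made-up sample log for illustration.
const rawLog = [
  "FAIL src/auth/session.test.ts",
  "  TypeError: user.id should be string",
  "    at buildSession (src/auth/session.ts:42:11)",
  "    at runTest (node_modules/runner/index.js:10:5)",
].join("\n");

const extract = extractFailure("pnpm test auth", 1, rawLog, "logs/test-auth-2026-05-03.txt");
```

The raw log stays available through `artifact`, so the model can still ask for line-by-line inspection when the extract is not enough.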

5. Retrieval: Recall Is Not Just Vector Search

Many people hear "retrieval" and immediately think of vector databases. But agents need far more than one retrieval path.

A mature system typically has multiple recall routes:

Retrieval path	Best suited for
Recent Tail	recent conversation, current tool results, current state
Rule Loading	`AGENTS.md`, `CLAUDE.md`, project rules
Keyword Search	function names, error codes, field names, config keys
Vector / Hybrid Search	document semantics, similar experiences, complex knowledge
Tool Search	progressive loading of tools, skills, and plugins
Artifact Lookup	large logs, large files, web outputs
Memory Search	user preferences, long-term facts, project conventions
Graph Traversal	module dependencies, task DAGs, database relationships

In code-heavy tasks, symbols, keywords, and paths are often more important than pure semantic retrieval.

In enterprise knowledge systems, hybrid search, permission filtering, and source credibility matter more.

In multi-agent systems, artifact lookup and structured handoff matter more.

So the key retrieval question is not "do you have RAG?" It is this:

When this kind of task is missing information, what is the most reliable way to recover it?
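One way to make that question concrete is a router that maps the kind of missing information to a recall path. This is only a sketch: the `InformationNeed` categories and the mapping are assumptions, and a real router would likely fall back across several paths:

```typescript
type RetrievalPath =
  | "recent_tail" | "rule_loading" | "keyword_search" | "vector_search"
  | "tool_search" | "artifact_lookup" | "memory_search" | "graph_traversal";

// Hypothetical description of what the agent is missing right now.
type InformationNeed =
  | { kind: "symbol"; name: string }        // function name, error code, config key
  | { kind: "concept"; question: string }   // fuzzy, semantic question
  | { kind: "artifact"; ref: string }       // large log or file stored elsewhere
  | { kind: "preference"; topic: string }   // user or project convention
  | { kind: "dependency"; module: string }; // relationship between modules

function routeRetrieval(need: InformationNeed): RetrievalPath {
  switch (need.kind) {
    case "symbol": return "keyword_search";
    case "concept": return "vector_search";
    case "artifact": return "artifact_lookup";
    case "preference": return "memory_search";
    case "dependency": return "graph_traversal";
  }
}
```

For a coding task, a need such as `{ kind: "symbol", name: "handleLogin" }` goes to keyword search first, which matches the observation that symbols and paths often beat pure semantic retrieval.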

6. Compression: Shrink the Working Set, Not the Truth

Compression is not just LLM summarization either.

It can be broken into several categories:

Compression method	Meaning	Main risk
Truncate	cut directly	easiest way to lose critical constraints
Extract	pull out key fields	incomplete extraction rules can miss important information
Summarize	model-generated summary	prone to summary drift
Distill	condense into structured state	requires schema design
Archive + Ref	keep the original externally and retain only a reference	later recovery must be possible
Rehydrate	expand back to the original when needed	requires traceable provenance

A more reliable order often looks like this:

First offload large objects
-> extract key fields
-> distill them into structured state
-> summarize old history
-> truncate only when absolutely necessary

The biggest danger is summary drift: the summary quietly rewrites user constraints, failure causes, or unresolved issues.

So a compression result should preserve:

source scope
critical constraints
failed attempts
unresolved issues
artifact references
next steps

That is why Claude Code-style compaction works best when the summary behaves like a handoff document, not a book report.
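The Archive + Ref and Rehydrate rows can be sketched together. The in-memory `artifactStore` below is a stand-in assumed for illustration; a real system would write to disk or object storage:

```typescript
// In-memory stand-in for an artifact store.
const artifactStore = new Map<string, string>();

type Offloaded = { ref: string; preview: string; originalChars: number };

// Archive + Ref: keep the full text outside the prompt, return a short preview plus a reference.
function offload(id: string, content: string, previewChars = 200): Offloaded {
  artifactStore.set(id, content);
  return { ref: id, preview: content.slice(0, previewChars), originalChars: content.length };
}

// Rehydrate: expand a reference back into the original when it is needed again.
function rehydrate(ref: string): string {
  const original = artifactStore.get(ref);
  if (original === undefined) throw new Error(`Unknown artifact: ${ref}`);
  return original;
}
```

Only the `Offloaded` record enters the working set. The original stays intact in the store, which is what "shrink the working set, not the truth" means in practice.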

7. Boundary: Isolation Is the Main Thread's Self-Preservation Mechanism

Boundary is the most underrated dimension.

The value of sub-agents is not just parallelism. It is context isolation.

These kinds of tasks especially benefit from isolation:

large-scale search and research
long log analysis and debugging
codebase scanning and web scraping
data cleaning and independent implementation work
high-privilege tool calls
multi-tenant data access

Boundaries can exist at many layers:

Boundary	Purpose
Thread	isolate context across sessions
Task	isolate state across separate tasks
Subagent	isolate local context for a sub-task
Tool	isolate permissions plus input/output flow
Artifact	externalize large objects so they do not pollute messages
Permission	require approval for high-risk actions
Tenant	isolate across organizations, users, and data domains
Sandbox	isolate execution environments

A well-scoped sub-agent should look like this:

Narrow input: task + constraints + artifact references
Narrow output: conclusion + evidence + suggested next step + confidence

The main thread should never receive a full replay of a sub-agent's raw process.

Without boundary discipline, multi-agent systems easily degrade from "collaboration" into "mutual contamination."
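A minimal sketch of the narrow-in, narrow-out contract, with hypothetical type names. The point is that `rawSteps` exists inside the sub-agent run but never reaches the message handed back to the main thread:

```typescript
type SubagentInput = { task: string; constraints: string[]; artifactRefs: string[] };
type SubagentOutput = { conclusion: string; evidence: string[]; nextStep: string; confidence: number };

// Hypothetical internal record of everything the sub-agent did.
type SubagentRun = { input: SubagentInput; rawSteps: string[]; output: SubagentOutput };

// Build the only text the main thread receives: the structured result, not the raw process.
function handBack(run: SubagentRun): string {
  const { conclusion, evidence, nextStep, confidence } = run.output;
  return [
    `Conclusion: ${conclusion}`,
    `Evidence: ${evidence.join("; ")}`,
    `Suggested next step: ${nextStep}`,
    `Confidence: ${confidence.toFixed(2)}`,
  ].join("\n");
}

const run: SubagentRun = {
  input: { task: "Find where the login redirect is decided", constraints: ["read-only"], artifactRefs: [] },
  rawSteps: ["grep -r login src", "read 40 files"],
  output: {
    conclusion: "The redirect is decided in the route guard",
    evidence: ["src/auth/session.ts"],
    nextStep: "inspect mock user construction",
    confidence: 0.8,
  },
};

const message = handBack(run);
```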

5. How Context Grows While an Agent Executes a Task

Go back to the login redirect example.

At the start, the user only provides one sentence:

Post-login redirect is broken in this project. Help me find the cause and fix it.

If the agent genuinely tries to solve that problem, it will generate context like this:

Step	New information produced
Inspect directory	project structure, framework type, entry files
Read `package.json`	test commands, dependencies, scripts
Search for `login`	matching files, relevant functions, route paths
Read route guard	auth logic, redirect handling
Read state management	token, user, session storage strategy
Run tests	failure logs, stack traces, test names
Modify code	diff, changed files, implementation hypotheses
Run tests again	new results, new errors, or proof of success

Some of that information is hot. Some cools off quickly.

The current failure log is hot because the next step depends on it.

Old search results are warm because they might still be useful, but they do not need to remain verbatim forever.

The first file read can become cold, or even toxic, once that file has been edited.

The full transcript still matters, but as a frozen archive, not as something to paste into every turn.

So the context manager's job is not "save everything." It is to keep asking:

At this exact step, which pieces of information matter most for the model to see?

That is the core of context governance.

6. Engineering Problems You Will Actually Hit, and How to Solve Them

Now let's break the engineering side down as symptom -> root cause -> response.

Problem 1: The Model Doesn't Know the Workspace

Symptom: the agent starts guessing.

It changes routing logic without reading the routing code. It claims the token was probably never stored without checking the tests. It reorganizes directories according to its own habits without seeing the project rules.

The problem is not that the model cannot reason. The problem is that the current context does not contain enough on-the-ground information.

The response is dynamic context injection:

load project rules and workspace information early
read task-relevant files on demand
treat search results as candidates first instead of dumping everything in
write tool results back into messages or structured state
retrieve external knowledge through search, web, MCP, or database tools only when needed

The key phrase is "on demand."

More context is not automatically better. A stable agent is not the one that has seen the most material. It is the one that sees the most relevant material on each turn.

Problem 2: Tool Results Are Too Large

Symptom: token usage rises fast, the model gets slower, and eventually the context limit hits.

The root cause is usually not too much user conversation. It is bloated tool output.

The response is to govern tool results before governing chat history:

set result budgets for each tool category
keep only summaries, key stack frames, exit codes, and affected files from long logs
return snippets, line ranges, symbol indexes, or references instead of entire large files
return search overviews first, and let the model drill into specific files later
snip or micro-compact stale tool output

Claude Code is especially instructive here. In coding agents, context windows often explode because Bash, Read, and Grep bring back too much real-world material, not because the model reasoned too much.
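A per-tool result budget might look like the sketch below. The character limits are invented for illustration, and keeping both head and tail is an assumption based on the fact that command output often ends with the actual error:

```typescript
// Hypothetical per-tool character budgets; real limits depend on the model and the task.
const RESULT_BUDGET_CHARS = new Map<string, number>([
  ["Bash", 4000],
  ["Read", 8000],
  ["Grep", 2000],
]);
const DEFAULT_BUDGET_CHARS = 2000;

// Keep the head and tail of an oversized result and point to the full output by reference.
function capToolResult(tool: string, output: string, artifactRef: string): string {
  const budget = RESULT_BUDGET_CHARS.get(tool) ?? DEFAULT_BUDGET_CHARS;
  if (output.length <= budget) return output;
  const half = Math.floor(budget / 2);
  const omitted = output.length - 2 * half;
  return [
    output.slice(0, half),
    `[... ${omitted} characters omitted; full output at ${artifactRef} ...]`,
    output.slice(output.length - half),
  ].join("\n");
}
```

Small results pass through untouched. Large ones shrink to roughly the budget, and the model can still drill into the referenced artifact if the middle turns out to matter.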

Problem 3: Stale Information Pollutes New Decisions

Symptom: the agent keeps reasoning from old code or re-investigates paths that have already been ruled out.

The root cause is a missing notion of freshness.

The response is to give context a lifecycle:

file reads should carry version, hash, mtime, or read timestamp
once a file changes, old reads should be down-weighted or marked stale
test logs should be associated with the command, commit, and time they came from
search results should be treated as clues, not truth
key facts should cite sources whenever possible instead of living only in prose summaries

That is why a context manager should ideally handle information as metadata-rich objects, not just a bag of strings.
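As a sketch of what "metadata-rich" could mean for file reads, the snippet below attaches a content hash and a timestamp to each read and marks earlier reads stale after an edit. The hash is a simple non-cryptographic one (djb2 variant) chosen only to keep the example dependency-free; all names are hypothetical:

```typescript
type FileRead = { path: string; contentHash: string; readAt: number; stale: boolean };

// Tiny non-cryptographic hash so the sketch needs no imports.
function hashContent(text: string): string {
  let hash = 5381;
  for (let i = 0; i < text.length; i++) {
    hash = ((hash * 33) ^ text.charCodeAt(i)) >>> 0;
  }
  return hash.toString(16);
}

function recordRead(path: string, content: string, now: number): FileRead {
  return { path, contentHash: hashContent(content), readAt: now, stale: false };
}

// After an edit, mark earlier reads of the same path stale if the content changed.
function markStale(reads: FileRead[], path: string, newContent: string): FileRead[] {
  const newHash = hashContent(newContent);
  return reads.map((read) =>
    read.path === path && read.contentHash !== newHash ? { ...read, stale: true } : read
  );
}
```

A context builder can then down-weight or drop any read where `stale` is true, instead of letting the model reason from code that no longer exists.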

Problem 4: Rules Conflict with Each Other

Symptom: system rules, project rules, the current user instruction, and long-term memory collide.

For example:

The system says secrets must never leak.
Project rules say generated files must not be edited.
The current user request says to edit a generated file directly.
Long-term memory says the user prefers speed over ceremony.

If all of that is merely dumped into natural-language context, the model may resolve the conflict inconsistently or for the wrong reason.

The response is an explicit authority hierarchy:

System / security policy
-> organization-level rules
-> project-level rules
-> current user instruction
-> retrieval and tool results
-> long-term preferences

In practice, rules often split into:

hard constraints: the system must intercept or require approval
soft constraints: inject into context for the model's guidance
situational constraints: inject only when a path, tool, or task matches

This is also why an extremely long AGENTS.md or CLAUDE.md is not automatically better. Rules that are too long, too broad, and too conflicting eventually become context noise.
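The three rule kinds can be sketched as a discriminated union. Hard rules are enforced by the runtime, soft rules are always injected, and situational rules are injected only when the action's path matches. The rule texts and names below are invented for illustration:

```typescript
type Action = { tool: string; path: string };

type Rule =
  | { kind: "hard"; id: string; blocks: (action: Action) => boolean }
  | { kind: "soft"; id: string; text: string }
  | { kind: "situational"; id: string; text: string; pathPrefix: string };

function applyRules(rules: Rule[], action: Action): { allowed: boolean; promptRules: string[] } {
  let allowed = true;
  const promptRules: string[] = [];
  for (const rule of rules) {
    if (rule.kind === "hard") {
      if (rule.blocks(action)) allowed = false; // enforced in the runtime, not in the prompt
    } else if (rule.kind === "soft") {
      promptRules.push(rule.text);              // always shown to the model as guidance
    } else if (action.path.startsWith(rule.pathPrefix)) {
      promptRules.push(rule.text);              // shown only when the path matches
    }
  }
  return { allowed, promptRules };
}

const rules: Rule[] = [
  { kind: "hard", id: "no-generated-edits", blocks: (a: Action) => a.tool === "Edit" && a.path.startsWith("generated/") },
  { kind: "soft", id: "small-diffs", text: "Prefer small, focused diffs." },
  { kind: "situational", id: "auth-tests", text: "Changes under src/auth/ need a matching test.", pathPrefix: "src/auth/" },
];

const blocked = applyRules(rules, { tool: "Edit", path: "generated/api.ts" });
const authEdit = applyRules(rules, { tool: "Edit", path: "src/auth/session.ts" });
```

The edit to `generated/api.ts` is rejected no matter what the prompt says, while the auth edit is allowed and picks up one extra rule that other paths never see.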

Problem 5: The Task Loses the Thread After Compression

Symptom: after compaction, the model knows roughly what happened but not what it should do next.

The root cause is that the summary records history but not state.

A good compressed summary is not an essay abstract. It is a task handoff.

At minimum, it should preserve:

the user's goal
inviolable constraints
the current phase
important files already read
files already modified
key judgments and evidence
failed attempts
latest test or verification results
recommended next step

And ideally it should also preserve the most recent few raw turns plus key tool outputs.

In other words:

The summary preserves the main thread.
The recent tail preserves the live feel of the scene.

That is much more stable than flattening all old history into one paragraph.
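A minimal sketch of "summary plus recent tail". The `summarize` callback is a placeholder for whatever produces the handoff summary (an LLM call, for example), and giving the summary turn the `assistant` role is an arbitrary choice here:

```typescript
type Turn = { role: "user" | "assistant" | "tool"; content: string };

// Replace everything except the last `keepRecent` turns with one summary turn.
function compact(history: Turn[], keepRecent: number, summarize: (older: Turn[]) => string): Turn[] {
  if (history.length <= keepRecent) return history;
  const older = history.slice(0, history.length - keepRecent);
  const tail = history.slice(history.length - keepRecent);
  return [{ role: "assistant", content: summarize(older) }, ...tail];
}

const turns: Turn[] = Array.from({ length: 6 }, (_, i): Turn => ({ role: "tool", content: `output ${i}` }));
const compacted = compact(turns, 2, (older) => `summary of ${older.length} turns`);
```

Six turns become three: one summary covering the first four, followed by the last two turns verbatim.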

Problem 6: Multiple Agents Contaminate Each Other

Symptom: one sub-agent's draft, assumptions, or dead ends distort another sub-agent's work.

The root cause is a shared linear chat history.

The response is context isolation:

a sub-agent receives a local task, not the full global history
a sub-agent returns a structured result, not a complete thought dump
upstream agents pass forward verifiable artifacts, references, and conclusions
shared state is managed with schemas, not casual paraphrase
each agent gets its own tool permissions and context budget

In complex work, isolation often matters more than collaboration.

Without isolation, multi-agent work quickly turns into multiple agents polluting the same working surface.

Problem 7: Cost and Latency Spiral Out of Control

Symptom: the agent can work, but every step becomes slow, expensive, and verbose.

The root cause is that each turn carries too much fixed content, or keeps re-searching, re-reading, and re-explaining from scratch.

Useful responses include:

prompt caching for stable system prompts and tool descriptions
lazy loading for detailed tool docs, rule files, and long documents
progressive disclosure: summary or index first, full content only when needed
local context for runtime dependencies and internal state that the model does not need
structured state for machine-processable information that should not become natural-language tokens

The key insight is this:

Large context windows solve the capacity problem. They do not solve the information-discipline problem.

No matter how large the window gets, if you stuff every turn with irrelevant material, the model will still be slow, expensive, and prone to drift.

7. How Different Projects Handle Context

Now put several representative systems on the same canvas.

The point is not to decide which one is "more advanced." The point is to see how radically the pressure on context management changes across different host environments.

1. Claude Code: Context Defense Lines for a Long-Task CLI Agent

Claude Code's typical environment is:

Inside a real repository, continuously reading files, editing code, running commands, and fixing bugs.

Its most visible context pressures are tool results and long-task history.

So its context priorities are:

inject project context through CLAUDE.md, rule files, and path scoping
keep large files, logs, and search results from flooding the message stream
compact history into summaries near context limits so the task can continue
preserve transcript, resume state, and recent tail for continuity
isolate search, analysis, and implementation into sub-agents when needed

The big lesson from Claude Code is:

For a coding agent, context management is not primarily about long-term memory. It is about keeping the tool loop alive.

2. LangGraph: Move Context Out of Chat History and into Structured State

LangGraph looks at the problem from a different angle.

It does not primarily treat an agent as a running conversation. It treats it as a graph:

node executes
-> state updates
-> checkpoint
-> next node continues

Its context priorities are:

state schema
checkpoints
thread-level state history
time travel for debugging and branching
fault tolerance and recovery from the last valid checkpoint

The lesson here is:

Do not force chat history to carry all of the task state.

If a task has explicit steps, nodes, and intermediate state, a state graph can be much more reliable than keeping everything in natural-language dialogue.

Claude Code starts by governing messages and tool results. LangGraph starts by governing state and execution boundaries.
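To make the node, state, checkpoint cycle concrete, here is a plain TypeScript sketch. It does not use LangGraph's API; `TaskState`, `GraphNode`, and `runWithCheckpoints` are hypothetical names, and nodes are assumed to return a new state rather than mutate the old one:

```typescript
type TaskState = { step: number; notes: string[] };
type GraphNode = (state: TaskState) => TaskState;
type RunResult = { state: TaskState; checkpoints: TaskState[]; failedAt: number | null };

// Run nodes in order and save a checkpoint after each one. If a node throws,
// return the last valid state so a later run can resume from there.
function runWithCheckpoints(nodes: GraphNode[], initial: TaskState): RunResult {
  const checkpoints: TaskState[] = [initial];
  let state = initial;
  let index = 0;
  for (const node of nodes) {
    try {
      state = node(state);
      checkpoints.push(state);
    } catch {
      return { state, checkpoints, failedAt: index };
    }
    index++;
  }
  return { state, checkpoints, failedAt: null };
}

const result = runWithCheckpoints(
  [
    (s) => ({ step: s.step + 1, notes: [...s.notes, "read package.json"] }),
    () => { throw new Error("test runner crashed"); },
    (s) => ({ step: s.step + 1, notes: [...s.notes, "never reached"] }),
  ],
  { step: 0, notes: [] },
);
```

The second node fails, so the run stops with `failedAt` set to 1 and the state from the first node still intact. None of that depends on what the chat history happens to contain.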

3. OpenAI Agents SDK: Separate Local Context from LLM Context

One of the most important distinctions in the OpenAI Agents SDK is this:

Local context: context visible to your code and tools at runtime.
LLM context: context visible to the model during generation.

That is an extremely engineering-oriented distinction.

Many developers think of "context" as simply "whatever gets sent to the model." But in real applications, some information is necessary for tools while remaining unnecessary, or inappropriate, for the model itself.

Examples include:

database connections
loggers
current user objects
permission state
internal dependencies
tool-call metadata
usage statistics

These belong in runtime-local structures, not necessarily in model-visible context.

The lesson is:

The first step of context management is distinguishing what the runtime needs from what the model needs.

That separation prevents both accidental leakage and wasted tokens.
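The separation can be sketched in plain TypeScript. This is not the OpenAI Agents SDK API; `LocalContext`, `LlmContext`, and `runReadTool` are hypothetical names used only to show which side each piece of information lives on:

```typescript
// Runtime-only dependencies: visible to tool code, never serialized into the prompt.
type LocalContext = {
  userId: string;
  canRead: (path: string) => boolean; // permission state
  files: Map<string, string>;         // stand-in for real file access
  auditLog: string[];                 // stand-in for a logger
};

// What the model is allowed to see.
type LlmContext = { messages: string[] };

// The tool uses both contexts but writes only its result to the model-visible side.
function runReadTool(local: LocalContext, llm: LlmContext, path: string): void {
  local.auditLog.push(`${local.userId} read ${path}`);
  const result = local.canRead(path)
    ? (local.files.get(path) ?? `File not found: ${path}`)
    : `Permission denied: ${path}`;
  llm.messages.push(result);
}

const local: LocalContext = {
  userId: "u-123",
  canRead: (p) => p.startsWith("src/"),
  files: new Map([["src/a.ts", "export const a = 1"]]),
  auditLog: [],
};
const llm: LlmContext = { messages: [] };

runReadTool(local, llm, "src/a.ts");
runReadTool(local, llm, "secrets/.env");
```

The user ID and the audit trail stay in `local`. The model sees only the file content or a denial message.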

4. AutoGen: Model Context and Memory Injection in Multi-Agent Systems

AutoGen's typical environment is multi-agent conversation and collaboration.

Its pressure is not just whether one model forgets. It is how multiple agents share information, separate roles, and control message history.

Its main context concerns include:

which messages each agent sees when calling the model
how memory gets queried and injected
how roles partition visible information
how team orchestration controls message flow and termination
when to keep full history versus only a window or head-and-tail view

The lesson from AutoGen is:

In multi-agent systems, context management is first and foremost boundary management.

A reviewer should not inherit every tool permission from the executor.

A researcher should not dump every search draft into the writer's context.

A planner's intermediate assumptions should not automatically become global facts.

5. Cursor / Copilot: IDE Assistants Prioritize Local Relevance and Low Latency

IDE assistants live in a very different environment.

They often need to autocomplete, explain, or rewrite code while the user is typing. The core pressure is not long-task recovery. It is:

Find the most useful code context near the cursor as quickly as possible.

So their context priorities skew toward:

snippets around the cursor
symbols in the current file
imports and type information
similar code blocks
recently edited files
semantic or incremental indexing

They do not always need full-project comprehension.

The lesson is:

Context management should serve the scenario, not chase completeness by default.

6. Hermes / OpenClaw / Enterprise Harnesses: Long-Running Runtime and Governance Context

One level up, context management expands from task execution into runtime governance.

OpenClaw is closer to an agent control plane and entry point. It cares about how messaging channels, automation tasks, device nodes, browsers, and local capabilities connect into one session system.

Hermes is closer to a self-improving runtime. It cares about long-term memory, user profiles, skill accumulation, cross-session recall, and reusable experience.

Enterprise harnesses care about pipeline context, secrets, connectors, RBAC, approvals, and audit, where the agent has to operate inside existing process controls rather than outside them.

What these systems share is:

Context is no longer just model input. It becomes part of the whole runtime environment.

At this level, context governance also has to answer:

who triggered the task
which channel it came from
whose machine or sandbox is executing it
which secrets are available
which approvals have already passed
which past experience is reusable
which operations must be auditable

That is why the end state of context management is not simply "better prompting." It becomes part of the agent harness itself.

8. Put Them Side by Side

We can place these systems onto a shared comparison grid:

Project / System	Primary scenario	Core of context management	Main problem solved	Easily overlooked edge
Claude Code	CLI coding agent	project rules, tool results, compression, recovery	keep long tasks coherent without tool output blowing up the window	compressed summaries can still lose local in-situ detail
LangGraph	graph-based workflow agent	state, checkpoints, threads, time travel	recoverable state and debuggable workflow nodes	model input still needs separate governance
OpenAI Agents SDK	application-style agent SDK	separation of local context and LLM context	layered handling of runtime dependencies and model-visible information	developers still have to design injection policy
AutoGen	multi-agent collaboration	model context, memory, role boundaries	multi-role message flow and memory augmentation	too much shared history causes contamination
Cursor / Copilot	IDE real-time assistant	cursor-local context, similar code, indexing	low-latency local relevance	not ideal for carrying long-task state by default
Hermes / OpenClaw	personal long-running runtime	gateway, memory, skills, session search	multi-entry operation and long-term experience reuse	long-term memory must resist staleness and contamination
Enterprise harnesses	workflow and governance agent	pipeline context, secrets, RBAC, audit	place agents inside governable enterprise processes	process boundaries constrain flexibility

The main point of the table is this:

These projects are not just giving different answers to the same exam question. They are handling different context pressures in different environments.

For Claude Code, the dominant pressure is long tasks and tool output.

For LangGraph, it is recoverable state.

For the OpenAI Agents SDK, it is the boundary between runtime state and model-visible state.

For AutoGen, it is multi-agent coordination.

For Cursor and Copilot, it is low-latency code relevance.

For Hermes and OpenClaw, it is long-lived runtime continuity.

For enterprise harnesses, it is permissions, audit, and process embedding.

9. Building a Minimal Context Manager Yourself

If you are implementing a small agent from scratch, do not start with a giant vector database or a complex multi-layer memory design.

A more stable path is to split the context manager into explicit components:

Component	Responsibility
Visibility Filter	decide what may enter model context and what must remain runtime-only
Authority Resolver	resolve conflicts and priority
Temperature Manager	manage hot / warm / cold / frozen / long-term layers
Retrieval Router	choose whether to recall from rules, keywords, vectors, tools, artifacts, memory, or graphs
Compression Engine	handle offloading, extraction, summarization, structuring, and rehydration
Boundary Controller	manage thread, task, subagent, tenant, permission, and sandbox boundaries
Context Budgeter	manage token budget, selection reasoning, and the resulting context plan

Once split that way, a context manager is no longer "the code that assembles a prompt." It becomes a debuggable working-set planner.

An MVP loop can be quite simple:

1. Preserve all messages and tool results in the transcript.
2. Before each model request, collect candidate context from transcript, state, memory, and tools.
3. Tag each candidate with source, kind, temperature, authority, token estimate, and visibility.
4. Select the most relevant subset for the current task.
5. Trim or summarize large tool outputs.
6. Keep the most recent N raw turns.
7. Compress older history into a task handoff summary.
8. Force the summary to retain: goal, constraints, completed work, failed work, and next step.
9. Keep the pre-compression original in the transcript for recovery and audit.

A minimal data structure might look like this:

type ContextItem = {
  id: string
  kind: "instruction" | "user_goal" | "tool_result" | "file" | "summary" | "memory" | "state"
  source: string
  visibility: "llm_visible" | "runtime_only" | "artifact_ref"
  authority: "system" | "org" | "project" | "user" | "task_state" | "retrieval" | "memory" | "summary"
  temperature: "hot" | "warm" | "cold" | "frozen" | "long_term"
  shape: "raw" | "extract" | "summary" | "reference" | "diff" | "structured" | "graph"
  boundary: "thread" | "task" | "subagent" | "tool" | "tenant" | "sandbox"
  tokenEstimate: number
  freshnessTs?: string
  conflictKey?: string
  confidence?: number
  ttl?: string
  content?: string
  ref?: string
}

And each turn can produce a ContextPlan:

type ContextPlan = {
  selected: ContextItem[]
  compressed: Array<{ from: string; to: string; method: "extract" | "summarize" | "distill" | "archive_ref" }>
  dropped: Array<{ id: string; reason: string }>
  conflicts: Array<{ key: string; winner: string; losers: string[]; reason: string }>
  budget: {
    total: number
    used: number
    buckets: Record<string, number>
  }
}

The value of ContextPlan is explainability.

When the agent makes a mistake, you can ask:

What context was actually selected this turn?
Which rules were dropped?
Which tool result was compressed?
Why was long-term memory injected?
Why did a user constraint fail to make it into the prompt?

Without a plan like that, context behavior stays a black box.

One useful mental model for the per-turn build process is:

collect candidates
-> remove runtime-only items
-> resolve authority conflicts
-> drop stale tool results
-> prefer hot context
-> compress large items
-> preserve recent tail
-> inject final context

In pseudocode:

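function buildContext(task, state, transcript, memory, budget) {
  // Gather everything that could matter this turn.
  const candidates = collect(task, state, transcript, memory)
  // Remove runtime-only items before anything else sees them.
  const visible = applyVisibilityFilter(candidates)
  // Pick a winner wherever two items disagree.
  const resolved = resolveAuthorityConflicts(visible)
  // Re-grade hot / warm / cold and mark stale tool results.
  const fresh = updateTemperatureAndFreshness(resolved, state)
  // Recall missing information only if the task needs it.
  const retrieved = routeRetrievalIfNeeded(fresh, task)
  // Convert raw blobs into extracts, diffs, or references.
  const shaped = transformShape(retrieved)
  // Shrink whatever still exceeds the budget.
  const compressed = compressToBudget(shaped, budget)
  // Keep only what this thread, task, or sub-agent may see.
  const selected = enforceBoundaries(compressed, task)

  // Assemble the request, from stable content to the newest input.
  return [
    stableInstructions(selected),
    projectRules(selected),
    taskSummary(selected),
    recentTail(transcript),
    toolResults(selected),
    currentUserInput(task),
  ]
}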

The key point is not the exact code. It is the mindset:

Context should be built deliberately. It should not just grow by accident.

You also should not budget only by total token count. Bucketed budgeting is usually more stable:

Budget bucket	Suggested share
System / policy / project rules	10%-20%
Current user input + task state	10%-20%
Recent tail	15%-25%
Retrieved context	20%-35%
Tool results / artifact preview	10%-20%
Long-term memory	5%-10%

When you exceed budget, do not chop the recent tail first.

A safer order is often:

drop low-confidence retrieval first
-> drop expired memory
-> convert oversized tool results into artifact references
-> compress older history
-> shrink the recent tail only at the end

The recent tail often carries the system's sense of "where we are right now." Cut it too early and the model loses proximity to the live scene.
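The shedding order above can be sketched as a list of steps applied one at a time until the total fits. The thresholds (confidence below 0.5, tool results over 1000 tokens), the reference cost, and the compression ratio are all assumptions chosen for the example:

```typescript
type Bucket = "retrieval" | "memory" | "tool_result" | "history" | "recent_tail";
type Item = { id: string; bucket: Bucket; tokens: number; confidence?: number; expired?: boolean };
type ShedStep = { name: string; apply: (items: Item[]) => Item[] };

const REF_TOKENS = 20;     // assumed cost of an artifact reference
const SUMMARY_RATIO = 0.2; // assumed compression ratio for old history

const SHED_ORDER: ShedStep[] = [
  { name: "drop low-confidence retrieval",
    apply: (items) => items.filter((i) => !(i.bucket === "retrieval" && (i.confidence ?? 1) < 0.5)) },
  { name: "drop expired memory",
    apply: (items) => items.filter((i) => !(i.bucket === "memory" && i.expired === true)) },
  { name: "convert oversized tool results to references",
    apply: (items) => items.map((i) => (i.bucket === "tool_result" && i.tokens > 1000 ? { ...i, tokens: REF_TOKENS } : i)) },
  { name: "compress older history",
    apply: (items) => items.map((i) => (i.bucket === "history" ? { ...i, tokens: Math.ceil(i.tokens * SUMMARY_RATIO) } : i)) },
  { name: "shrink recent tail",
    apply: (items) => items.map((i) => (i.bucket === "recent_tail" ? { ...i, tokens: Math.ceil(i.tokens / 2) } : i)) },
];

const totalTokens = (items: Item[]): number => items.reduce((sum, i) => sum + i.tokens, 0);

// Apply shedding steps in order and stop as soon as the total fits the budget.
function fitToBudget(items: Item[], budget: number): { items: Item[]; applied: string[] } {
  const applied: string[] = [];
  let current = items;
  for (const step of SHED_ORDER) {
    if (totalTokens(current) <= budget) break;
    current = step.apply(current);
    applied.push(step.name);
  }
  return { items: current, applied };
}

const fitted = fitToBudget(
  [
    { id: "r1", bucket: "retrieval", tokens: 3000, confidence: 0.2 },
    { id: "m1", bucket: "memory", tokens: 500, expired: true },
    { id: "t1", bucket: "tool_result", tokens: 5000 },
    { id: "h1", bucket: "history", tokens: 4000 },
    { id: "tail", bucket: "recent_tail", tokens: 2000 },
  ],
  7000,
);
```

With a 7,000-token budget, the first three steps bring the total from 14,500 down to 6,020, so older history and the recent tail are never touched. The `applied` list doubles as an explanation of what was sacrificed and in which order.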

10. How to Write a Good Compression Summary

Many systems have unstable compaction because they aim the summary at the wrong target.

They write a recap of the past instead of a handoff for the next turn.

For agents, a better compact template looks like this:

User Goal:
[What the user originally wanted]

Hard Constraints:
[Rules that must not be violated, explicit user requirements, permission boundaries]

Current State:
[Where the task is actually stuck right now, not a vague recap]

Key Facts:
[Facts confirmed from files, logs, or tool results, ideally with sources]

Files Read:
[Path + key takeaways + whether the content may now be stale]

Files Modified:
[Path + what changed + why]

Approaches Tried But Failed:
[So the next turn does not repeat the same mistakes]

Latest Verification Results:
[Command, result, failure message, or proof of success]

Next Step:
[What should happen first when the task resumes from this summary]

The point of this template is resumability.

History alone is not enough. The agent must know where to pick the task up again.
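One way to enforce resumability is to treat the template as a typed record and reject summaries that leave the resume-critical fields empty. Which fields count as required is an assumption in this sketch, and the sample values are made up:

```typescript
type HandoffSummary = {
  userGoal: string;
  hardConstraints: string[];
  currentState: string;
  keyFacts: string[];
  filesRead: string[];
  filesModified: string[];
  failedApproaches: string[];
  latestVerification: string;
  nextStep: string;
};

// Fields assumed to be mandatory for picking the task up again.
const REQUIRED: (keyof HandoffSummary)[] = ["userGoal", "currentState", "nextStep"];

// Return the names of required fields that are empty.
function missingFields(summary: HandoffSummary): string[] {
  return REQUIRED.filter((key) => {
    const value = summary[key];
    return typeof value === "string" ? value.trim() === "" : value.length === 0;
  });
}

const draft: HandoffSummary = {
  userGoal: "Fix the post-login redirect",
  hardConstraints: ["Do not modify generated files"],
  currentState: "Route guard logic has been read; tests still fail",
  keyFacts: [],
  filesRead: [],
  filesModified: [],
  failedApproaches: [],
  latestVerification: "pnpm test auth: failed",
  nextStep: "",
};

const missing = missingFields(draft); // ["nextStep"]
```

A compaction step could regenerate the summary, or refuse to discard the original turns, whenever `missingFields` returns anything.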

11. What Questions Should Drive Your Architecture Choice?

If you are designing your own agent system, do not start with "which framework is strongest?"

Start with questions like these:

Is my agent for low-latency completion or long-running task execution?
Can the task state be structured?
Will tool results be very large?
Do I need cross-session memory?
Is there multi-agent collaboration?
Do I need enterprise permissions and audit?
Should the model be allowed to see internal runtime state?
Do failures need to be recoverable?
What is the one thing I can least afford to lose after compression?

Different answers lead to different design priorities:

IDE completion systems should prioritize local code context and low-latency indexing.
Workflow systems should prioritize state, checkpoints, and resumable execution.
Application SDKs should prioritize separating local context from model-visible context.
Coding CLI agents should prioritize tool-result governance, compaction, and recent-tail continuity.
Multi-agent systems should prioritize boundaries, roles, handoffs, and structured artifacts.
Long-lived personal assistants should prioritize layered memory, skill accumulation, and expiration policy.
Enterprise systems should build permissions, approvals, secrets, and audit directly into the context architecture.

This gets much closer to engineering reality than comparing frameworks in the abstract.

12. One-Sentence Summary

If you compress this whole chapter into one sentence, it becomes:

Context management is not about stuffing more content into the model. It is about continuously deciding, within a finite window, what the model should see, in what form, at what time, how to compress it when space runs out, and how to recover when the task is interrupted.

Compressed even further, it becomes six verbs:

Select
Inject
Recall
Compress
Isolate
Recover

Claude Code, LangGraph, OpenAI Agents SDK, AutoGen, Cursor, Hermes, and OpenClaw all look very different on the surface. But underneath, they are all answering the same question:

When the model has no real memory,
and the task still has to move forward continuously,
how does the outer system manage the world the model gets to see on this turn?

That is the real value of context management.

It is not a side feature of an agent. It is one of the core capabilities of the agent harness.

Claude Code Source Analysis Series, Chapter 4: Context Management

LienJack — Sun, 10 May 2026 08:05:41 +0000

Chapter 4 of the Claude Code Source Analysis Series | Context Management

In the previous article, we looked at Claude Code's prompt runtime: on every turn, the outer system rebuilds the model request from system rules, project memory, dynamic context, tool descriptions, message history, and prior tool results.

This chapter asks the next question:

Once all of that keeps accumulating, how does Claude Code decide what to keep, what to compress, and what to leave out?

People often reduce context management to a single idea:

Just stuff more history into the model.

That is only half true.

In a normal chat app, context does look a lot like chat history. But Claude Code is a coding agent, so its context is closer to a dynamic workbench. What the model sees on any given turn is not just "what the user said before." It may include system rules, project conventions, tool descriptions, file reads, shell output, error logs, the file that was just edited, task progress, and compressed summaries of earlier work.

So the real question is not whether context should exist. The real question is this:

How does an agent that keeps reading files, running commands, and editing code decide what the model should see on this turn? How does it preserve older information? How does it compress when space gets tight?

If the previous article was about assembling an operating manual for the model, this one is about something even more fundamental:

The model's workbench has limited space.
Claude Code has to keep reorganizing that workbench while the task is still in motion.

We will keep using the same running example as the rest of this series:

The user says: the tests in this project are failing. Find the cause and fix them.

That sounds short. For Claude Code, it quickly unfolds into a much longer chain:

Inspect the project structure
-> Read package.json
-> Run the test command
-> Analyze the failure
-> Search the relevant code
-> Read the target file
-> Edit the code
-> Run the tests again
-> Summarize the result

Each step creates more context. Context management exists to make sure that, over a long task, this information does not break continuity and does not drown the model.

1. Context Is Not Just Text History. It Is a Workbench Rebuilt Every Turn

The most important fact to carry over from the previous chapter is that the model itself is stateless from one call to the next.

Claude Code only appears continuous because the outer harness rebuilds the right working scene on every turn and sends that reconstructed scene back to the model.

So a model request is not really:

user question -> model

It is much closer to:

system rules
+ project rules
+ user preferences
+ current tool descriptions
+ message history
+ tool results
+ compressed summaries
+ current user input
=> this turn's model request
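As a rough sketch of that assembly, each turn's request can be modeled as a pure function of the current state. The `TurnInputs` shape and the `assembleTurnRequest` name are hypothetical and are not taken from the Claude Code source.

```typescript
// Hypothetical shapes for illustration; not Claude Code's real types.
interface ChatMessage {
  role: "user" | "assistant" | "tool";
  content: string;
}

interface TurnInputs {
  systemRules: string[];
  projectRules: string[];
  userPreferences: string[];
  toolDescriptions: string[];
  summaries: string[];    // compressed summaries of older history
  history: ChatMessage[]; // raw messages and tool results still kept
  userInput: string;
}

interface TurnRequest {
  system: string;
  messages: ChatMessage[];
}

// Rebuild the request from scratch; nothing is carried over implicitly.
function assembleTurnRequest(inputs: TurnInputs): TurnRequest {
  const system = [
    ...inputs.systemRules,
    ...inputs.projectRules,
    ...inputs.userPreferences,
    ...inputs.toolDescriptions,
  ].join("\n\n");

  const summaryMessages = inputs.summaries.map(
    (s): ChatMessage => ({ role: "user", content: `[summary of earlier work]\n${s}` }),
  );

  return {
    system,
    messages: [
      ...summaryMessages,
      ...inputs.history,
      { role: "user", content: inputs.userInput },
    ],
  };
}
```

Because the function takes everything as input, the harness stays free to change any piece between turns without the model noticing a seam.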

That is why context management should not be understood as "saving the chat log." A better name would be:

context orchestration

It has to answer a series of concrete questions:

Which information must survive every turn?
Which information should stay in the runtime and never be exposed to the model?
Which pieces can be cached?
Which pieces should only be fetched again on demand?
Which old tool results are already stale?
Which parts of history must be compressed into summaries?
After compression, how does the model still know what it is currently doing?

Without this layer, the ReAct-style main loop quickly runs into two opposite failures:

Too little context: the model loses track of what already happened.
Too much context: token usage, cost, latency, and attention all spiral out of control.

Context management is the balancing act between those two failures.

2. Why Does a Coding Agent's Token Usage Spike So Quickly?

A normal chat turn might cost a few hundred tokens.

A coding agent is different. Every action it takes pulls real environment output back into the conversation. A 500-line source file can cost thousands of tokens. One failed test can return a long stack trace. A global search can produce dozens of matches.

Worse, you usually cannot discard that information immediately. On the next turn, the model still needs to know things like:

Which file was just read?
Which line triggered the error?
What fixes have already been tried?
Which command failed?
Did the user warn us not to touch a certain kind of file?

So an agent's context does not grow like "a few more messages." Tool calls keep injecting environment state into the message stream.

That creates three classic failure modes.

1. Token Explosion

Tool outputs pile up until the next request exceeds the model's context window. At that point the problem is no longer lower answer quality: the run can stall outright.

2. Context Pollution

Old file contents, outdated command output, and stale error logs remain in history. The model may treat obsolete information as current truth. A file has already been changed, but the context still contains the old version, so the model keeps reasoning from stale code.

3. Compression Amnesia

Compress too aggressively, and the model forgets the user's original goal, the current stage of the task, or what happened only a moment ago. This is the most frustrating failure mode: the system still looks active, but its direction has quietly drifted off course.

That is why Claude Code's context management is not "summarize when full." It is continuous capacity governance running inside the main loop.

In engineering terms, this behaves more like a resident GC worker than a full GC that only runs after the heap is exhausted.

3. Put Context Back Inside the QueryEngine Main Loop

Claude Code's main execution loop is not just:

request model
-> call tool
-> request model again

Each turn actually passes through a context-governance layer:

prefetch project and session information
-> assemble this turn's context
-> request the model
-> decide whether the model is answering or asking for a tool
-> execute the tool and write the result back into message history
-> check token pressure
-> trim, collapse, or compress when needed
-> carry the new state into the next turn

Visually, it looks like this:

The most important segment is H -> I -> J -> B.

Tool results are not merely UI logs. They are raw material for the next round of reasoning. Every file read, shell command, and code search writes a result back into message history, and that history is then re-evaluated to decide what can still fit into the next model request.

Context management is not a helper living off to the side of QueryEngine. It is part of what makes the loop runnable at all.
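The loop described in this section can be sketched in a few lines. This is a simplified, synchronous stand-in: the real loop is asynchronous and streaming, and every name here (`LoopDeps`, `runLoop`, and the injected functions) is hypothetical.

```typescript
// Hypothetical, simplified loop; the real one is async and streaming.
type LoopMessage = { role: "user" | "assistant" | "tool"; content: string };

type ModelReply =
  | { kind: "answer"; text: string }
  | { kind: "tool_call"; tool: string; input: string };

interface LoopDeps {
  callModel: (messages: LoopMessage[]) => ModelReply;
  runTool: (tool: string, input: string) => string;
  estimateTokens: (messages: LoopMessage[]) => number;
  compact: (messages: LoopMessage[]) => LoopMessage[];
  tokenLimit: number;
  maxTurns: number;
}

function runLoop(userInput: string, deps: LoopDeps): string {
  let messages: LoopMessage[] = [{ role: "user", content: userInput }];
  for (let turn = 0; turn < deps.maxTurns; turn++) {
    // Governance runs before every request, not only after a failure.
    if (deps.estimateTokens(messages) > deps.tokenLimit) {
      messages = deps.compact(messages);
    }
    const reply = deps.callModel(messages);
    if (reply.kind === "answer") return reply.text;
    // Tool output becomes raw material for the next round of reasoning.
    const output = deps.runTool(reply.tool, reply.input);
    messages.push({ role: "assistant", content: `call ${reply.tool}` });
    messages.push({ role: "tool", content: output });
  }
  return "stopped: turn limit reached";
}
```

The token check sits inside the loop body, which is the point of the section: remove it and the loop still compiles, but it stops being runnable on long tasks.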

4. Governance Comes Before Compression

When people hear "context management," they often jump straight to compression. But a mature agent cannot just compress.

It first has to answer at least four classes of information-governance questions.

First, visibility:

Should this information go to the model, or should it stay inside the runtime?

API keys, permission objects, and internal traces should not enter the prompt. Large files and giant logs do not always need to be passed through verbatim either. Sometimes a reference or a summary is enough.

Second, authority:

If system rules, project rules, user instructions, and long-term memory conflict, which one wins?

If the project rules say "do not edit generated files" but the user asks for exactly that, the system cannot leave the decision to the model's intuition alone.

Third, hot / warm / cold tiering:

What is hot context and needed right now?
What is warm context and might be needed soon?
What should stay outside the prompt until it is recalled on demand?

The log from the currently failing test is hot. An old error from two hours ago that has already been resolved is warm. The full transcript is cold. You cannot push all of it into every turn.

Fourth, shape transformation:

Should this information exist as raw text, a summary, structured state, a diff, or a reference?

A failing test log can remain in raw form, or it can be normalized into something like:

Command: pnpm test auth
Status: failed
Key error: TypeError: user.id should be string
Relevant file: src/auth/session.ts
Next step: inspect the mock user construction logic

Those two forms consume very different numbers of tokens, and they help the model in different ways.
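A minimal sketch of that normalization step, assuming a test log that contains an `Error:` line and a `(path:line:col)` stack frame. The `FailureDigest` type and `digestTestLog` function are hypothetical, and the regular expressions are heuristics rather than a real stack-trace parser.

```typescript
// Hypothetical normalizer; field names mirror the example above.
interface FailureDigest {
  command: string;
  status: "passed" | "failed";
  keyError: string | null;
  relevantFile: string | null;
}

function digestTestLog(command: string, rawLog: string): FailureDigest {
  // First line that looks like "SomeError: message".
  const errorLine = rawLog.split("\n").find((line) => /\b\w*Error: /.test(line));
  // File path taken from the first "(path:line:col)" stack frame, if any.
  const frame = rawLog.match(/\(([^():\s]+):\d+:\d+\)/);
  return {
    command,
    // Crude assumption: no error line means the run passed.
    status: errorLine === undefined ? "passed" : "failed",
    keyError: errorLine === undefined ? null : errorLine.trim(),
    relevantFile: frame?.[1] ?? null,
  };
}
```

The digest is lossy by design, so a harness using something like this would usually keep the raw log reachable by reference in case the model needs the full trace.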

Context management is information governance. Compression is only one action inside that broader system.

5. Claude Code's Compression Is a Layered Defense, Not a Blunt Instrument

If you read the source, Claude Code's compression pipeline is easiest to understand as a staged defense that escalates from light to heavy.

It does not begin by turning all old history into one paragraph. It starts with low-risk, low-loss local cleanup. If that is not enough, it escalates toward folded views and only later toward full summarization.

The philosophy is simple:

If local slimming is enough, do not jump to global summarization. If structure can be preserved, do not collapse everything into a paragraph. Lossy folding should be the last resort.

Let's look at the layers one by one.

1. Tool Result Budget: Cap the loudest noise source first

The first thing that usually needs control is not the user's message. It is the tool output.

Bash can return thousands of log lines
Read can return a large file
Grep can return dozens of matching blocks
WebFetch can pull in a full webpage

If those are passed into the next turn unchanged, the window fills quickly. The role of applyToolResultBudget is to cap oversized individual tool results before heavier compression starts.

In one sentence:

Do not let one tool result consume the entire workbench.
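To make the idea concrete, here is one way a per-result cap could look. It is illustrative only and is not the actual `applyToolResultBudget` implementation; the `capToolResult` name and the head-plus-tail strategy are assumptions.

```typescript
// Illustrative only: not the actual applyToolResultBudget implementation.
function capToolResult(output: string, maxChars: number): string {
  if (output.length <= maxChars) return output;
  // Keep the head and the tail; errors often sit at the end of a log.
  const half = Math.floor(maxChars / 2);
  const omitted = output.length - 2 * half;
  // The marker itself adds a few characters on top of maxChars.
  return (
    output.slice(0, half) +
    `\n[... ${omitted} characters omitted ...]\n` +
    output.slice(output.length - half)
  );
}
```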

2. Snip: Remove low-value bulk without breaking the structure

snip works like local surgery.

It does not remove entire turns. Instead, it replaces large low-value blocks with markers or shorter representations while preserving the structure of the message chain.

Why not simply delete them? Because message history carries tool call IDs, tool_result pairings, and cross-turn references. Deleting a message outright can break continuity. Replacing it with a marker frees space while preserving the fact that "a tool result used to be here."

In one sentence:

The content gets shorter, but the ledger stays intact.
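A small sketch of the same idea in code. The `ToolResultMessage` shape and `snipToolResult` function are hypothetical; the only point is that the message and its pairing ID survive while the bulk does not.

```typescript
// Hypothetical message shape; only the pairing field matters here.
interface ToolResultMessage {
  toolUseId: string; // pairs this result with the tool call that produced it
  content: string;
}

// Replace bulky content with a marker but keep the message and its ID.
function snipToolResult(msg: ToolResultMessage, maxChars: number): ToolResultMessage {
  if (msg.content.length <= maxChars) return msg;
  return {
    ...msg,
    content: `[snipped: ${msg.content.length} characters of tool output]`,
  };
}
```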

3. MicroCompact: Clean stale tool results without destroying the task's structure

MicroCompact is a more systematic local cleanup pass.

It mainly targets tool outputs that are large, time-sensitive, and already superseded by later work, such as:

old file read results
old search results
old command output
old webpage or external-query results

It usually leaves these alone:

original user messages
key assistant responses
recent tool results
currently active context

For example, suppose the agent reads src/auth/session.ts, later edits that file, and then reads the new version. The first read is now stale. Keeping it in full wastes space and can also mislead the model.

In one sentence:

Take out the trash, but keep the ledger.
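The stale-read case above can be sketched as follows. The `ToolRecord` shape and `clearStaleReads` function are hypothetical, and the staleness rule (a read is stale once the same file is read or edited later) is a simplification of whatever the real pass does.

```typescript
// Hypothetical record of a tool call's result in history.
interface ToolRecord {
  tool: "Read" | "Edit" | "Bash";
  path?: string; // file touched, when applicable
  content: string;
}

// Clear Read results that a later Read or Edit of the same file superseded.
function clearStaleReads(records: ToolRecord[]): ToolRecord[] {
  const lastTouch = new Map<string, number>();
  records.forEach((r, i) => {
    if (r.path !== undefined && (r.tool === "Read" || r.tool === "Edit")) {
      lastTouch.set(r.path, i);
    }
  });
  return records.map((r, i) => {
    const stale =
      r.tool === "Read" &&
      r.path !== undefined &&
      (lastTouch.get(r.path) ?? i) > i;
    // The record stays in place; only its bulky content is cleared.
    return stale ? { ...r, content: "[stale read cleared]" } : r;
  });
}
```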

4. Context Collapse: Fold the view before you rush to summarize

Context Collapse is a smarter intermediate layer.

The goal is not just to delete history. It is to project a more compact view of the context. If that folded view drops the request back below the safety threshold, there is no need to trigger the more expensive AutoCompact.

This reflects an important engineering tradeoff in Claude Code:

If fine-grained structure can be preserved, do not rush to turn the whole history into one large summary.

Full summarization saves space, but it always loses detail. Collapse is more like grouping, folding, and stowing documents on a desk, not burning them all and keeping only a meeting note.

5. AutoCompact: At the end of the line, turn history into a handoff note

Only after the lighter local defenses fail does automatic summary compression begin.

But the summary cannot be something vague like this:

We discussed the failing tests, read some files, and changed some code.

That is useless if the task has to continue.

A good compact summary should read like a task handoff note, preserving at least:

the user's main request
key constraints
files touched
important facts discovered
errors encountered
fixes already attempted
what is currently in progress
what should happen next

The last two are especially important:

what is currently in progress
what should happen next

Many weak summaries record what happened but not where the task currently stands. After compression, the model remembers the rough story but does not know where to resume.

So the essence of AutoCompact is not "write a summary."

It is this:

Turn the conversation into a handoff note that the next turn can keep executing from.
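One way to keep a summary from drifting into vagueness is to give it a fixed shape. The sketch below is hypothetical: the `HandoffNote` fields simply follow the checklist above, and nothing here reflects the actual summary prompt or format used by Claude Code.

```typescript
// Hypothetical structure; the fields follow the checklist above.
interface HandoffNote {
  userRequest: string;
  constraints: string[];
  filesTouched: string[];
  factsDiscovered: string[];
  errorsEncountered: string[];
  fixesAttempted: string[];
  inProgress: string;
  nextStep: string;
}

function renderHandoffNote(note: HandoffNote): string {
  const section = (title: string, items: string[]): string =>
    `${title}:\n` +
    (items.length > 0 ? items.map((item) => `- ${item}`).join("\n") : "- (none)");
  return [
    `User request: ${note.userRequest}`,
    section("Constraints", note.constraints),
    section("Files touched", note.filesTouched),
    section("Facts discovered", note.factsDiscovered),
    section("Errors encountered", note.errorsEncountered),
    section("Fixes attempted", note.fixesAttempted),
    // The two fields weak summaries tend to drop.
    `In progress: ${note.inProgress}`,
    `Next step: ${note.nextStep}`,
  ].join("\n");
}
```

Making `inProgress` and `nextStep` required fields means a summary that omits them fails to type-check instead of failing silently at turn 20.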

6. Reactive Compact: The recovery path after the model says it is full

Even with proactive budgeting and automatic compression, reality can still surprise the system.

The model API may return a context-too-large error. Media may exceed limits. Token estimates may not match the actual encoding exactly. That is where reactive compaction enters. It is not proactive prevention. It is recovery after a budget failure.

Its existence points to one core lesson:

A long-running agent cannot assume its budget estimate is always perfect. It needs a recovery path for when the estimate is wrong.

This is the same engineering instinct you see in retry and recovery logic elsewhere: do not assume the system will never fail; make sure it can recover when it does.
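A minimal sketch of such a recovery path, written synchronously for brevity. The error code, `contextTooLarge` helper, and `sendWithRecovery` function are all hypothetical; real APIs signal an oversized request in their own way.

```typescript
// Hypothetical error tagging; real APIs signal this condition differently.
const CONTEXT_TOO_LARGE = "context_too_large";

function contextTooLarge(): Error & { code: string } {
  const err = new Error("context too large") as Error & { code: string };
  err.code = CONTEXT_TOO_LARGE;
  return err;
}

function isContextTooLarge(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { code?: unknown }).code === CONTEXT_TOO_LARGE
  );
}

function sendWithRecovery<M>(
  messages: M[],
  send: (messages: M[]) => string,
  compact: (messages: M[]) => M[],
  maxRetries = 2,
): string {
  let current = messages;
  for (let attempt = 0; ; attempt++) {
    try {
      return send(current);
    } catch (err) {
      // Recover only from the budget failure, and only a bounded number of times.
      if (!isContextTooLarge(err) || attempt >= maxRetries) throw err;
      current = compact(current);
    }
  }
}
```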

6. Why Keep the Recent Tail After Compression?

The most common compression failure is not total forgetting. It is losing the feel of the live scene.

The last few turns before compression might look like this:

Just edited src/auth/session.ts
Just ran pnpm test auth
Just saw a new TypeError
The user just added: do not change the public API

These details are closest to the current action, and they are often the most important. If they all get folded into a summary, the next turn feels distant, like reading meeting minutes without having been in the room.

So the better pattern is not:

old history -> one summary -> continue

It is:

old history -> one summary
+ the last few raw turns
+ key recent tool outputs
-> continue

The underlying idea is simple: keep the recent tail and reconnect the summary to the live scene.

The summary preserves the long-term storyline. The tail preserves the current feel of the work.

One of the most important lessons from long-running agents is this:

Compression is not only about remembering the past. It is also about staying grounded in the present.
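The summary-plus-tail pattern is small enough to sketch directly. `compactKeepingTail` is a hypothetical helper, and the summarizer is injected so the sketch stays independent of any model call.

```typescript
// Hypothetical sketch: summarize everything except the last few messages.
function compactKeepingTail<M>(
  messages: M[],
  tailSize: number,
  summarize: (older: M[]) => M,
): M[] {
  if (messages.length <= tailSize) return messages;
  const cut = messages.length - tailSize;
  const summary = summarize(messages.slice(0, cut));
  // Summary first for the long-term storyline, then the raw tail.
  return [summary, ...messages.slice(cut)];
}
```

A production version would also need to choose the cut point so that a tool call and its paired result never end up on opposite sides of it.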

7. Do Not Confuse Context, Memory, and Transcript

At this point it becomes easy to blur three different ideas together: Context, Memory, and Transcript.

They are not the same thing.

Concept	Plain-English Analogy	Role in Claude Code
Context	Active workbench	What the model can actually see on this turn; rebuilt for each request
Memory	Reusable notes	Project rules, user preferences, and key session facts that are loaded before entering context
Transcript	Full archive	The raw event log used for recovery, audit, and replay; too large to include verbatim on every turn

The shortest way to remember them is:

Context: active workbench
Memory: reusable notes
Transcript: full archive

Claude Code's context-management logic is fundamentally about moving information between these three layers:

Preserve the full history in the transcript
Extract key facts into memory
Pack what matters most right now into context

Treat the transcript like context, and every turn explodes in token usage.

Treat context like memory, and temporary task details pollute long-term rules.

Treat memory like transcript, and you lose the detailed record of what really happened.

Once these boundaries are clear, a lot of agent "amnesia" becomes much easier to explain.

8. What Claude Code Achieves Through the Seven-Dimension Lens

One useful way to evaluate Claude Code is a seven-dimension context model: Visibility, Authority, Temperature, Shape, Retrieval, Compression, and Boundary.

Seen through that lens, the system looks like this:

Dimension	Claude Code mechanism	Strength	Current limitation
Visibility	`system prompt`, user context, `toolUseContext`, tool-result budgeting, `snip`, and collapse jointly decide what enters the model and what stays in the runtime	Strong	Not all information is abstracted behind one unified `ContextItem`; visibility logic is still scattered across modules
Authority	system-priority rules, project rules, current user instructions, permission rules, and security policy together form a decision chain	Strong	Conflict handling still relies on cooperation between prompts and runtime rules rather than a single explicit authority resolver
Temperature	the recent message tail, current tool output, session memory, and transcript / resume state behave like hot, warm, and cold layers	Moderately strong	The behavior exists, but the source may not always name the layers explicitly
Shape	raw tool results, truncation markers, summary messages, boundary messages, diffs, and structured `tool_result` payloads coexist	Strong	More task state could be lifted into explicit structure instead of living mostly in natural-language history
Retrieval	`CLAUDE.md` loading, git status, `Read`, `Grep`, `Glob`, web tools, MCP, and skills pull information in on demand	Moderately strong	The design leans more on tools and files than on a unified retrieval substrate
Compression	Tool Result Budget, `Snip`, `MicroCompact`, `Context Collapse`, `AutoCompact`, and reactive compaction form a multi-layer defense	Very strong	Summary drift is still a real risk, so constraints, source scope, and the recent tail must be preserved carefully
Boundary	permission checks, Plan Mode, tool protocols, sub-agent forks, MCP boundaries, hooks, and sandboxing isolate actions and information	Strong	Enterprise-grade tenancy and data isolation still depend on deployment environment, not just the context layer itself

What stands out most is this:

Claude Code is strongest in compression and boundaries, highly engineered in shape and retrieval, and still has room to keep abstracting visibility, authority, and temperature.

Put differently, this is no longer just "a CLI that compresses chat history." Context governance is woven through QueryEngine, the prompt runtime, the tool system, the permission system, the compaction system, and the session-resume path. Together they form a full harness.

But it is also not a textbook standalone ContextManager. Many of these capabilities are distributed across the main loop and supporting runtime subsystems rather than centralized in one class.

That is exactly what readers can miss when they first inspect the source:

Do not go hunting for a file literally named ContextManager. Context management is a cross-cutting engineering pipeline running through the loop.

9. Which Objects Matter Most When You Read the Source?

Do not start by reading files in isolation just because their filenames look relevant. A better method is to trace the context lifecycle:

Where does information enter?
What state does it become?
What rules filter it?
When does compression trigger?
Where is the compressed result written back?
How does the next model turn see it again?

These are good places to start:

Object to inspect	Main question
Query loop / `query.ts`	Where in the main loop does context governance happen?
Context builder / prompt runtime	What pieces make up the model input for this turn?
`CLAUDE.md` loader	How do project rules and user memory enter the context?
Message store / messages	How do user messages, model replies, and tool results accumulate?
Token budget / tracking	At what point does the system consider the request unsafe in size?
Tool Result Budget / `Snip`	Which tool outputs get trimmed first?
`MicroCompact`	Which stale tool outputs can be cleaned away?
Context Collapse	How does the system fold the view before it jumps to full summarization?
`AutoCompact` / reactive compact	How is history replaced with structured summaries, and what fallback exists if compression fails?
Transcript / resume	How is raw history backed up and later restored?

When you read these objects, do not only ask what the function does. Ask where it sits inside the loop.

Viewed alone, TokenBudget can look like a simple length-calculation helper. Placed back inside QueryEngine, it becomes the switch that moves the system from normal execution into compression governance.

Viewed alone, MicroCompact can look like message cleanup. Placed back inside a long-running task, it becomes what prevents stale tool output from continuously polluting the next round of judgment.

Viewed alone, AutoCompact can look like summarization. Placed back inside an agent session, it is writing the handoff note that lets the next model turn keep working.

One strong habit for reading agent source code is:

Do not just ask what a function is. Ask what kind of runaway behavior in the loop it is there to prevent.

If you trace one request round through query.ts, the context-management path compresses into something like this:

getMessagesAfterCompactBoundary()
-> applyToolResultBudget()
-> snipCompactIfNeeded()
-> microcompact()
-> contextCollapse.applyCollapsesIfNeeded()
-> autoCompactIfNeeded()
-> appendSystemContext()
-> queryModelWithStreaming()

That line makes something very clear: context management is not a rescue step after an API error. It is proactive governance that runs before every model request.
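The light-to-heavy escalation can be sketched as a list of stages that stops as soon as the request fits. This is a simplification: in the chain above each step decides for itself whether it is needed, whereas the hypothetical `runStages` helper below centralizes that check, and character counts stand in for tokens.

```typescript
// Hypothetical stage runner; stage names in a caller would echo the chain above.
interface Stage<M> {
  name: string;
  apply: (messages: M[]) => M[];
}

function runStages<M>(
  messages: M[],
  stages: Stage<M>[],
  estimateTokens: (messages: M[]) => number,
  budget: number,
): { messages: M[]; applied: string[] } {
  const applied: string[] = [];
  let current = messages;
  for (const stage of stages) {
    // Stop escalating as soon as the request fits.
    if (estimateTokens(current) <= budget) break;
    current = stage.apply(current);
    applied.push(stage.name);
  }
  return { messages: current, applied };
}
```

Ordering the stages from least to most lossy is what keeps full summarization a last resort rather than the default.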

Inside that chain, applyToolResultBudget() handles the noisiest source first: tool output. A shell log, a large file read, or one MCP response can fill the window faster than the message history itself. So Claude Code first applies a local budget and only then considers global compression.

microcompact and contextCollapse form the middle layer. They try to project, fold, and clean up local history so that the system does not degrade too quickly into "one giant summary." That matters because programming tasks need structure preserved: which tool call produced which result, which file was read, which error is still unresolved.

autoCompactIfNeeded() is the heavier step. It reserves space for summary output before the context is completely exhausted. If compression fails, it also needs a circuit breaker so the system does not keep triggering the same unrecoverable compression request on every turn.
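A circuit breaker of that kind might look like the sketch below. The `CompactionBreaker` class and its failure threshold are assumptions made for illustration, not values or names from the source.

```typescript
// Hypothetical breaker; the threshold is an assumption, not a value from the source.
class CompactionBreaker {
  private consecutiveFailures = 0;
  private readonly maxFailures: number;

  constructor(maxFailures: number) {
    this.maxFailures = maxFailures;
  }

  get isOpen(): boolean {
    return this.consecutiveFailures >= this.maxFailures;
  }

  // Returns compacted messages, or null if compaction was skipped or failed.
  tryCompact<M>(messages: M[], compact: (messages: M[]) => M[]): M[] | null {
    if (this.isOpen) return null; // stop retrying a request that keeps failing
    try {
      const result = compact(messages);
      this.consecutiveFailures = 0;
      return result;
    } catch {
      this.consecutiveFailures++;
      return null;
    }
  }
}
```

This sketch never closes the breaker again once it opens; a real system would need some reset condition, such as a new session or a manual compaction.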

Inside compact.ts, pay attention to the rebuild logic after compression as well. Compression is not finished when old history becomes a paragraph. The system also has to preserve the compact boundary, summary messages, the recent tail, attachments, hook results, and sometimes even recently accessed key files. Otherwise the next turn sees only the summary and loses its current working scene.

Another easy detail to miss is that some context management hides inside individual tools. For example, if the file-read tool sees that the same file and the same range were already read and the file has not changed, it can return file_unchanged instead of shoving the full content into messages again. That small optimization is really a way to prevent duplicate context pollution.
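The deduplication idea can be sketched with a small cache keyed by path and range. The `ReadDedupe` class is hypothetical; only the `file_unchanged` marker comes from the text above, and using the file's modification time as the change signal is an assumption.

```typescript
// Hypothetical cache; `file_unchanged` echoes the marker named above.
interface ReadRequest {
  path: string;
  startLine: number;
  endLine: number;
}

class ReadDedupe {
  private seen = new Map<string, number>(); // key -> mtime at last read

  read(req: ReadRequest, mtimeMs: number, load: () => string): string {
    const key = `${req.path}:${req.startLine}-${req.endLine}`;
    // Same file, same range, same mtime: skip the content entirely.
    if (this.seen.get(key) === mtimeMs) return "file_unchanged";
    this.seen.set(key, mtimeMs);
    return load();
  }
}
```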

So when you map this chapter back to the code, do not go looking for a single class called ContextManager. Follow the lifecycle instead: how information enters, gets budgeted, gets collapsed, gets compressed, and then gets restored.

10. If You Were Building a Minimal Context Manager Yourself, Where Would You Start?

If you want to build a "mini Claude Code," you do not need to reproduce the full six-layer compaction pipeline on day one.

A minimal version can start here:

1. Append user messages, assistant replies, and tool results into one message store
2. Estimate token usage before each model request
3. Trim oversized tool results first when they cross a threshold
4. Keep the last N turns in raw form
5. Compress older history into a structured summary
6. Persist the raw transcript to disk for recovery
7. Force the summary to retain: user goal, constraints, files changed, failed attempts, and next step

That already solves the problem that causes many demo agents to lose coherence after just a few turns.
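As a starting point, steps 1 through 5 fit in one small class. Everything here is a hypothetical sketch: the `MiniContextManager` name, the option names, and the rough estimate of four characters per token are assumptions, and transcript persistence (step 6) is left out.

```typescript
// A hypothetical minimal manager covering steps 1 to 5 above.
type StoreRole = "user" | "assistant" | "tool" | "summary";

interface StoredMessage {
  role: StoreRole;
  content: string;
}

interface ManagerOptions {
  maxToolChars: number; // cap for a single tool result
  keepLast: number;     // raw messages kept after compaction
  tokenBudget: number;  // estimated tokens allowed before compacting
}

class MiniContextManager {
  private messages: StoredMessage[] = [];
  private readonly opts: ManagerOptions;

  constructor(opts: ManagerOptions) {
    this.opts = opts;
  }

  // Steps 1 and 3: store the message, trimming oversized tool results on the way in.
  append(role: StoreRole, content: string): void {
    const trimmed =
      role === "tool" && content.length > this.opts.maxToolChars
        ? content.slice(0, this.opts.maxToolChars) + "\n[truncated]"
        : content;
    this.messages.push({ role, content: trimmed });
  }

  // Step 2: rough estimate, assuming about 4 characters per token.
  estimateTokens(): number {
    const chars = this.messages.reduce((n, m) => n + m.content.length, 0);
    return Math.ceil(chars / 4);
  }

  // Steps 4 and 5: summarize older history, keep the last N messages raw.
  buildContext(summarize: (older: StoredMessage[]) => string): StoredMessage[] {
    const overBudget = this.estimateTokens() > this.opts.tokenBudget;
    if (overBudget && this.messages.length > this.opts.keepLast) {
      const cut = this.messages.length - this.opts.keepLast;
      const summary = summarize(this.messages.slice(0, cut));
      this.messages = [{ role: "summary", content: summary }, ...this.messages.slice(cut)];
    }
    return this.messages.slice();
  }
}
```

The `summarize` callback is where step 7 would be enforced, for example by requiring it to return a structured note rather than free text.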

Once that works, you can gradually add:

loading project rules and user memory
different budgets for different tool types
reference-based file handling plus re-read on demand
collapsed views instead of immediate full summarization
sub-agent or fork isolation for long search tasks
permission-aware context injection
a context-plan log explaining why this turn included these specific pieces of information

This is a more stable evolution path than simply starting with a model that has an enormous context window.

Larger windows solve a capacity problem. They do not automatically solve an information-discipline problem. The real challenge for an agent is this:

As the information keeps growing,
can the system keep showing the model the small subset that matters most right now?

11. One-Sentence Summary

If you compress this whole chapter into one sentence, it becomes:

Claude Code's context management is not about feeding the model more and more history. It is about continuously assembling, budgeting, pruning, folding, summarizing, and reconnecting information inside a finite token budget.

If you compress it even further, it turns into six verbs:

Assemble: decide what the model should see this turn
Budget: detect when tokens are entering the danger zone
Prune: trim oversized tool outputs first
Fold: preserve fine-grained structure where possible
Summarize: produce a handoff note the next turn can continue from
Reconnect: keep the recent tail attached to the live working state

In the end, context management determines something very practical:

After turn 20,
does the agent still behave like someone who has been working continuously,
or like someone who just woke up and only read the meeting notes?

That is one of the key mechanisms that turns Claude Code from "a chat box with tools" into "an engineering agent that can sustain long-running work."

Claude Code Source Analysis Series, Chapter 3: Prompt Construction

LienJack — Sun, 10 May 2026 07:59:44 +0000

Chapter 3 of the Claude Code Source Analysis Series — Prompt Construction

Claude Code does not run on a single static prompt. Before every model call, it rebuilds a working context from system rules, project memory, runtime state, tool descriptions, message history, and the user's latest input.

That leads straight to a new question:

Before each round of model invocation, what exactly does Claude Code show the model?

When people first build an Agent, they often start from a very natural assumption:

All I need is a sufficiently strong system prompt.
I'll tell the model it's a programming assistant,
that it should follow the rules and can call tools.
Wouldn't that basically give me Claude Code?

That assumption isn't wrong, but it only scratches the surface.

The real Claude Code does not rely on a single fixed prompt at all. Before each round of calling the model, it assembles a fresh batch of information from scratch: base identity, system rules, the current mode, project memory, user preferences, Git status, tool descriptions, skill descriptions, MCP capabilities, message history, tool results, compressed summaries — plus the user's latest input.

So this chapter is not answering:

What does Claude Code's prompt say?

It is answering:

How does Claude Code stitch together context from multiple sources at runtime into an input that the model can understand and act on while staying within bounds?

In one sentence:

Claude Code's prompt is not a static template. It is a runtime assembly process: it selects system prompts by priority, loads memory by layer, injects dynamic context each turn, and feeds tool results back into the next round's messages.

Take a look at this diagram first to build a mental model:

What this diagram shows is that what looks like a single prompt is actually split into a stable segment, a dynamic segment, a memory segment, the current user message, and message history. What this runtime manages is not just wording. It manages which source overrides which, what enters context first, what can be cached, and what must be refreshed every turn.

1. Why can't you just write one big prompt?

Think about a real scenario.

A user types this from the project root:

Help me figure out why this project's tests are failing and fix them.

With a single generic prompt, the model knows at most:

You are a programming assistant.
Help the user fix code.
Keep answers concise.

Not even close.

What the model actually needs to know:

Which project is this?
What's the current directory?
Does the project have its own development conventions?
Does the user have personal preferences?
Are there uncommitted changes in the working tree right now?
Which tools are available?
Which commands require confirmation?
Which files have already been read and which tests run in previous turns?
If the context gets too long, which history has been compressed into summaries?

None of this can be hardcoded into a template; it only becomes available at runtime.

Git status changes. Today's date changes. Tool call results change. User input changes. The CLAUDE.md in the current directory might be completely different from another project's.

So the challenge Claude Code faces isn't "how to write a universal prompt" — it's a more engineering-driven question:

Before each model call, exactly which information should go in? In what order? If information conflicts, which source wins? If it's too long, what gets dropped first? If something can be cached, where do you draw the boundary?

That's why Prompt Runtime exists.

2. A Single Model Turn Contains More Than Just the System Prompt

Let's disentangle a few concepts first, so they don't blur together later.

We habitually call everything we feed the model a "prompt." But in an agent system like Claude Code, a single model turn breaks down into at least these categories:

system prompt       System-level behavioral rules
system context      System environment context, e.g. Git status
user context        User and project context, e.g. date, CLAUDE.md
messages            User messages, model responses, tool calls, tool results
toolUseContext      Currently available tools, their schemas, permissions, and execution context

Each of these has a distinct job.

The system prompt is the model's operating manual — it says "who you are, how you work, what your boundaries are."

The user context provides long-term constraints from the user and the project: "this project uses pnpm," "write commit messages in Chinese," "don't directly edit generated files."

The system context is runtime intelligence: the current Git branch, working-tree changes, latest commit, current username.

The messages are the live ledger — a record of everything that has happened so far in this task: what the user said, what tools the model invoked, what results came back.

The toolUseContext tells the model which "hands and feet" it has available this turn: Read, Edit, Bash, Grep, Task, MCP tools, Skill tools, and so on.

So Claude Code's assembly logic can be simplified to this:

Stable system rules
+ current runtime environment
+ user / project memory
+ tool capability descriptions
+ historical messages and tool results
+ current user input
=> this turn's model request

Fixating solely on "how well-written the prompt text is" misses the point. What really determines whether an agent is reliable is how this information gets organized, layered, cached, and compressed. You can craft a gorgeous system prompt, but if tool results are not written back into the messages correctly, the model will still lose its thread.

3. First Layer: The System Prompt Is a Priority Decision, Not Simple Concatenation

First, let's look at the system prompt priority. Abstracted, it forms this selection chain:

0. overrideSystemPrompt   Completely replaces all prompts, e.g. loop mode
1. Coordinator prompt     Used when coordinator mode is active
2. Agent prompt           Prompt defined by a custom Agent
3. customSystemPrompt     Specified via --system-prompt
4. defaultSystemPrompt    The standard default Claude Code prompt
5. appendSystemPrompt     Always appended at the end

Let's unpack these names one by one. They do not all share the same origin; they are the entry points Claude Code reserves when assembling the effective system prompt:

Name	What it means	Where it comes from
`overrideSystemPrompt`	The highest-priority internal override. When present, it bypasses the default prompt, custom prompt, and Agent prompt selection logic entirely, directly replacing the main system prompt with this content.	Not a regular user-facing CLI argument. It typically comes from higher-level internal callers — for example, certain loop / fork / background task modes pass a pre-rendered system prompt when invoking an Agent or QueryEngine. It solves the problem of "the current task must run under a different set of rules."
`Coordinator prompt`	The coordinator's own system prompt. It defines the model as a multi-Agent orchestrator whose core responsibilities are decomposing tasks, assigning workers, collecting results, and synthesizing judgments — rather than editing files or executing all tools directly.	Originates from Coordinator-mode modules. It is only used when coordinator mode is activated by feature flag and runtime configuration. This mode also changes available tools and worker descriptions accordingly.
`Agent prompt`	The exclusive system prompt for a sub-Agent or custom Agent. It defines what this Agent is suited for, which tools it can use, and what form its output should take.	Comes from the Agent definition. Built-in Agents generate it dynamically via `getSystemPrompt()`; custom Agents typically come from user-authored Agent Markdown / JSON definitions, where the body becomes that Agent's system prompt, optionally augmented with Agent memory-related prompts.
`customSystemPrompt`	The replacement for the default system prompt that the user specifies explicitly. It is not a set of supplementary notes; it replaces the default Claude Code prompt with user-provided content.	Comes from the CLI / SDK entry point, e.g. `--system-prompt`, represented in `QueryEngineConfig` as `customSystemPrompt?: string`.
`defaultSystemPrompt`	The standard system prompt for a normal Claude Code session. It defines the default persona, collaboration style, tool-use principles, safety boundaries, code-task handling approach, and so on.	Comes from Claude Code's built-in prompt construction functions. The main flow calls logic like `fetchSystemPromptParts()` / `getSystemPrompt()` to obtain the default system prompt fragments.
`appendSystemPrompt`	Supplementary constraints that are always appended at the end of the main prompt. It does not change the model's persona, only adds an extra block of rules after the already-selected main system prompt.	Comes from the CLI / SDK entry point, e.g. `--append-system-prompt`, and may also be auto-appended by certain internal modes with additional notes. Placed at the end of the system prompt array during assembly.

So these "sources" fall roughly into three categories:

Built-in:       defaultSystemPrompt / Coordinator prompt / built-in Agent prompts
User-configured: customSystemPrompt / appendSystemPrompt
Internal runtime: overrideSystemPrompt / certain auto-appended appendSystemPrompt values

Looking at the source-level flow makes this even clearer. In the normal main flow, QueryEngine.submitMessage() first obtains the default prompt, user context, and system context, then assembles them roughly like this:

const systemPrompt = asSystemPrompt([
  ...(customPrompt !== undefined ? [customPrompt] : defaultSystemPrompt),
  ...(memoryMechanicsPrompt ? [memoryMechanicsPrompt] : []),
  ...(appendSystemPrompt ? [appendSystemPrompt] : []),
])

This logic expresses the most basic substitution relationship: if customSystemPrompt is present, use it to replace defaultSystemPrompt; if not, use the default prompt; finally, append appendSystemPrompt.
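The wider selection rules from the table can be condensed into one function. This is only a sketch, not the actual source: `PromptSources` and `selectSystemPrompt` are hypothetical names, and the precedence used here (override, then coordinator, then Agent, then custom, then default) is an assumption read off the order of the table. It also assumes that an override suppresses the appended block, which the real code may handle differently.

```typescript
// Hypothetical shape; field names mirror the table, not the real source.
interface PromptSources {
  overrideSystemPrompt?: string;
  coordinatorPrompt?: string; // assumed to be set only in coordinator mode
  agentPrompt?: string;       // assumed to be set only for sub-Agents / custom Agents
  customSystemPrompt?: string;
  defaultSystemPrompt: string[];
  appendSystemPrompt?: string;
}

function selectSystemPrompt(src: PromptSources): string[] {
  // Assumption: an override replaces everything, including the appended block.
  if (src.overrideSystemPrompt !== undefined) {
    return [src.overrideSystemPrompt];
  }
  // Pick exactly one "main" prompt, highest assumed priority first.
  const main: string[] =
    src.coordinatorPrompt !== undefined ? [src.coordinatorPrompt]
    : src.agentPrompt !== undefined ? [src.agentPrompt]
    : src.customSystemPrompt !== undefined ? [src.customSystemPrompt]
    : src.defaultSystemPrompt;
  // The appended block never replaces the main prompt; it always goes last.
  return src.appendSystemPrompt ? [...main, src.appendSystemPrompt] : main;
}

const promptExample = selectSystemPrompt({
  defaultSystemPrompt: ["You are Claude Code."],
  customSystemPrompt: "You are a release bot.",
  appendSystemPrompt: "Always answer in English.",
});
// promptExample is ["You are a release bot.", "Always answer in English."]
```

The point of the sketch is that selection and appending are two separate steps: exactly one main prompt wins, and the appended rules ride along behind it.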

4. Second Layer: `CLAUDE.md` Is Project Memory, Not an Ordinary README

The system prompt answers "how should the model behave by default," but it doesn't yet know the rules of the current project.

That's where CLAUDE.md comes in.

Think of CLAUDE.md as Claude Code's project work specification. It's not a README written for humans — it's an operating manual written for an Agent:

How do you start this project?
What's the test command?
What code style does it follow?
Which directories should never be touched?
How should PR descriptions be written?
What should you watch out for when touching database migrations?

The CLAUDE.md loading order can be drawn as a memory hierarchy:

1. Managed memory  /etc/claude-code/CLAUDE.md
2. User memory     ~/.claude/CLAUDE.md
3. Project memory  CLAUDE.md, .claude/CLAUDE.md, .claude/rules/*.md
4. Local memory    CLAUDE.local.md

Each tier has a distinct purpose.

Managed memory holds organization- or admin-level rules. This is where company-wide constraints go — security policies, code review requirements, production environment prohibitions.

User memory stores the user's own long-term preferences. Maybe you prefer explanations in Chinese, have a preferred testing style, or want commit messages to follow a particular format.

Project memory contains project-level rules, usually maintained alongside the repository. It tells the Agent how this specific project is built, its directory conventions, and the boundaries of its tech stack.

Local memory holds private, local rules that typically aren't checked into version control. This is for information that only applies to your machine — local service ports, private paths, temporary debugging habits.

This hierarchy isn't complexity for complexity's sake. It solves a very practical problem:

The Agent must obey organization rules, respect user preferences, adapt to the current project, and never leak local private configuration to the team.

That's also why the memory layers need to be separate. With a single flat CLAUDE.md, organization rules, personal habits, project conventions, and local ad-hoc settings would all be dumped into one file, and when they conflict there would be no clear way to decide which one wins.

By layering the memory system, Claude Code turns it into a governable stack of rules.
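To make the loading order concrete, here is a minimal sketch that walks the four tiers in the order listed above. `loadMemory`, `MemoryEntry`, and the in-memory `files` object are hypothetical stand-ins for the real loader and the file system. The sketch only preserves order and tags each entry with its tier; how conflicts are resolved is left out because this chapter does not show it.

```typescript
type MemoryTier = "managed" | "user" | "project" | "local";

interface MemoryEntry {
  tier: MemoryTier;
  path: string;
  content: string;
}

// Candidate paths per tier, in the loading order listed above.
// A trailing "/" means "every .md file under this directory".
const MEMORY_TIERS: Array<{ tier: MemoryTier; paths: string[] }> = [
  { tier: "managed", paths: ["/etc/claude-code/CLAUDE.md"] },
  { tier: "user", paths: ["~/.claude/CLAUDE.md"] },
  { tier: "project", paths: ["CLAUDE.md", ".claude/CLAUDE.md", ".claude/rules/"] },
  { tier: "local", paths: ["CLAUDE.local.md"] },
];

// `files` stands in for the file system: path -> content.
function loadMemory(files: Record<string, string>): MemoryEntry[] {
  const entries: MemoryEntry[] = [];
  const known = Object.keys(files);
  for (const { tier, paths } of MEMORY_TIERS) {
    for (const candidate of paths) {
      const matches = candidate.endsWith("/")
        ? known.filter(p => p.startsWith(candidate) && p.endsWith(".md")).sort()
        : known.filter(p => p === candidate);
      for (const path of matches) {
        entries.push({ tier, path, content: files[path] ?? "" });
      }
    }
  }
  return entries;
}

const memoryExample = loadMemory({
  "CLAUDE.local.md": "Local backend port is 4000.",
  "CLAUDE.md": "Use pnpm, never npm.",
  "~/.claude/CLAUDE.md": "Reply in Chinese.",
  ".claude/rules/testing.md": "Run pnpm typecheck after editing TypeScript.",
});
// memoryExample tiers, in order: user, project, project, local
```

Because every entry keeps its tier, a later stage can still tell an organization rule apart from a local note, which is exactly what a flat file would lose.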

Why does `CLAUDE.md` end up in the prompt?

Because the model simply doesn't know your project's conventions.

Say the project contains:

- Use pnpm, never npm.
- After editing TypeScript files, always run pnpm typecheck.
- Never manually edit the generated/ directory.

If these rules don't make it into context, the model will very naturally run:

npm test

or directly modify generated files.

This isn't the model being "stupid" — it just hasn't seen the project rules.

The value of CLAUDE.md is that it loads project knowledge into the model's visible working memory, so it operates the way the current repo expects from the very start.

There's a boundary to keep in mind, though: more CLAUDE.md is not always better.

If you stuff tens of thousands of words of historical explanations into it, the model gets drowned in noise and costs go up. That's why Claude Code applies size limits, caching, and selective loading to memory content. More advanced sub-agents may also choose omitClaudeMd, skipping project memory in certain read-only search or planning scenarios to reduce token cost and attention noise.

This reveals the core idea — CLAUDE.md isn't about "always shove the whole thing into the prompt no matter what." It's about:

Injecting the right tier of project memory into the model for the right task.

5. Third Layer: Dynamic Context Is Re-evaluated Every Turn

By this point, we have the system prompt and project memory. But Claude Code still needs information that only exists at runtime.

Typical dynamic context includes:

Current date
Current working directory
Current Git branch
git status
Most recent commit
Current username
Tools available this turn
Tools exposed by MCP servers
Discovered Skills
Permission mode
Compression summary

None of this belongs in CLAUDE.md.

Git status changes constantly, the date advances daily, the tool list shifts with MCP connections, and Skills can be hot-reloaded at runtime.

So Claude Code generates context at runtime through paths like getSystemContext() and getUserContext().

A rough sketch:

getSystemContext()
-> reads Git status, branch, recent commits, environment info
-> produces system-level context

getUserContext()
-> reads date, CLAUDE.md, user / project memory
-> produces user-level context

This has two benefits.

First, dynamic information does not pollute the static prompt.

defaultSystemPrompt stays stable; high-churn information like Git status is injected separately. That makes caching easier and makes it simpler to detect what actually changed.

Second, the model sees the most relevant current information on every turn.

On the first turn, the model hasn't read any files yet and needs more global rules and tool descriptions. By the fifth turn, it already has test errors and the relevant source code — at that point the message history and tool results are what matter. If context compression occurs, old history gets replaced by a summary, and the model sees the compressed state on the next turn.
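The split between system-level and user-level context can be sketched as two small builders that are called again at the start of each turn. The function names and field names below are hypothetical, and the inputs are passed in directly; the real `getSystemContext()` and `getUserContext()` probe Git and the file system themselves.

```typescript
// Hypothetical input; the real code reads this from the repository.
interface GitSnapshot {
  branch: string;
  status: string;
  recentCommits: string[];
}

function sketchSystemContext(git: GitSnapshot): Record<string, string> {
  return {
    gitBranch: git.branch,
    gitStatus: git.status,
    recentCommits: git.recentCommits.join("\n"),
  };
}

function sketchUserContext(today: Date, memory: string[]): Record<string, string> {
  return {
    currentDate: today.toISOString().slice(0, 10), // YYYY-MM-DD, in UTC
    claudeMd: memory.join("\n\n"),
  };
}

// Rendered fresh for each turn, so the output always reflects current state.
function renderContext(ctx: Record<string, string>): string {
  return Object.keys(ctx).map(key => `${key}: ${ctx[key]}`).join("\n");
}

const turnContext = renderContext({
  ...sketchSystemContext({
    branch: "feature/login-test",
    status: "M src/auth/login.ts",
    recentCommits: ["fix auth redirect"],
  }),
  ...sketchUserContext(new Date("2026-05-02T00:00:00Z"), ["Use pnpm, never npm."]),
});
// First line of turnContext: "gitBranch: feature/login-test"
```

Nothing here is cached in the static prompt: a new Git snapshot or a new date simply produces a different rendered block on the next turn.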

So Claude Code's context assembly is not a one-time event — it's a continuous action inside the loop:

User input
-> build context for this turn
-> call the model
-> model requests tools
-> tool results written back into messages
-> check whether compression is needed
-> rebuild context for the next turn

This connects directly to the ReAct loop: prompt assembly does not sit outside the agent runtime. It sits at the entrance to every turn.

6. Caching: Why Separate Stable and Dynamic Segments?

This comes down to a very practical concern: an agent calls the model frequently, and if every turn has to recompute, re-bill, and re-process the same massive block of system prompts, the cost and latency both spike.

So Claude Code tries to keep stable content at the front, making it easier to hit the prompt cache.

Stable segments typically include:

Identity introduction
System rules
Task execution guidelines
Operational safety guidelines
Tool usage guidelines
Tone and style
Output efficiency requirements

These generally stay the same across a session and are well-suited for caching.

Dynamic segments include:

Agent tool context
Skills context
CLAUDE.md loading results
MCP server directives
Git status
Current date

These change more frequently and need to be separated from the stable segments.

The source marks this cache boundary with the SYSTEM_PROMPT_DYNAMIC_BOUNDARY design. In plain terms:

Keep everything above this line as stable as possible for caching;
everything below this line may change and is refreshed per turn.

This boundary is critical.

If you mix highly volatile information into the stable segment — say, stuffing a dynamic skill list directly into tool descriptions — every change to that skill list invalidates the entire system prompt cache. What looks like a small list update can actually force every turn's request to re-process thousands or even tens of thousands of tokens.

That's why the prompt runtime isn't just about "giving the model more to know" — it also has to control costs:

Keep stable content as stable as possible
Isolate dynamic content separately
Have a clear boundary for cache invalidation
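A small sketch shows why the boundary matters. Everything here is illustrative: the marker string is made up (the source names SYSTEM_PROMPT_DYNAMIC_BOUNDARY but its value is not shown in this chapter), and the FNV-1a-style hash is just a stand-in for whatever the prompt cache actually keys on.

```typescript
// Hypothetical marker value, used only for this illustration.
const BOUNDARY = "<<dynamic-boundary>>";

// Small FNV-1a-style hash over UTF-16 code units.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

function buildPrompt(stable: string[], dynamic: string[]): string[] {
  return [...stable, BOUNDARY, ...dynamic];
}

// Only the segments before the boundary contribute to the cache key.
function stablePrefixKey(prompt: string[]): number {
  const cut = prompt.indexOf(BOUNDARY);
  const prefix = cut === -1 ? prompt : prompt.slice(0, cut);
  return fnv1a(prefix.join("\n"));
}

const stableSegments = ["Identity introduction", "System rules", "Tool usage guidelines"];
const turn1 = buildPrompt(stableSegments, ["Git status: clean", "Date: 2026-05-02"]);
const turn2 = buildPrompt(stableSegments, ["Git status: 1 file modified", "Date: 2026-05-02"]);
const sameKey = stablePrefixKey(turn1) === stablePrefixKey(turn2); // true: prefix untouched

// Leaking a volatile skill list into the stable part changes the key.
const leaky = buildPrompt([...stableSegments, "Skills: deploy, lint"], ["Git status: clean"]);
const keyChanged = stablePrefixKey(leaky) !== stablePrefixKey(turn1);
```

Git status changed between `turn1` and `turn2`, yet the key is identical because the change sits below the boundary. Once a volatile list is mixed into the stable part, the prefix itself changes and the cached prefix can no longer be reused.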

This is also what separates Claude Code from a toy agent. A toy agent only cares about "does it run?" A mature agent also cares about "after 20 turns, 50 turns, 100 turns — are the cost and latency still acceptable?"

In complex tasks, the gap between cache hits and cache misses gets amplified across multiple turns. What the user perceives isn't a minor optimization — it's whether the entire task feels smooth or not.

7. User Prompt: The Current Question Is Just the Final Puzzle Piece

We've covered the system prompt, CLAUDE.md, and dynamic context. Now let's look at user input.

User input certainly matters, but it doesn't reach the model in isolation.

Say the user types:

Fix the tests.

This sentence is very short on its own. Sent to the model in isolation, it carries almost no actionable information.

But inside Claude Code, it arrives alongside all the context that came before it:

System rules: You are Claude Code. Modify files with care.
Project memory: This project uses pnpm; the test command is pnpm test.
Git status: The current branch has 3 uncommitted files.
Tool descriptions: Read, Grep, Bash, Edit are available.
Message history: The user just mentioned a failing login test.
Current input: Fix the tests.

What the model understands is no longer the bare phrase "Fix the tests." It becomes:

On top of the current repo, the current rules, the current tool boundaries,
and the current task history, carry forward the engineering action
of "fixing the tests."

The user prompt is more like the final trigger. It tells the Agent what to do now, but whether the Agent can do it correctly depends on whether the context assembly that came before it is complete.

8. Why Do Tool Results Count as Part of Prompt Assembly?

This is easy to overlook.

We usually think of a prompt as "the content written before sending it to the model." But in an agent loop, tool results become part of the next round's model input.

For example, in the first round the model decides:

I want to read package.json.

Claude Code executes the Read tool and gets back the file contents.

If the tool result isn't written back into messages, the model in the next round still has no idea what's inside package.json.

So tool-result write-back is a critical step in prompt construction:

The model expresses an intention to act
-> The tool system executes it
-> The tool result becomes a message
-> Carried into the model during the next round of context assembly

This step translates facts from the external world back into context the model can read.

In other words, Claude Code does not assemble prompt context once at input time. It keeps growing, trimming, and rebuilding it after every new observation.
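The write-back step can be sketched with the `tool_use` and `tool_result` content-block shapes used by the Anthropic Messages API. The `writeBackToolResult` helper and the `ChatMessage` type are hypothetical and much simpler than whatever Claude Code does internally; the tool input shown is illustrative.

```typescript
// Block shapes follow the Messages API's tool_use / tool_result content blocks.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: Record<string, unknown> }
  | { type: "tool_result"; tool_use_id: string; content: string; is_error?: boolean };

interface ChatMessage {
  role: "user" | "assistant";
  content: ContentBlock[];
}

// Hypothetical helper: returns a new history instead of mutating the old one.
function writeBackToolResult(
  history: ChatMessage[],
  assistantTurn: ChatMessage,
  results: Array<{ toolUseId: string; output: string; isError?: boolean }>,
): ChatMessage[] {
  const resultBlocks = results.map((r): ContentBlock => ({
    type: "tool_result",
    tool_use_id: r.toolUseId,
    content: r.output,
    is_error: r.isError ?? false,
  }));
  // The assistant's tool_use turn is kept; the results follow as a user turn.
  return [...history, assistantTurn, { role: "user", content: resultBlocks }];
}

const readRequest: ChatMessage = {
  role: "assistant",
  content: [{ type: "tool_use", id: "toolu_1", name: "Read", input: { file_path: "package.json" } }],
};
const nextMessages = writeBackToolResult(
  [{ role: "user", content: [{ type: "text", text: "Fix the tests." }] }],
  readRequest,
  [{ toolUseId: "toolu_1", output: '{ "scripts": { "test": "vitest" } }' }],
);
// nextMessages has 3 entries; the last is a user message carrying the tool_result.
```

The `tool_use_id` is what ties an observation back to the request that produced it, so the model can tell which file contents belong to which read.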

That's why the Agent can work across multiple rounds.

9. Connecting the Entire Chain

Now we can distill Claude Code's prompt assembly into a single chain:

There are three key takeaways from this diagram.

First, system prompts have priority — not all sources are simply concatenated flat.

Second, information like CLAUDE.md, Git status, the current date, and tool context is not static templating; it's runtime context.

Third, tool results and compressed summaries feed back into the next turn's input — prompt construction spans the entire agent loop.

10. Tying It All Together with a Complete Example

Suppose the user types:

Fix the login test for me.

What Claude Code actually feeds the model is not that single line — it's an entire operating snapshot.

Step 1: Select the system prompt.

A normal conversation uses the default Claude Code behavior rules; a sub-agent gets an agent-specific prompt; coordinator mode uses the Coordinator prompt; if an override is present, it replaces everything.

Step 2: Load memory.

The system might read:

User-level: Reply in Chinese.
Project-level: This project uses pnpm.
Project-level: The test command is pnpm test -- --runInBand.
Project-level: Do not modify generated/ directly.
Local-level: The local backend service port is 4000.

Step 3: Inject dynamic context.

The system might supplement:

Current branch: feature/login-test
Git status: src/auth/login.ts has unstaged changes
Recent commit: fix auth redirect
Current date: 2026-05-02

Step 4: Prepare tool context.

The model sees what it can use:

Read / Grep / Bash / Edit / Task ...

It also learns which tools require permission and which operations it cannot perform directly.

Step 5: Merge message history with the current user input.

If the user already pasted an error log earlier, or if the model just ran a test in the previous turn, those all enter the current round of context as part of messages.

Step 6: The model starts acting.

The model might first call Grep to search for the login test, then Read to inspect the test file, then Bash to run the specific test, and after getting the error back, use Edit to modify the source.

Every tool result is fed back into messages, and Claude Code reassembles the context for the next round.

This is why Claude Code appears to "understand your project." It does not understand it out of thin air. Every round, it puts project rules, runtime state, and tool observations back in front of the model.

11. What Problem Does This Mechanism Actually Solve?

The prompt assembly mechanism isn't about "writing longer prompts" — it addresses four engineering problems.

First, behavioral consistency.

System prompt tiering ensures the model maintains a clear identity across different modes. CLAUDE.md layering ensures that organizational rules, user preferences, and project conventions reliably enter the context.

Second, task relevance.

Dynamic context gives the model awareness of the current directory, Git status, date, tool set, and recent execution results — rather than rigidly applying generic rules to every project.

Third, continuity across long tasks.

Tool results are fed back into messages, and compressed summaries carry forward into subsequent rounds. The Agent doesn't suffer amnesia after each action.

Fourth, cost and performance.

Stable-segment caching, dynamic-segment isolation, and CacheSafeParams reuse prevent multi-turn calls and child Agent forks from re-processing the entire prefix every time.

These four together — that's the real Prompt Engineering behind Claude Code.

More precisely, this is no longer prompt engineering in the narrow, traditional sense. It is context engineering.

The distinction is worth remembering: Prompt Engineering asks "how do I phrase this for better results?" Context Engineering asks "what information does the model see, in what order, and how is it updated?" The latter is the core competency of production-grade Agents.

12. The Final Word in One Sentence

Claude Code's prompt is not some "magic incantation."

It's more like a workbench that gets reorganized every single turn:

System rules on the left,
project memory on the right,
the current task in the center,
tools laid out alongside,
compacted and tidied when the desk gets too cluttered,
then laid out fresh again when it's time to get back to work.

This dynamic assembly mechanism is the key piece that transforms Claude Code from an ordinary chat window into an engineering agent.

If the ReAct chapter was about how an agent acts turn by turn, this chapter is about:

Before each turn of action, how Claude Code rearranges the world the model needs to see.

Claude Code Source Analysis Series, Chapter 2: The ReAct Main Loop

LienJack — Sun, 10 May 2026 07:59:43 +0000

Chapter 2 of the Claude Code Source Analysis Series — The ReAct Main Loop

Claude Code is not just a model wrapper. It is a runtime where Model API handles reasoning, QueryEngine carries the session forward, Tools interface with the real engineering environment, and Context / State keeps multi-step work coherent across turns.

This chapter drills into the innermost control loop: how query.ts turns a single model call into an agent run that can keep gathering evidence, invoking tools, and advancing the task.

We'll use a simple debugging scenario throughout:

Take a look at why the tests are failing in this project and fix them.

A model on its own cannot natively read files, run commands, or maintain task state. So the question becomes:

How does Claude Code get the model to operate inside a controlled loop — reasoning, acting, absorbing results — until the task genuinely moves forward?

This is exactly what ReAct solves.

You do not need to memorize the acronym. Just keep this minimal feedback loop in mind:

Assess the current situation
Decide what to do next
Actually carry it out
Get the result
Reassess based on the new result

As a flowchart, it looks like this:

What query.ts does in Claude Code is engineer this feedback loop into a working system. The model reasons over the current context, then decides whether to act; the results of that action are written back into context, and the model proceeds to the next round of reasoning.

The part of the diagram that really matters isn't the fancy terminology — it's the straightforward state machine on the right side:

Build Query
-> Request Model API
-> Parse the response
-> Check if there are tool calls
-> If none, return the result
-> If yes, invoke the tools
-> Append tool results to messages
-> Check if compression is needed
-> Loop back to the next Query

This loop is what turns the architecture into a working runtime instead of a static component list.

But if you think of it as nothing more than a while loop, you're still missing a layer. A better framing is to see QueryEngine not as a "single-request handler" but as a "session-level task orchestrator." It doesn't just spin up for one incoming message and then disappear — it holds long-lived state across an entire conversation, threading together the model, tools, permissions, context, and compression.

So in this chapter we need to grasp both layers at once:

The query.ts layer: how each round of ReAct state transitions happens.
The QueryEngine layer: how state, tools, permissions, and resources are continuously orchestrated across an entire session.

The former explains "how the loop runs"; the latter explains "why this loop can persist stably across many rounds of a task."

1. Why the Main Loop Can't Just Call the Model API Once

Let's start with the simplest case.

A user asks:

Explain what useEffect does.

The program sends the question to the model, the model produces an answer, done.

But what if the user asks:

This React project won't start. Help me fix it.

The model almost certainly doesn't know the answer on the first round. It needs more facts at minimum:

What's the project structure?
What scripts are in package.json?
What error does the start command throw?
Where's the relevant source code?
After I make changes, do the tests pass?

These facts don't live in the model's parameters, and they're not in the user's one-liner. They exist in the real engineering environment: the file system, the shell, Git, test frameworks, logs, dependency configs.

So the agent needs an extra layer of mechanism:

The model assesses what it's missing
→ initiates a tool call
→ the program executes the tool
→ feeds the result back to the model
→ the model reassesses based on new facts

That's why the ReAct loop exists.

It's not about making the pipeline complex. Real tasks simply can't be resolved in a single response. Anyone who has debugged a production outage will recognize the pattern: form a hypothesis, gather evidence on the ground, then adjust the next move based on what the evidence shows.

2. ReAct Is Not the Model Acting on Its Own — It Is the Model Expressing Intent

This is an easy point to get wrong:

The model is not literally reading files, running commands, or editing code by itself.

What the model can produce is an intent to act. For example:

I need to read package.json.
I need to search for handleEnter.
I need to run npm test.
I need to edit a file.

The part that actually acts is Claude Code's host runtime: the outer layer made up of QueryEngine, the tools system, and the permissions system.

So the more accurate division of labor is:

The model decides what should happen next.
Claude Code decides whether it is allowed, how it is executed, and how the result is recorded.

That is exactly why Claude Code is much more than simply "connecting the model to a shell."

If you let the model emit raw shell commands and execute them directly, the system has no structured understanding of what the action means. Permissions, auditing, error recovery, and context write-back all become difficult to control.

The tools system turns an action into a structured event:

Tool: Read
Arguments: a file path
Permission mode: read-only
Result: file contents or an error
Write-back: appended to messages as a tool result

That way, the model still does the reasoning, but the action itself is placed inside a controlled engineering framework.

In one sentence: the model decides, the tools touch the real world, and QueryEngine organizes judgment and action into a sustainable loop.
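Here is a minimal sketch of what "structured event" buys you. The names (`ToolSpec`, `dispatch`, `AuditRecord`) and the rule "read-only tools run without asking, everything else needs approval" are assumptions made for illustration; the real permission system is considerably richer.

```typescript
interface ToolCall { tool: string; args: Record<string, unknown> }

interface ToolSpec {
  name: string;
  isReadOnly: boolean;
  run(args: Record<string, unknown>): string;
}

type Decision = "allow" | "ask" | "deny";

// One record per attempted action: this is what makes the run auditable.
interface AuditRecord { tool: string; decision: Decision; ok: boolean; result: string }

function dispatch(
  call: ToolCall,
  registry: Record<string, ToolSpec>,
  approve: (call: ToolCall) => boolean,
): AuditRecord {
  const spec = Object.prototype.hasOwnProperty.call(registry, call.tool)
    ? registry[call.tool]
    : undefined;
  if (spec === undefined) {
    return { tool: call.tool, decision: "deny", ok: false, result: `Unknown tool: ${call.tool}` };
  }
  // Assumed policy: read-only tools are allowed, others need confirmation.
  const decision: Decision = spec.isReadOnly ? "allow" : "ask";
  if (decision === "ask" && !approve(call)) {
    return { tool: call.tool, decision, ok: false, result: "Permission denied" };
  }
  try {
    return { tool: call.tool, decision, ok: true, result: spec.run(call.args) };
  } catch (err) {
    // Failures become results too, so the model can react to them next turn.
    return { tool: call.tool, decision, ok: false, result: String(err) };
  }
}

const toolRegistry: Record<string, ToolSpec> = {
  Read: { name: "Read", isReadOnly: true, run: args => `contents of ${String(args["path"])}` },
  Edit: { name: "Edit", isReadOnly: false, run: args => `edited ${String(args["path"])}` },
};
const denyAll = () => false;
const readRecord = dispatch({ tool: "Read", args: { path: "package.json" } }, toolRegistry, denyAll);
const editRecord = dispatch({ tool: "Edit", args: { path: "src/a.ts" } }, toolRegistry, denyAll);
// readRecord.ok is true (read-only, no prompt); editRecord.ok is false (user declined).
```

A raw shell string offers none of these hooks. Because each call names its tool and carries typed arguments, the runtime can decide, record, and recover before anything touches the real environment.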

3. The `query.ts` State Machine: The Core Is Not a Function — It's the `State`

The diagram on the left lists the State structure in query.ts. It highlights something important:

The Claude Code main loop doesn't chug along on scattered global variables — it revolves around a unified state object.

In simplified form, it looks like this:

interface State {
  messages: MessageParam[]
  toolUseContext: ToolUseContext
  turnCount: number
  shouldAutoCompact: boolean
  autoCompactTracking: {
    consecutiveFailures: number
    totalMessages: number
  }
  aborted: boolean
}

These few fields are the keys to understanding the ReAct loop.

`messages`: The Agent's Short-Term Working Memory

messages is not an ordinary chat log.

Inside the agent loop, it acts more like a running ledger:

What the user just said
What the model decided in the last turn
What tool calls the model initiated
What results the tools returned
What summary the system retained after compaction

The model does not automatically remember everything that has happened before. Every time Claude Code calls the Model API, it repackages the relevant history and includes it in the next model request.

So the point of messages is:

Turning multi-turn actions into context the model can see in the next turn.

Without messages, every model invocation would start from scratch, as if suffering from amnesia.

`toolUseContext`: What Tools Are Available This Turn

toolUseContext is the tool environment.

It's not just a list of tools — it tells the main loop:

Which tools are available right now?
What is the input schema for each tool?
What context does tool execution need?
How should results be converted into messages?
Which operations require permission checks?

The Act in ReAct is not an abstract action; it is a concrete action constrained by the tool system.

"Read a file" via the Read tool and "read a file" by running cat directly are two entirely different things in engineering terms. The former is traceable, constrainable, and can be written back into context as a structured tool result. The latter is just a string, and you may never know what it actually did when something goes wrong.

`turnCount`: This Is a Multi-Turn System, Not a Single Request

turnCount tracks how many iterations the loop has already completed.

The field itself looks mundane, but it exposes a fundamental design truth:

Claude Code was designed from the start with the assumption that tasks will span multiple turns.

It is not "ask the model once and hope it gets the answer right." It allows the model to gather information incrementally across turns, invoke tools, and course-correct its judgments.

turnCount also serves as a guard against infinite loops, feeds logging statistics, and can trigger degradation strategies. A mature agent must know how long it has been spinning, so it needs turn counts, budgets, and exit conditions. Without these boundaries, a multi-turn loop easily ends up circling a failure path.
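As a rough illustration of such boundaries, the check below stops the loop on either a turn limit or a run of consecutive tool errors. `TurnBudget`, `checkTurnBudget`, and the specific limits are hypothetical; the source only tells us that turn counts and budgets exist.

```typescript
interface TurnBudget {
  maxTurns: number;
  maxConsecutiveToolErrors: number;
}

type StopReason = "max_turns" | "stuck_on_errors";

// Returns a reason to stop, or null when the loop may continue.
function checkTurnBudget(
  turnCount: number,
  consecutiveToolErrors: number,
  budget: TurnBudget,
): StopReason | null {
  if (turnCount >= budget.maxTurns) return "max_turns";
  if (consecutiveToolErrors >= budget.maxConsecutiveToolErrors) return "stuck_on_errors";
  return null;
}

const turnBudget: TurnBudget = { maxTurns: 50, maxConsecutiveToolErrors: 3 };
const verdicts = [
  checkTurnBudget(4, 0, turnBudget),  // null: keep going
  checkTurnBudget(12, 3, turnBudget), // "stuck_on_errors"
  checkTurnBudget(50, 0, turnBudget), // "max_turns"
];
```

Returning a named reason rather than a bare boolean makes it possible to log why a run ended, which matters when diagnosing an agent that circles the same failure.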

`shouldAutoCompact`: Context Swells — Compaction Must Be Part of the Main Loop

Once an agent starts invoking tools, messages grows rapidly.

Reading a large file, running a test, searching for a batch of results — all of these dump huge amounts of information back into message history. Short tasks are fine, but long tasks will slam into the context window very quickly.

So shouldAutoCompact is not a nice-to-have optimization — it is a mandatory capacity-governance signal for any long-running agent.

It answers:

Is the current message history too long?
Should older content be compressed into a summary?
Has compaction been failing consecutively?
How has the message volume changed before and after compaction?

Notice in the reference diagram why "check if compaction is needed" comes immediately after "append tool results to messages."

Because tool results are precisely what causes context to swell.
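A compaction check along these lines might look like the sketch below. The token estimate (about four characters per token), the threshold, and the failure limit are all assumptions, and `compactMessages` produces a placeholder rather than a real model-written summary.

```typescript
interface CompactTracking {
  consecutiveFailures: number;
  totalMessages: number;
}

interface CompactLimits {
  contextWindow: number; // tokens
  threshold: number;     // fraction of the window that triggers compaction
  maxFailures: number;   // stop retrying after this many failed attempts
}

// Rough heuristic (about 4 characters per token); not a real tokenizer.
function estimateTokens(messages: string[]): number {
  return Math.ceil(messages.join("").length / 4);
}

function shouldCompact(messages: string[], tracking: CompactTracking, limits: CompactLimits): boolean {
  // Circuit breaker: repeated failures mean retrying would only waste calls.
  if (tracking.consecutiveFailures >= limits.maxFailures) return false;
  return estimateTokens(messages) >= limits.contextWindow * limits.threshold;
}

// Replaces everything except the most recent messages with one summary line.
function compactMessages(messages: string[], keepRecent: number): string[] {
  if (messages.length <= keepRecent) return messages;
  const cut = messages.length - keepRecent;
  const summary = `[summary of ${cut} earlier messages]`;
  return [summary, ...messages.slice(cut)];
}

const longHistory = ["a", "b", "c", "d", "e"].map(tag => tag.repeat(400)); // about 500 tokens
const compactLimits: CompactLimits = { contextWindow: 1000, threshold: 0.4, maxFailures: 3 };
const needsCompaction = shouldCompact(
  longHistory,
  { consecutiveFailures: 0, totalMessages: longHistory.length },
  compactLimits,
); // true: 500 estimated tokens >= 400
const compacted = compactMessages(longHistory, 2);
// compacted[0] is "[summary of 3 earlier messages]", followed by the last two messages
```

The `consecutiveFailures` counter is what keeps this from becoming a failure loop of its own: if compaction keeps failing, the check backs off instead of retrying on every turn.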

`aborted`: An Agent Must Also Be Safely Interruptible

Real engineering tasks don't always end gracefully.

A user might cancel, a command might get stuck, a tool might time out, a permission might be denied.

aborted signals that this loop can be interrupted externally. It's a reminder that an agent's main loop must account not only for "how to start" and "how to succeed," but also for "how to stop."

The more capable an agent is, the more dangerous it becomes if it cannot be stopped safely, and the more it needs the ability to be halted cleanly.

4. The QueryEngine Perspective: It Manages a Session, Not a Single Request

At this point, we've seen how one round of the ReAct state machine works inside query.ts. But reading the source code requires stepping one layer further out: who holds the long-lived state that this loop depends on?

The answer is QueryEngine.

One useful way to read the source is to treat QueryEngine at the conversation level. That framing matters because QueryEngine is not a one-shot request handler — it is a session object.

A single-request handler typically cares about:

What's the input?
What should I return?
Is this call finished?

A session-level orchestrator, however, cares about:

How do I keep appending to the message history?
Which permissions were previously denied?
Which files have already been read?
What's the current-round and cumulative usage?
Which skills have been discovered?
Which memory paths have been loaded?
Is the current task interrupted?

That's why QueryEngine surfaces a lot of cross-round state, for example:

type ConversationRuntimeState = {
  messages: Message[]
  abortController: AbortController
  permissionDenials: PermissionDenial[]
  totalUsage: Usage
  readFileCache: FileStateCache
  discoveredSkills: Set<string>
  loadedMemoryPaths: Set<string>
}

These fields show it's not a thin wrapper that "forwards the prompt to the model." It's maintaining the live context of a conversation.

The relationship between the two can be understood like this:

QueryEngine: session-level runtime, responsible for holding long-lived resources and state.
query.ts loop: task execution engine, responsible for building a Query round by round,
               calling the model, running tools, and appending messages.

State is more like a working snapshot of a single loop iteration; QueryEngine is more like the scheduling center behind the session.

With this perspective in place, ReAct is no longer just a small loop of "should the model keep calling tools." It's part of a complete task lifecycle.

5. `submitMessage()`: The Real Entry Point That Starts an Agent Run

Following the trail from a user action, whenever a user submits a message, the real entry point typically lands on a method like submitMessage().

Unlike a typical backend endpoint that receives nothing more than a prompt, this method reads and prepares an entire set of runtime resources at once:

Current cwd
Available tools
Slash commands
MCP clients
Thinking configuration (whether extended reasoning mode is enabled)
Max turns
Budget limits
Session persistence state

So submitMessage() is not fundamentally "fire off a chat request." It is:

Launch an agent run.

Over the course of that run, it has to handle roughly the following:

Read current config and session state
Set up the working directory and session environment
Wrap the tool permission determination logic
Prepare the system prompt and context
Invoke the underlying query loop
Handle tool calls as the model produces output
Write tool results back into session history
Track usage, cost, and boundary conditions

The ReAct loop inside query.ts is just the inner kernel of "how the task moves forward"; submitMessage() and QueryEngine are what put that kernel into a real Claude Code session and actually run it.

This is also where Claude Code is more engineered than a minimal agent demo. A demo usually only proves that "the model can call tools," but QueryEngine has to guarantee:

Is this tool call actually allowed?
Can the result feed back into the next round of model input?
Can it recover on failure?
Will state get corrupted across a long-running session?
Will context or budget spiral out of control?

Real agent engineering lives in these places that don't look flashy.

6. Translating the Right-Hand Flow into Code: What Actually Happens Inside the While Loop?

The right side of the diagram translates into a simplified pseudocode snippet:

while (!state.aborted) {
  const query = buildQuery(state)
  const response = await requestModelAPI(query)
  const parsed = parseModelResponse(response)

  if (!parsed.hasToolUse) {
    return parsed.finalAnswer
  }

  const toolResults = await runTools(
    parsed.toolUses,
    state.toolUseContext,
  )

  state = appendToolResultsToMessages(state, response, toolResults)
  state = maybeAutoCompact(state)
  state = nextTurn(state)
}

There are three key points in this pseudocode.

First, buildQuery(state) is not simply concatenating the user's question. It constructs the model input for the current turn based on the current State, including message history, system prompt, available tools, context summaries, and so on.

Second, the result returned by requestModelAPI(query) is not necessarily the final answer. It could be text, or it could contain a tool invocation request.

Third, in the normal case the loop ends when the model no longer requests tools. As long as the model still needs tools, Claude Code will keep executing them, feeding results back, and advancing to the next turn.

So the while loop isn't a mindless infinite loop.

The real exit conditions are:

The model no longer requests tools
or the task is interrupted
or an engineering limit, error, or permission block is triggered

This is the heartbeat of the agent loop.
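The exit conditions above can be made explicit in a small runnable sketch. Everything here is hypothetical and for illustration only: runLoop, ExitReason, and maxTurns are not names from the Claude Code source, and the model and tool calls are synchronous stand-ins so the control flow is easy to follow.

```typescript
// Hypothetical sketch of an agent loop with explicit exit reasons.
// None of these names come from the Claude Code source.
type ToolUse = { id: string; name: string; input: unknown };
type ModelTurn = { text: string; toolUses: ToolUse[] };
type Message = { role: "user" | "assistant" | "tool"; content: string };
type ExitReason = "completed" | "aborted" | "max_turns";

interface LoopState {
  messages: Message[];
  turnCount: number;
  aborted: boolean;
}

function runLoop(
  state: LoopState,
  callModel: (messages: Message[]) => ModelTurn,
  runTool: (use: ToolUse) => string,
  maxTurns: number,
): { reason: ExitReason; state: LoopState } {
  while (true) {
    // Exit: the task was interrupted.
    if (state.aborted) return { reason: "aborted", state };
    // Exit: an engineering limit was hit.
    if (state.turnCount >= maxTurns) return { reason: "max_turns", state };

    const turn = callModel(state.messages);
    const messages: Message[] = [
      ...state.messages,
      { role: "assistant", content: turn.text },
    ];

    // Exit: the model no longer requests tools.
    if (turn.toolUses.length === 0) {
      return { reason: "completed", state: { ...state, messages } };
    }

    // Otherwise execute the tools, write results back, and advance the turn.
    for (const use of turn.toolUses) {
      messages.push({ role: "tool", content: runTool(use) });
    }
    state = { ...state, messages, turnCount: state.turnCount + 1 };
  }
}
```

The loop body is still written as while (true), but every path out of it is a named return, which makes the exit conditions easy to audit.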

(When reading the source, set breakpoints at the code that corresponds to these three steps in the pseudocode: buildQuery, parseModelResponse, maybeAutoCompact. They map to three core questions: how input is assembled, how output is interpreted, and how state is governed. If those three are clear, the rest of the file becomes much easier to follow.)

7. "Has Tool Calls?" — the critical fork in the whole machine

Refer back to the diamond in the diagram:

Has tool calls?

Simple as it looks, this step defines the semantics of the current turn.

No tool calls means the model considers the available information sufficient and can deliver a final answer:

no -> break -> return result

Has tool calls means the model believes the information is still incomplete and it needs to go gather evidence from the outside world:

yes -> invoke tools -> write back to messages -> another round

Tool calls are not an add-on feature. They are the switch that moves Claude Code from answer mode into action mode.

A conventional chatbot usually stops at the first case: generate text and done.

An agent, on the other hand, must support the second case: the model admits it doesn't yet know and fills in the gaps through tools.

This is also the core of ReAct:

Reason: the model judges the next step based on current context
Act:    the model issues a tool-call intent
Observe: the tool result is written back into messages
Reason: the model continues judging based on the new observation

Round after round, it spins through this cycle — and only then does the system come across as something that "gets things done."

8. Why Must Tool Results Be Appended to Messages?

After a tool executes, the most critical step isn't "getting the result" — it's:

Writing the result back into the message stream.

For example, the model requests reading package.json. The tool does read the file contents, but if that result isn't appended to messages, the model in the next turn has no way to see it.

This creates a bizarre disconnect:

Model: "I need to read package.json"
System reads package.json
Model (next turn): "I still don't know what's in package.json"

Appending tool results to messages is fundamentally about completing the Observation step in ReAct.

It translates a fact from the external world back into context the model can consume.

Another way to think about it:

Tool calls let the model touch the real world.
Appending to messages lets the model remember what it just touched.

Without the former, the model can only guess.

Without the latter, the model suffers amnesia after every action.

Plenty of minimal agent demos appear to invoke tools, yet they fail on longer tasks for exactly this reason: they have Act, but they do not have a reliable Observe -> write-back -> next-round Reason loop.
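A minimal sketch of the write-back step, assuming message shapes loosely modeled on tool_use / tool_result blocks. The type names and appendToolResults are illustrative, not taken from the source. The point it shows: every tool request gets a paired result in the next message, even when the tool produced nothing, so the model never sees a request without an observation.

```typescript
// Hypothetical message shapes, loosely modeled on tool_use / tool_result blocks.
type TextBlock = { type: "text"; text: string };
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: unknown };
type ToolResultBlock = { type: "tool_result"; toolUseId: string; content: string; isError: boolean };
type Message =
  | { role: "assistant"; content: (TextBlock | ToolUseBlock)[] }
  | { role: "user"; content: (TextBlock | ToolResultBlock)[] };
type AssistantMessage = Extract<Message, { role: "assistant" }>;

function appendToolResults(
  messages: Message[],
  assistant: AssistantMessage,
  results: ToolResultBlock[],
): Message[] {
  const requested = assistant.content.filter(
    (block): block is ToolUseBlock => block.type === "tool_use",
  );
  const byId = new Map(results.map((r) => [r.toolUseId, r] as const));

  // Pair each request with its result; a missing result becomes an explicit error.
  const paired = requested.map(
    (use): ToolResultBlock =>
      byId.get(use.id) ?? {
        type: "tool_result",
        toolUseId: use.id,
        content: "Tool produced no result",
        isError: true,
      },
  );

  // The assistant turn and its observations enter history together.
  return [...messages, assistant, { role: "user", content: paired }];
}
```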

9. Why Is Compaction Placed After Tool Write-Back?

The final step in the reference diagram is:

Check if compression is needed

And it sits right after "Append tool results to messages."

This ordering matters a great deal.

Tool results are often the primary source of context bloat:

Read a file     → may return hundreds of lines of code
Run a test      → may return a long log dump
Search code     → may return dozens of hit locations
Call an external service → may return a large structured JSON

If every one of these results gets fed verbatim into the next round of model input, long tasks quickly become expensive, slow, and prone to losing focus.

So Claude Code has to keep asking one question inside the main loop:

Can the current messages still be carried forward as-is?

If not, compression kicks in.

Compression doesn't mean casually discarding content — it means preserving the information that remains useful for downstream tasks:

What is the user's goal?
What has already been tried?
Which files have been read?
Which commands have been run?
Which errors are still unresolved?
What should the next step focus on?

Auto-compression is not a "token-saving trick" — it is the infrastructure that makes long-running Agents possible in the first place.

Without compression, the harder the ReAct loop works, the more the message history spirals out of control.

The compaction strategy is a strong signal of agent-engineering maturity: crude truncation risks dropping critical information, while over-compression can make the model "forget" what it has already done. We'll unpack this in more detail when we get to context management.
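As a rough illustration of the "can the current messages still be carried forward as-is?" check, here is a sketch under stated assumptions: the four-characters-per-token estimate, the keepRecent parameter, and the summarize callback are all invented for this example. In a real system the summary would most likely come from a model call, and the actual thresholds in Claude Code are not shown in this article.

```typescript
type Message = { role: "user" | "assistant" | "tool"; content: string };

// Rough heuristic (an assumption): about 4 characters per token.
const estimateTokens = (messages: Message[]): number =>
  Math.ceil(messages.reduce((total, m) => total + m.content.length, 0) / 4);

function maybeAutoCompact(
  messages: Message[],
  tokenLimit: number,
  keepRecent: number,
  summarize: (older: Message[]) => string,
): Message[] {
  // Still fits, or nothing old enough to fold away: carry forward as-is.
  if (estimateTokens(messages) <= tokenLimit || messages.length <= keepRecent) {
    return messages;
  }
  const older = messages.slice(0, messages.length - keepRecent);
  const recent = messages.slice(messages.length - keepRecent);

  // Replace older history with one summary message; keep recent turns verbatim.
  return [
    { role: "user", content: `Summary of earlier work:\n${summarize(older)}` },
    ...recent,
  ];
}
```

Keeping the most recent messages verbatim is one way to avoid the "forgetting what it just did" failure mode, while the summary carries goals, attempts, and unresolved errors forward.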

10. From a Source-Reading Perspective, How Do You Trace This Main Thread?

Read query.ts, but don't jump straight into the branches.

A better approach is to first get a handle on these 8 questions:

1. Where is the QueryEngine created?
2. How does submitMessage kick off an agent run?
3. Where is State created?
4. What does buildQuery pull from State?
5. After the Model API returns, how does the code detect tool use?
6. Where are tool calls actually executed?
7. How are tool results appended back into messages?
8. When does compaction trigger?

Once these 8 questions connect, the main relationships between query.ts and QueryEngine become clear.

If you want to go deeper, you can pin these questions to a few more concrete source anchors:

QueryEngine.ts
-> Find submitMessage: how user input enters a turn

query.ts
-> Find QueryParams: what inputs a query round needs
-> Find State: what state is preserved between loops
-> Find queryLoop: where messagesForQuery is assembled each round
-> Find tool_use collection: how model output becomes a list of tool calls
-> Find the tool execution entry: how runTools / StreamingToolExecutor is chosen
-> Find tool_result write-back: how tool results merge into the next round's messages

services/tools/StreamingToolExecutor.ts
-> See how streaming tool execution and concurrency safety work together

services/tools/toolOrchestration.ts
-> See how batched tool calls are grouped by isConcurrencySafe
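To make the isConcurrencySafe grouping concrete, here is one plausible partitioning rule, sketched as an assumption rather than a copy of toolOrchestration.ts: consecutive concurrency-safe calls share a batch that could run in parallel, and any unsafe call gets a batch of its own. ToolCall, Batch, and partitionToolCalls are hypothetical names.

```typescript
type ToolCall = { id: string; name: string; isConcurrencySafe: boolean };
type Batch = { concurrent: boolean; calls: ToolCall[] };

// Consecutive concurrency-safe calls share a batch; any unsafe call runs alone.
// Order between batches is preserved, so writes never overtake earlier reads.
function partitionToolCalls(calls: ToolCall[]): Batch[] {
  const batches: Batch[] = [];
  for (const call of calls) {
    const last = batches[batches.length - 1];
    if (call.isConcurrencySafe && last?.concurrent) {
      last.calls.push(call);
    } else {
      batches.push({ concurrent: call.isConcurrencySafe, calls: [call] });
    }
  }
  return batches;
}
```

For example, two reads followed by an edit and another read would yield three batches: a parallel pair, the edit alone, then the final read.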

Behind these anchors lies the same engineering pipeline:

messagesForQuery
-> model stream
-> assistantMessages + toolUseBlocks
-> toolResults
-> next State.messages

The many branches in the source still trace back to this same pipeline. Prompt-too-long recovery, max-output-tokens recovery, stop-hook blocking, compaction, memory prefetch, and skill discovery are all, at bottom, answering the same question: if this round does not complete cleanly, how should the next round's State be constructed?

You'll find that what this file really wants to convey isn't "some particular function is incredibly complex," but rather a very stable engineering pattern:

State
-> Query
-> Model Response
-> Tool Use?
-> Tool Result
-> Updated State
-> Next Query

Once you understand this pipeline, everything else — Tools, Context, Prompt, Memory, Permission — can be mapped back onto it.

Tools are the action layer.

Context organizes the material for each round's Query.

Prompt is the set of rules telling the model how to decide and act.

Permission is the brake that sits before every action.

Compact is capacity governance for long-running tasks.

And query.ts's ReAct state machine is the backbone that threads all of these capabilities together.

11. Redraw the Reference Diagram as a Mermaid Flow

You can compress the entire diagram into the following flow:

The two loops in this diagram are the most important thing to take away.

The first is the ReAct loop:

Reason -> Act -> Observe -> Reason

The second is the engineering state loop:

QueryEngine -> State -> Query -> Response -> Tool Result -> State -> QueryEngine

The former explains why an agent seems to think while doing.

The latter explains why the source code must include QueryEngine, State, messages, toolUseContext, turnCount, autoCompactTracking, permissionDenials, and totalUsage.
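A sketch of what "a continuously evolving State" can look like in code. The field names messages, turnCount, permissionDenials, and totalUsage follow the article; every field type and the nextState helper are assumptions made for illustration, and toolUseContext and autoCompactTracking are left out to keep the example small.

```typescript
// Field names follow the article; every field type here is an assumption.
type Usage = { inputTokens: number; outputTokens: number };

interface LoopState {
  messages: string[];
  turnCount: number;
  permissionDenials: string[];
  totalUsage: Usage;
}

// Each round produces a new State instead of mutating the old one,
// so "what does the next round see?" has exactly one answer.
function nextState(
  prev: LoopState,
  newMessages: string[],
  usage: Usage,
  denied: string[] = [],
): LoopState {
  return {
    messages: [...prev.messages, ...newMessages],
    turnCount: prev.turnCount + 1,
    permissionDenials: [...prev.permissionDenials, ...denied],
    totalUsage: {
      inputTokens: prev.totalUsage.inputTokens + usage.inputTokens,
      outputTokens: prev.totalUsage.outputTokens + usage.outputTokens,
    },
  };
}
```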

12. One-Sentence Summary

The ReAct mechanism in query.ts is, at its core, maintaining a continuously evolving State.

In each cycle, Claude Code builds a Query from the current State, calls the model API, and parses whether the model wants to invoke a tool. If the model no longer needs a tool, it returns the final result. If the model needs a tool, the system executes it, appends the result to messages, checks whether compression is necessary, and then enters the next cycle with the updated State.

Surrounding this loop, QueryEngine holds session-level state and organizes tools, permissions, context, budget, caching, and interrupt control into a complete task runtime.

So Claude Code is not a "model answers once" program. It is a state-driven agent runtime:

The model decides the next step.
Tools touch the real world.
messages bring the real world back to the model.
Compression keeps long tasks running.
State organizes all of this into a sustainable loop.
QueryEngine places this loop inside a session-level runtime.

Once you understand this ReAct loop, Prompts, Tools, Context management, and multi-agent collaboration stop looking like scattered modules.

They all serve the same purpose:

Enabling the model not just to speak, but to get things done step by step in the engineering world.

Claude Code Source Analysis Series, Chapter 1: Architecture

LienJack — Sun, 10 May 2026 05:53:11 +0000

Claude Code Source Analysis Series, Chapter 1: Engineering Architecture

When most people first encounter Claude Code, they mentally file it as "a chat box that can write code."

That's not wrong, but it misses the point. What makes Claude Code truly powerful isn't just that the model can answer coding questions. Wrapped around that model is an entire engineering system: it reads your project, invokes tools, maintains context, manages state, connects to MCP, dispatches sub-agents, and enforces permission and security boundaries.

So rather than diving straight into a specific function in the source, this chapter starts by answering a bigger question:

What kind of engineering architecture is Claude Code, exactly?

Here it is in one sentence:

Claude Code = Model API + QueryEngine main loop + Tools system + Context/State management + Security governance + Agent collaboration.

The model provides the core reasoning capability. What turns it into a "programming agent that gets things done" is the entire runtime layer wrapped around it.

To make this concrete, we'll approach it through three questions:

Functional architecture: What capability layers does it have?
Runtime architecture: How does a user's prompt flow through the system?
Code architecture: How is the source roughly organized by module?

These three questions build on each other: first understand what capabilities Claude Code has, then see how the QueryEngine orchestrates them, and finally map them back to the modules in the source code.

1. Why You Can't Just Hook Up a Model API

Suppose you build the simplest possible AI coding assistant. The flow would look something like:

User asks a question
-> Backend forwards the question to the LLM
-> LLM returns an answer
-> Display the answer to the user

That's barely adequate for "explain this piece of code." But the moment the user says:

Look at this project and figure out why the tests are failing, then fix them.

Things get complicated fast.

The model needs to understand the project structure, know what files exist, know how to run the test command, know where the error logs live, and know which file to change. After making changes, it needs to re-run the tests to verify. If it hits a permission error, a failed command, an overflowing context window, or an oversized tool output along the way, it needs to recover.

Models can think — but they can't touch a real engineering environment on their own.

A model doesn't natively read files. It doesn't natively execute shell commands. It doesn't natively maintain long-running task state. And it doesn't natively know which operations are dangerous. So Claude Code has to wrap a layer around the Model API — an "engineering shell."

That engineering shell is the core value of Claude Code.

This is exactly where many open-source agent projects stall: the model-calling layer looks great, but the engineering shell leaks the moment anything pushes back.

2. Functional Architecture: What Capability Layers Does Claude Code Have?

Viewed through its functional architecture, Claude Code resembles a layered Agent Runtime — an agent execution environment built around the model, responsible for dispatching tools, managing state, and advancing tasks.

At the innermost layer sits the Model API. This is the reasoning core, responsible for understanding tasks, generating responses, and deciding whether the next step requires calling a tool. But it is only the "brain," not the complete system.

Wrapped around the model is the first runtime layer: the QueryEngine — the query engine that turns a single user input into a continuously running agent main loop. Without the QueryEngine, Claude Code would be nothing more than a plain API wrapper. With it, it becomes a runtime capable of driving tasks forward on its own.

The next layer outward is the Tools system. This layer gives the model "hands and feet": file reads and writes, shell commands, search, web access, MCP, LSP, Agent tools, and Skills all belong here.

Beyond that lies Context / Memory / State. This layer answers the question: "What exactly should the model know for this turn?" It dynamically assembles the system prompt, user input, project rules, message history, tool results, file caches, compression summaries, and current application state.

Farther out still is Agent Collaboration. When tasks become complex, Claude Code does more than converse with the model in a single thread — it can decompose subtasks to subordinate Agents or Tasks. The main Agent handles the overall judgment; child Agents handle code search, solution exploration, or hypothesis validation.

At the outermost layer, underpinning every capability, is security governance. Because Claude Code operates in real engineering environments, it can read proprietary code, execute commands, modify files, and invoke external services. Without permissions, policies, sandboxing, prompt injection defenses, and audit logging, a more powerful agent only means greater risk.

The following diagram captures this:

The core message this diagram aims to convey is not "Claude Code has many modules," but rather:

Claude Code's capabilities do not grow directly out of the model. They grow out of the engineering systems layered one by one around the model.

Model API: Responsible for Judgment, Not Execution

Let's clarify the most easily confused point first: the Model API does not directly execute any tools.

What the model actually produces is intent, roughly like this:

I need to read a certain file.
I need to search for a certain keyword.
I need to run a test command.
I need to modify a certain piece of code.

But the actions — "read the file," "execute the command," "modify the code" — are all carried out by the Claude Code host program.

The division of labor is clear:

The model handles understanding, planning, and choosing.
The program handles execution, constraints, and recording.

If you imagine every capability as the model's own magic, you'll miss where Claude Code's real value lies. The parts actually worth studying are precisely those that are "not intelligent but deeply engineered": tool contracts, permission systems, state management, context compression, error recovery, UI rendering, and session recording.

QueryEngine: The Heartbeat of the Entire System

The QueryEngine is Claude Code's main loop.

Its role is not simply "sending requests to the model." It manages an entire session lifecycle. A session contains multiple rounds of user input, multiple rounds of model responses, multiple tool calls, and multiple state changes. The QueryEngine must string all of these together.

The state it maintains includes at minimum:

Current message history
Current working directory
Currently available tool set
Current model and budget
File read cache
Permission denial records
Skill discovery records
Token usage count
Session transcript

Together, this state determines what Claude Code should do next.

(The implementation details of the QueryEngine will be covered in the next article, but it is fundamentally a state machine: each turn decides what to do based on the current state, executes it, then updates the state.)

Tools System: The Model's Hands and Feet — But Always Under Control

Claude Code's tool system can be understood as a unified capability marketplace.

It includes basic tools:

Read / Write / Edit / Grep / Glob / Bash

As well as extended capabilities:

MCP / LSP / Web / Agent / Skill

The most important thing about the tool system is not "many tools," but that they are all placed inside a unified tool contract. Every tool must answer a set of questions:

What is this tool called?
What are its input parameters?
How are inputs validated?
How is it executed?
How is output converted back into a message?
Is it read-only?
Is it destructive?
Is concurrent execution allowed?
Does it require user confirmation?

This is what makes Claude Code more engineered than "letting the model write shell commands on its own."

Take viewing a file, for example. Letting the model directly run:

cat src/main.ts

would certainly work, but the system would have no way of knowing the real semantics of that action. It's just a shell string.

By going through the Read tool instead, Claude Code can know:

This is a read operation.
What is the target path?
Is access authorized?
Is the output too large?
Should it be truncated?
Should it enter the file cache?
Will a subsequent Edit be based on the latest version?

This is the value of tool abstraction:

Tools exist not to let the model "do more," but to make the model's actions understandable, constrainable, and auditable.
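The contrast between a raw shell string and a declared tool can be sketched as follows. This is a deliberately reduced, hypothetical contract: the real Tool.ts declares far more, and Tool, makeReadTool, and MAX_OUTPUT_CHARS are names invented here. Files live in an in-memory Map so the example stays self-contained.

```typescript
// Hypothetical, reduced tool contract; not the actual Tool.ts interface.
interface Tool<I> {
  name: string;
  isReadOnly: boolean;
  isConcurrencySafe: boolean;
  validate(input: unknown): I; // throws on invalid input
  call(input: I): string;
}

const MAX_OUTPUT_CHARS = 200; // illustrative limit, not a real setting

function makeReadTool(files: Map<string, string>): Tool<{ path: string }> {
  return {
    name: "Read",
    isReadOnly: true,
    isConcurrencySafe: true,
    validate(input) {
      if (typeof input !== "object" || input === null) {
        throw new Error("input must be an object");
      }
      const path = (input as { path?: unknown }).path;
      if (typeof path !== "string" || path.length === 0) {
        throw new Error("path must be a non-empty string");
      }
      return { path };
    },
    call({ path }) {
      const text = files.get(path);
      if (text === undefined) throw new Error(`no such file: ${path}`);
      // Oversized output is truncated before it can reach the context.
      return text.length > MAX_OUTPUT_CHARS
        ? `${text.slice(0, MAX_OUTPUT_CHARS)}\n[truncated ${text.length - MAX_OUTPUT_CHARS} chars]`
        : text;
    },
  };
}
```

Because the tool declares that it is read-only and knows its target path, the runtime can answer the questions listed above without parsing a shell string.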

Context / Memory / State: Giving the Model What It Needs to Know

Another easily underestimated capability of Claude Code is context engineering.

Many people hear "context" and assume it means "writing longer prompts." But in Claude Code, context is not a static block of text — it is a runtime input dynamically assembled anew for every turn.

It may include:

The base system prompt
The current user input
Conversation history
Project-level rules
User-level rules
The current working directory
Available tool descriptions
External capabilities exposed via MCP / LSP
Skill instructions
File read results
The result of the last tool execution
A compressed summary of history
The current AppState

Two problems here are genuinely hard.

The first is "what to give." Give too little, and the model lacks context; give too much, and the context explodes, sending cost and latency out of control.

The second is "when to give it." Some information should enter the system prompt right from the start; some should be fetched via tools only when the model actually needs it; some tool schemas can also be discovered lazily rather than crammed in all at once.

Put bluntly:

Context Engineering is not prompt writing — it is context scheduling.
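One way to picture "context scheduling" is as selection under a budget. The sketch below is an assumption-heavy illustration: ContextPart, its priority and required fields, and the character budget are invented here and do not describe how Claude Code actually ranks context.

```typescript
type ContextPart = { label: string; text: string; priority: number; required: boolean };

// Pick what goes into this turn's input: required parts always,
// then optional parts by priority while the budget allows.
function assembleContext(parts: ContextPart[], charBudget: number): string[] {
  const ordered = [...parts].sort(
    (a, b) => Number(b.required) - Number(a.required) || b.priority - a.priority,
  );
  const chosen: ContextPart[] = [];
  let used = 0;
  for (const part of ordered) {
    if (part.required || used + part.text.length <= charBudget) {
      chosen.push(part);
      used += part.text.length;
    }
  }
  return chosen.map((part) => part.label);
}
```

Parts that do not fit are simply left out of this turn; in a real runtime they might instead be summarized or fetched later through a tool.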

Memory / Compression addresses the long-task problem. Real engineering tasks routinely cycle through searching code, reading files, running tests, analyzing errors, modifying code, and running tests again. Every step produces messages and tool results. If you stuff all of them back into the model verbatim, the context quickly becomes long and noisy.

The value of the compression mechanism is not simply saving tokens — it is keeping the Agent on track through long-running tasks.

AppStateStore, meanwhile, unifies CLI UI state, session state, tool state, and Agent state. Is the current session in Plan Mode? What MCP tools are currently available? What is the current permission mode? Is there a background task running right now? None of these can be resolved by model messages alone — they require an application state system.

MCP / LSP / Skills: The Capability Integration Layer

Claude Code cannot bake every capability directly into the main program, so it needs an extension mechanism.

MCP (Model Context Protocol, a protocol that lets external tools and resources be called by the Agent in a standardized way) acts as an external tool protocol. It lets Claude Code discover and invoke tools and resources provided by external services. Databases, browsers, design tools, and internal systems can all become Agent-callable capabilities through MCP.

LSP (Language Server Protocol — provides code-semantic capabilities like symbols, definitions, and references) leans more toward code intelligence. It helps Claude Code better understand the programming language itself.

Skills are closer to reusable task-method bundles. They are typically not a single API but a set of instructions, scripts, templates, and trigger rules that tell the Agent how to handle a certain class of task.

These three solve different problems:

MCP: How to standardize integration of external capabilities.
LSP: How to integrate code-semantic capabilities.
Skills: How to load reusable working methods.

Together they form Claude Code's extension layer.

(A practical pitfall during integration: if an MCP tool's schema is too large, it will directly blow up the context. Claude Code's approach is lazy discovery — not stuffing everything in all at once.)
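The article only says that discovery is lazy; the mechanism below is an assumption made to illustrate the idea. LazyToolRegistry and its methods are hypothetical: tool names are cheap to list up front, while a full schema is loaded and cached only when a specific tool is actually needed.

```typescript
type ToolSchema = { name: string; description: string; inputSchema: object };

// Hypothetical registry: names are listed eagerly, schemas are loaded on demand.
class LazyToolRegistry {
  private readonly loaded = new Map<string, ToolSchema>();
  private readonly names: string[];
  private readonly load: (name: string) => ToolSchema;

  constructor(names: string[], load: (name: string) => ToolSchema) {
    this.names = names;
    this.load = load;
  }

  // Cheap listing: names only, small enough to include in every prompt.
  list(): string[] {
    return [...this.names];
  }

  // The full schema is fetched once, the first time the tool is requested.
  describe(name: string): ToolSchema {
    if (!this.names.includes(name)) throw new Error(`unknown tool: ${name}`);
    let schema = this.loaded.get(name);
    if (!schema) {
      schema = this.load(name);
      this.loaded.set(name, schema);
    }
    return schema;
  }
}
```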

Security Governance: The More Capable the Agent, the More It Needs Boundaries

The security layer is not decoration — it is the prerequisite for Claude Code to exist as an engineering tool at all.

The security layer must address roughly four categories of problems:

Permissions & Policy: Which tools are available, which paths are accessible, which commands require confirmation.
Sandboxing: Confining dangerous actions to a controlled environment.
Prompt Injection Prevention: Preventing project files or external content from inducing the model to act without authorization.
Audit Logs: Recording what the model did, what tools executed, and what the user approved.

The most critical design principle here is:

The model can suggest actions, but it cannot bypass system boundaries to act directly.

When the model outputs a tool_use (the behavior where the model requests to invoke a tool via a specific format), it is only initiating a request. Before actual execution, it must still pass through the tool system and the permission system.

This is also the watershed between Claude Code and many toy Agents: toy Agents pursue "getting it to run"; production-grade Agents must pursue "getting it to run within constraints."
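The "suggest, but not bypass" principle can be sketched as a gate that every tool request must pass before execution. The policy shape, the three-way decision, and the rule that read-only requests are auto-allowed are all assumptions for illustration; the path check is a naive prefix match and does not normalize segments such as "..".

```typescript
type Decision = "allow" | "ask" | "deny";
type ToolRequest = { tool: string; isReadOnly: boolean; path?: string };
type Policy = { deniedTools: string[]; allowedRoots: string[] };

// Hypothetical gate: the model's request is only a proposal until this returns.
function checkPermission(req: ToolRequest, policy: Policy): Decision {
  if (policy.deniedTools.includes(req.tool)) return "deny";

  const path = req.path;
  if (path !== undefined) {
    // Naive prefix check for illustration; real code must normalize paths first.
    const inside = policy.allowedRoots.some(
      (root) => path === root || path.startsWith(`${root}/`),
    );
    if (!inside) return "deny";
  }

  // Read-only actions proceed; anything that changes state needs the user.
  return req.isReadOnly ? "allow" : "ask";
}
```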

3. Runtime Architecture: How Does a Single User Sentence Flow Through the System?

After understanding the functional layer, let's look at the runtime architecture.

From the user's perspective, the process boils down to one sentence:

User: Help me fix this bug

But inside Claude Code, that sentence isn't sent directly to the model. It first enters a runtime orchestrated by the QueryEngine.

A simplified runtime flow:

User input
-> Claude Code session
-> QueryEngine.submitMessage()
-> Process user input and slash commands
-> Build context and system prompt
-> Call Model API
-> Model returns text or tool_use
-> Tool system checks permissions and executes tools
-> Tool results written back to message history
-> QueryEngine continues to the next round
-> Until the task is complete or user decision is needed

Drawn as a sequence diagram:

There are two feedback loops in this diagram.

The first loop is the cycle between the model and tools:

Model determines the next step
-> Tool executes a real action
-> Tool result goes back to the model
-> Model continues reasoning

This is what enables Claude Code to push tasks forward continuously.

The second loop is the cycle between context and state:

Each round changes the message history, tool results, permission state, task state
-> The next round, QueryEngine reassembles context based on those changes

This is why Claude Code isn't just "one question, one answer." It behaves more like a continuously running state machine.

Slash Commands Don't Always Hit the Model

There's another detail in the runtime architecture: not all user input triggers a Model API call.

Some inputs are local commands — configuration, cleanup, compression, status viewing. If these were forced through the model, they'd waste tokens and introduce instability.

So when QueryEngine processes user input, it first determines:

Is this a task that requires model reasoning?
Or is this a command that can be executed locally?

If it's a local command, the system returns the result directly and ends the round early.

(This is a pragmatic design choice. If typing /clear to reset the conversation still required a round trip to the model, the experience would be terrible.)
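The local-versus-model decision can be sketched as a small router. routeInput, Routed, and the command table are hypothetical; in particular, letting unrecognized slash input fall through to the model is a choice made for this example, not a documented behavior.

```typescript
type Routed =
  | { kind: "local"; output: string }
  | { kind: "model"; prompt: string };

type LocalCommand = (args: string) => string;

// Hypothetical router: known slash commands run locally and end the round early.
function routeInput(input: string, commands: Map<string, LocalCommand>): Routed {
  const trimmed = input.trim();
  if (trimmed.startsWith("/")) {
    const [name = "", ...rest] = trimmed.slice(1).split(/\s+/);
    const handler = commands.get(name);
    if (handler) {
      return { kind: "local", output: handler(rest.join(" ")) };
    }
  }
  // Everything else needs model reasoning.
  return { kind: "model", prompt: trimmed };
}
```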

Plan Mode: Slowing Down Execution First

The runtime architecture also includes an important mode: Plan Mode.

For a typical chat product, the model can just respond directly. But for a coding agent, "acting immediately" carries risk — it might modify files, run commands, and affect the project state.

The point of Plan Mode is to split a task into two phases:

First: understand and plan
Then: execute and modify

What this reflects is Claude Code's design philosophy around control:

Not every task should be executed immediately.
Not every tool should be open by default.
The user should be able to see the plan at key decision points and decide whether to proceed.

A mature agent system doesn't blindly pursue "maximum automation." The real challenge is finding the balance between automation and controllability.
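A minimal sketch of the two-phase idea, assuming that the planning phase exposes only read-only tools and that an explicit user approval switches to execution. Both assumptions, and the names Mode, visibleTools, and approvePlan, are illustrative rather than taken from the source.

```typescript
type Mode = "plan" | "execute";
type ToolInfo = { name: string; isReadOnly: boolean };

// Assumption: while planning, the model only sees tools that cannot change anything.
function visibleTools(tools: ToolInfo[], mode: Mode): ToolInfo[] {
  return mode === "plan" ? tools.filter((tool) => tool.isReadOnly) : tools;
}

// The switch to execution is a user decision, not something the model can grant itself.
function approvePlan(mode: Mode, userApproved: boolean): Mode {
  return mode === "plan" && userApproved ? "execute" : mode;
}
```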

4. Code Architecture: How Is the Source Code Organized?

Finally, let's turn to the code architecture.

If functional architecture answers "what capabilities does Claude Code have," and runtime architecture answers "how do those capabilities run together," then code architecture answers:

When you open the source code, what mental map should you build first?

Here's a trick for reading the source: don't scan the directory tree horizontally. Claude Code has many source directories — components, services, tools, hooks, utils — and it's easy to get lost. A more reliable approach is to first identify the load-bearing chain:

Entry point hands user input to a session
→ QueryEngine manages a conversation
→ query.ts drives rounds of ReAct
→ The Tool protocol turns model intent into executable requests
→ Context / Prompt determine what the model sees each round
→ Permission / Hooks / State determine whether an action can land

In other words, this section isn't listing directories — it's locating the source-code coordinates for the articles that follow.

Start by building an overall mental picture with this diagram:

This code architecture diagram can be read in layers.

Entry Layer: cli.tsx and main.tsx

At the top are cli.tsx and main.tsx.

cli.tsx is the actual binary entry point. It handles fast-path options like --version that don't require loading the full application. The goal is to make the CLI tool start as quickly as possible.

main.tsx enters the full startup flow, responsible for command-line arguments, configuration, environment variables, preloading, and mode dispatch. It routes the program into different runtime paths:

Interactive REPL
Headless / SDK
MCP service
Other command paths

Claude Code isn't just the "terminal chat" form factor. The REPL, SDK, and MCP service can all share the same underlying capabilities.

Interaction Layer: REPL and Terminal UI

Claude Code is a CLI product. The terminal UI is not an afterthought.

It has to handle:

User input
Streaming output
Tool execution progress
Permission confirmations
Error prompts
Task status display
Plan Mode interaction

This is also why the source code contains a substantial amount of UI rendering and state subscription logic. The agent doesn't just run in the background — the user needs to understand what it's doing, right in the terminal.

Orchestration Layer: QueryEngine and the query Main Loop

QueryEngine is the session-level orchestration layer.

It connects upward to the REPL / SDK and downward to the query main loop, context system, state system, tool system, and command system.

The query main loop is more focused on model invocation and the ReAct Loop — an action-loop pattern where the model alternates between reasoning and executing actions. It's responsible for sending messages to the model, receiving model responses, identifying tool_use, and placing tool execution results back into the message stream.

A simple distinction:

QueryEngine: manages an entire session.
query main loop: manages one or more model-tool cycles.

Capability Layer: Tools / Commands / Services

The capability layer breaks down into three categories.

The first is Commands. It handles slash commands and local commands. Some user inputs don't require model reasoning — executing them locally is more reliable.

The second is Tools. It handles the tool capabilities the model can invoke: reading files, editing files, running shell commands, searching code, calling sub-agents.

The third is Services. It carries external integrations and extension capabilities: MCP, LSP, plugins, remote sessions.

These three categories together form Claude Code's execution layer.

Context Layer: context, memory, compression

The context layer answers one question:

What exactly should be sent to the model this round?

It's not just concatenating strings. It synthesizes the current task, conversation history, project rules, user rules, tool descriptions, MCP capabilities, skill descriptions, file read results, and compressed summaries.

This is also why Claude Code seems to "understand your project": the model doesn't innately understand it — the context layer continuously assembles project-relevant information and feeds it to the model.

State Layer: AppStateStore

AppStateStore is responsible for global state.

It manages more than just UI state; it also covers:

Current model configuration
Tool permission context
MCP clients and tools
Plugin state
Sub-agent / Task state
Remote session state
User settings

Without the state layer, it would be difficult for Claude Code to unify the "terminal application" and the "agent runtime."

Security Layer: permissions, sandbox, audit

The security layer doesn't map to a single file in the code. Instead, it runs through tool execution, permission decisions, command classification, MCP calls, user confirmations, and session recording.

Its essence is turning the model's free-form intent into governed execution requests.

The model says "here's what I want to do"
The system decides "are you allowed to do it"
The tool enforces "do it according to the rules"
The log records "here's what you did"

This is the difference between a production-grade agent and a demo agent.

Load-Bearing Files in the Source: Read a Few Beams First

When it comes to reading source code, don't start by surveying every directory. Start with a handful of load-bearing files.

QueryEngine.ts is the session layer. Its job is not to do everything directly, but to hold everything one conversation needs to persist across turns: message history, permission denial records, file read caches, model configuration, tool sets, the MCP client, Agent definitions, and the AppState read/write entry point. Each call to submitMessage() is just a new turn within the same conversation.

query.ts is the loop layer. It maintains a per-iteration State, carrying messages, toolUseContext, autoCompactTracking, turnCount, pendingToolUseSummary, and other state into the next round. As the model streams its response, query.ts collects any tool_use blocks in the assistant message. If there are no tool calls, it wraps up. If there are, it executes them, appends the results back to the message list, and continues the loop.

Tool.ts is the action protocol layer. A tool is not a bare function but a protocol whose properties are declared up front: its input schema, its invocation mode, whether it is read-only, concurrency-safe, or destructive, whether it requires permission, how large its result may be, how it renders in the UI, how errors are fed back into the conversation, and more. The model doesn't output "I'll just do whatever." It outputs "I want to initiate a request under this tool protocol."
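As an illustration of "protocol, not function", here is a trimmed-down contract and one tool that satisfies it. The interface `ToolSketch`, its member names, and the in-memory `files` table are hypothetical. The real type in Tool.ts declares more than this, and `parse` merely stands in for schema validation.

```typescript
// Hypothetical, trimmed-down contract for illustration only.
interface ToolSketch<Input> {
  name: string;
  parse(raw: unknown): Input; // stands in for input schema validation; throws on bad input
  isReadOnly(input: Input): boolean;
  isConcurrencySafe(input: Input): boolean;
  isDestructive(input: Input): boolean;
  maxResultChars: number;
  call(input: Input): string;
}

// In-memory "file system" so the example has no side effects.
const files: Record<string, string> = { "README.md": "# demo" };

const readFileTool: ToolSketch<{ path: string }> = {
  name: "ReadFileSketch",
  parse(raw: unknown) {
    if (typeof raw === "object" && raw !== null) {
      const path = (raw as { path?: unknown }).path;
      if (typeof path === "string") return { path };
    }
    throw new Error("input must be { path: string }");
  },
  // Declared up front, before any call happens.
  isReadOnly: () => true,
  isConcurrencySafe: () => true,
  isDestructive: () => false,
  maxResultChars: 1000,
  call({ path }) {
    const text: string | undefined = files[path];
    return text === undefined ? `no such file: ${path}` : text.slice(0, 1000);
  },
};

const parsed = readFileTool.parse({ path: "README.md" });
const output = readFileTool.call(parsed);
```

Because the flags are part of the contract, a scheduler or permission layer can reason about a call without running it.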

tools.ts and services/tools/toolExecution.ts are the tool menu and execution lifecycle. The former determines which tools the model can see in the current turn; the latter governs how a single tool call goes through schema validation, tool-level input validation, permission checks, hooks, actual execution, and result serialization.
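The lifecycle stages listed above can be lined up as a single function. This is a sketch under stated assumptions rather than the contents of toolExecution.ts: `LifecycleTool`, `Hooks`, `Outcome`, and `executeToolCall` are hypothetical names, hooks are modelled as simple pre and post callbacks, and serialization is reduced to `JSON.stringify`.

```typescript
interface LifecycleTool {
  name: string;
  checkSchema(raw: unknown): raw is { command: string };
  validateInput(input: { command: string }): string | null; // null means valid
  call(input: { command: string }): string;
}

interface Hooks {
  pre: Array<(tool: string) => void>;
  post: Array<(tool: string) => void>;
}

type Outcome =
  | { status: "ok"; content: string }
  | { status: "error"; stage: "schema" | "validation" | "permission"; message: string };

function executeToolCall(
  tool: LifecycleTool,
  raw: unknown,
  canUse: (tool: string, input: { command: string }) => boolean,
  hooks: Hooks,
): Outcome {
  // 1. Schema validation: is the input shaped correctly at all?
  if (!tool.checkSchema(raw)) return { status: "error", stage: "schema", message: "malformed input" };
  // 2. Tool-level validation: is this well-formed input acceptable to the tool?
  const problem = tool.validateInput(raw);
  if (problem !== null) return { status: "error", stage: "validation", message: problem };
  // 3. Permission check.
  if (!canUse(tool.name, raw)) return { status: "error", stage: "permission", message: "denied" };
  // 4. Hooks around 5. the actual execution.
  hooks.pre.forEach((h) => h(tool.name));
  const result = tool.call(raw);
  hooks.post.forEach((h) => h(tool.name));
  // 6. Result serialization.
  return { status: "ok", content: JSON.stringify({ tool: tool.name, result }) };
}

const echoTool: LifecycleTool = {
  name: "EchoSketch",
  checkSchema(raw: unknown): raw is { command: string } {
    return typeof raw === "object" && raw !== null && typeof (raw as { command?: unknown }).command === "string";
  },
  validateInput: (input) => (input.command.trim() === "" ? "command is empty" : null),
  call: (input) => `echo: ${input.command}`,
};

const trace: string[] = [];
const hooks: Hooks = { pre: [(t) => trace.push(`pre:${t}`)], post: [(t) => trace.push(`post:${t}`)] };
const allowAll = () => true;

const ok = executeToolCall(echoTool, { command: "hi" }, allowAll, hooks);
const bad = executeToolCall(echoTool, { cmd: "hi" }, allowAll, hooks);
```

Each early return names the stage that rejected the call, so the model can be told why a request failed instead of receiving a generic error.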

context.ts, constants/prompts.ts, and services/compact are the model's workbench. They determine how system rules, project memory, Git state, tool descriptions, message history, tool result budgets, and compaction summaries are assembled into each model request.
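One way to picture this "workbench" is a function that rebuilds the request from its parts on every turn. The sketch below is an assumption-heavy simplification: `RequestParts`, `assembleRequest`, the character-based `toolResultBudget`, and the truncation marker are all hypothetical, and the real code budgets in tokens and layers several compaction strategies on top.

```typescript
// Hypothetical inputs to one model request.
interface RequestParts {
  systemRules: string;
  projectMemory: string; // for example, the contents of CLAUDE.md
  gitStatus: string;
  toolDescriptions: string[];
  compactionSummary: string | null; // present only after history was compacted
  recentHistory: string[];
  toolResults: string[];
}

// Cap a single tool result and say how much was dropped.
function truncate(text: string, budget: number): string {
  return text.length <= budget
    ? text
    : `${text.slice(0, budget)}[truncated ${text.length - budget} chars]`;
}

function assembleRequest(
  parts: RequestParts,
  toolResultBudget: number,
): { system: string; messages: string[] } {
  // Stable material goes into the system section.
  const system = [
    parts.systemRules,
    parts.projectMemory,
    `git: ${parts.gitStatus}`,
    `tools: ${parts.toolDescriptions.join(", ")}`,
  ].join("\n\n");
  // A compaction summary, if any, replaces the older history it stands for.
  const summary = parts.compactionSummary === null ? [] : [`summary: ${parts.compactionSummary}`];
  const results = parts.toolResults.map((r) => truncate(r, toolResultBudget));
  return { system, messages: [...summary, ...parts.recentHistory, ...results] };
}

const request = assembleRequest(
  {
    systemRules: "Follow the project conventions.",
    projectMemory: "Run tests with npm test.",
    gitStatus: "clean",
    toolDescriptions: ["Read", "Edit", "Bash"],
    compactionSummary: "Earlier: located the failing test in src/math.test.ts.",
    recentHistory: ["user: Fix the failing tests."],
    toolResults: ["FAIL src/math.test.ts: expected 4, received 5"],
  },
  20,
);
```

The request is derived from the parts each turn instead of being appended to forever, which is what makes budgets and compaction possible at all.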

So source reading can be compressed into one line:

QueryEngine.ts manages sessions, query.ts manages the loop, Tool.ts defines action boundaries, and the context and prompt modules assemble the model's workbench.

Once this backbone is clear in your mind, MCP, Skills, Agents, and Plans won't feel like scattered feature islands. They're all extensions that plug into this main highway.

5. The Three-Layer Architecture as a Whole

Now let's bring everything together into a unified understanding.

The functional architecture tells us that Claude Code is not a model—it is a capability system built around a model:

Model API
-> QueryEngine
-> Tools / Context / Memory / State
-> MCP / LSP / Skills / Agent Collaboration
-> Security & Governance

The runtime architecture tells us that a user's prompt does not go directly to the model—it enters a continuously running Agent Runtime:

User Input
-> QueryEngine assembles context
-> Model API makes a decision
-> Tools execute
-> Results flow back
-> QueryEngine proceeds to the next round

The code architecture tells us how to find entry points when reading the source, guided by these layers:

Entry layer:         cli.tsx / main.tsx
Interaction layer:   REPL / Terminal UI
Orchestration layer: QueryEngine / query
Capability layer:    Tools / Commands / Services
Context layer:       context / memory / compression
State layer:         AppStateStore
Security layer:      permissions / sandbox / audit

So the essence of Claude Code is not "a model plus a handful of tools." It is:

An extensible, governable, continuously running Agent Harness built around a model.

"Harness" is an apt metaphor here: the model supplies intelligence; the Harness supplies the operating environment. Without the model, the system has no reasoning ability; without the Harness, the model has no stable ability to get things done.

6. The Main Thread in a Nutshell

If all you want is the backbone of Claude Code's engineering architecture, hold on to these few lines:

Claude Code is not a chat box — it is a CLI-based Agent Runtime.
The Model API is the reasoning core, but it does not directly execute real-world actions.
QueryEngine, together with query.ts, drives the main loop that strings together user input, model responses, tool calls, and state transitions.
The Tools system is the execution layer, but every tool must pass through contracts, permissions, and result serialization.
Context Engineering is the dynamic assembly of context — it is not simply writing a long prompt.
The AppStateStore enables the CLI UI, session state, tool state, and agent state to work in concert.
MCP, LSP, and Skills form the extension layer, so Claude Code does not have to hard-code every capability internally.
The security layer determines whether an agent can graduate from demo to real-world engineering environments.

The next piece dives deeper into how the QueryEngine implements this main conversation loop, and how it threads model calls, tool execution, and context compression together into a resumable state machine.




