Long-running agent sessions eventually hit the same problem: the model keeps accumulating chat history, tool outputs, intermediate decisions, and execution traces until the prompt becomes expensive or unstable. AIClaw has a built-in answer for that problem. It does not simply drop old messages. It compresses the middle of the conversation into a structured summary and keeps the parts that still matter for the next step.
This is not a new release post. It is a deeper look at one existing AIClaw runtime feature: context compression.
The Problem
AIClaw is designed for tool-using work, not short chatbot replies. A single task can include:
- multiple rounds of shell or browser tool calls
- long tool outputs
- plan-state progress updates
- follow-up fixes after the first attempt
- sub-agent results flowing back into the parent run
That is useful context, but it also means the prompt grows fast. If the runtime sends everything back to the model forever, cost increases and the model starts paying attention to the wrong parts of the history.
The README describes this capability briefly as:
Runtime compression: Long middle context can be summarized during execution.
The implementation behind that line is more specific than it sounds.
When AIClaw Decides To Compress
The decision lives in internal/agent/context_compressor.go and is wired into the main execution loop in internal/agent/run.go.
Before each LLM round, AIClaw checks whether the current prompt is too large relative to the model context window.
The current defaults are straightforward:
- compress when prompt usage reaches 50% of the model context window
- keep the system message at the head
- keep at least the latest 20 messages at the tail
- require at least 5 middle messages before compression is worth doing
If the model provider reports real prompt-token usage, AIClaw uses that. Otherwise it falls back to an internal estimate. That matters because the trigger is based on actual prompt pressure, not just message count.
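A minimal sketch of that trigger check, using the defaults listed above. The function and constant names are hypothetical, and the four-characters-per-token fallback is only an illustrative stand-in for whatever internal estimate AIClaw actually uses.

```go
package main

import "fmt"

// Hypothetical names; only the numeric defaults come from the article.
const (
	compressRatio   = 0.50 // compress at 50% of the context window
	minTailMessages = 20   // always keep at least the latest 20 messages
	minMiddle       = 5    // need at least 5 middle messages to bother
)

// promptTokens prefers provider-reported usage (reported > 0) and otherwise
// falls back to a rough estimate. The chars/4 heuristic is an assumption.
func promptTokens(reported int, messages []string) int {
	if reported > 0 {
		return reported
	}
	chars := 0
	for _, m := range messages {
		chars += len(m)
	}
	return chars / 4
}

// shouldCompress assumes messages[0] is the system message (the head).
func shouldCompress(reported, contextWindow int, messages []string) bool {
	if contextWindow <= 0 {
		return false
	}
	// Everything between the head and the protected tail is the middle.
	middle := len(messages) - 1 - minTailMessages
	if middle < minMiddle {
		return false
	}
	used := promptTokens(reported, messages)
	return float64(used) >= compressRatio*float64(contextWindow)
}

func main() {
	msgs := make([]string, 30)
	for i := range msgs {
		msgs[i] = "message body"
	}
	fmt.Println(shouldCompress(70000, 128000, msgs)) // true: 70000 >= 64000
	fmt.Println(shouldCompress(10000, 128000, msgs)) // false: well under 50%
}
```

Note that a short conversation never triggers compression in this sketch, no matter how large the prompt is, because there is no middle region worth summarizing.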
What Gets Compressed, And What Stays Intact
AIClaw uses a four-phase flow.
1. Prune old tool output first
Before asking the model to summarize, AIClaw trims older tool messages outside the protected tail window. Tool outputs in that middle region are truncated to 200 runes (Go's unit for a Unicode code point, so multi-byte characters are not cut in half). That keeps huge logs from dominating the summary prompt.
This is an important design choice. The runtime does not try to summarize raw noise at full size first. It reduces obviously low-value bulk before paying for the summarization call.
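The pruning step could look roughly like this. The `Message` type, the role strings, and the function names are assumptions for illustration; only the 200-rune limit comes from the description above.

```go
package main

import (
	"fmt"
	"strings"
)

// Message is a simplified, hypothetical stand-in for AIClaw's message type.
type Message struct {
	Role    string // "system", "user", "assistant", or "tool"
	Content string
}

const maxToolRunes = 200

// truncateRunes cuts s to at most n runes without splitting a character.
func truncateRunes(s string, n int) string {
	r := []rune(s)
	if len(r) <= n {
		return s
	}
	return string(r[:n])
}

// pruneToolOutputs returns a copy of msgs in which tool messages before
// tailStart are truncated. Messages at tailStart and later are untouched.
func pruneToolOutputs(msgs []Message, tailStart int) []Message {
	out := make([]Message, len(msgs))
	copy(out, msgs)
	for i := 0; i < tailStart && i < len(out); i++ {
		if out[i].Role == "tool" {
			out[i].Content = truncateRunes(out[i].Content, maxToolRunes)
		}
	}
	return out
}

func main() {
	msgs := []Message{
		{Role: "system", Content: "You are an agent."},
		{Role: "tool", Content: strings.Repeat("log line; ", 100)},
		{Role: "user", Content: "Try again."},
		{Role: "tool", Content: strings.Repeat("fresh output; ", 100)},
	}
	pruned := pruneToolOutputs(msgs, 3) // index 3 onward is the protected tail
	fmt.Println(len([]rune(pruned[1].Content))) // 200
	fmt.Println(len([]rune(pruned[3].Content))) // 1400, tail left untouched
}
```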
2. Protect the head and the tail
The compressor preserves:
- the head of the conversation, especially the system prompt
- the latest tail of the conversation, where the current working state lives
The part in the middle becomes the candidate for compression.
3. Ask an LLM for a structured summary
Instead of generating a vague paragraph, AIClaw asks for a strict template with sections like:
- Goal
- Constraints And Preferences
- Progress
- Key Decisions
- Relevant Files
- Next Steps
- Critical Context
This is a practical choice for agent continuity. The summary is meant to preserve execution state, not produce pretty prose.
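As an illustration, a template-driven summary prompt might be assembled like this. The section names are the ones listed above; the prompt wording, the `##` markers, and the helper names are assumptions, not AIClaw's actual prompt.

```go
package main

import (
	"fmt"
	"strings"
)

// Section names as listed in the article; the real template may contain more.
var summarySections = []string{
	"Goal",
	"Constraints And Preferences",
	"Progress",
	"Key Decisions",
	"Relevant Files",
	"Next Steps",
	"Critical Context",
}

// buildSummaryPrompt asks for a fixed set of sections instead of free prose.
// The instruction text here is hypothetical.
func buildSummaryPrompt(transcript string) string {
	var b strings.Builder
	b.WriteString("Summarize the conversation below. ")
	b.WriteString("Use exactly these sections, in this order:\n\n")
	for _, s := range summarySections {
		b.WriteString("## " + s + "\n")
	}
	b.WriteString("\nConversation:\n")
	b.WriteString(transcript)
	return b.String()
}

// missingSections lists template sections absent from a summary, which is
// one cheap way to sanity-check the model's reply before using it.
func missingSections(summary string) []string {
	var missing []string
	for _, s := range summarySections {
		if !strings.Contains(summary, "## "+s) {
			missing = append(missing, s)
		}
	}
	return missing
}

func main() {
	fmt.Println(buildSummaryPrompt("user: fix the failing test\nassistant: ..."))
	fmt.Println(missingSections("## Goal\nFix the failing test\n"))
}
```

A fixed template also makes a weak summary easy to spot: a reply with missing sections can be rejected or retried rather than silently replacing the history.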
4. Rebuild the conversation with a summary message
After summarization, AIClaw inserts a [Context Compression Summary] message and appends a note to the system prompt that earlier conversation has been compressed.
The result is smaller than the original history, but still carries forward the task objective, decisions, blockers, touched files, and next action.
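A sketch of the rebuild step, under stated assumptions: the `Message` type, the role given to the summary message, and the exact note wording are hypothetical. The `[Context Compression Summary]` marker and the idea of not appending the note twice come from the article and its description of the tests.

```go
package main

import (
	"fmt"
	"strings"
)

// Message is a simplified, hypothetical stand-in for AIClaw's message type.
type Message struct {
	Role    string
	Content string
}

const (
	summaryMarker = "[Context Compression Summary]"
	// The note wording is an assumption made for this example.
	compressionNote = "Note: earlier conversation has been compressed into a summary message."
)

// rebuild returns head + summary message + preserved tail.
// msgs[0] is assumed to be the system message; tailStart is the index of
// the first preserved tail message.
func rebuild(msgs []Message, tailStart int, summary string) []Message {
	if len(msgs) == 0 || tailStart < 1 || tailStart > len(msgs) {
		return msgs
	}
	head := msgs[0] // copy, so the caller's slice is not modified
	if !strings.Contains(head.Content, compressionNote) {
		head.Content += "\n\n" + compressionNote // append the note only once
	}
	out := make([]Message, 0, 2+len(msgs)-tailStart)
	out = append(out, head)
	// The "user" role is an assumption; the real role may differ.
	out = append(out, Message{Role: "user", Content: summaryMarker + "\n" + summary})
	out = append(out, msgs[tailStart:]...)
	return out
}

func main() {
	msgs := []Message{
		{Role: "system", Content: "You are an agent."},
		{Role: "user", Content: "old request"},
		{Role: "assistant", Content: "old reply"},
		{Role: "user", Content: "latest request"},
	}
	for _, m := range rebuild(msgs, 3, "Goal: fix the failing test") {
		fmt.Printf("%s: %q\n", m.Role, m.Content)
	}
}
```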
Tool Calls Are Not Split Apart
A subtle detail in the implementation is that AIClaw does not cut through an assistant/tool-call group. The compressor aligns the preserved tail boundary backward so a tool call and its tool results stay together.
That matters because broken tool-call sequences are confusing for the next model round. If the boundary fell between an assistant message that issued a tool call and its tool results, the preserved tail would begin with results whose originating call is gone, and the reconstructed context would be misleading.
There are tests for this behavior in internal/agent/context_compressor_test.go.
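One simple way to get that backward alignment is to walk the boundary back while it points at a tool result. This is a sketch of the idea, not the code in context_compressor.go; the `Message` type and function name are hypothetical.

```go
package main

import "fmt"

// Message is a simplified, hypothetical stand-in for AIClaw's message type.
type Message struct {
	Role string // "system", "user", "assistant", or "tool"
}

// alignTailStart moves the tail boundary backward until it no longer points
// at a tool result, so the assistant message that issued the call lands in
// the tail together with its results. Index 0 (the system message) is never
// pulled into the tail.
func alignTailStart(msgs []Message, tailStart int) int {
	if tailStart >= len(msgs) {
		return len(msgs) // empty tail, nothing to align
	}
	if tailStart < 1 {
		return 1
	}
	for tailStart > 1 && msgs[tailStart].Role == "tool" {
		tailStart--
	}
	return tailStart
}

func main() {
	msgs := []Message{
		{Role: "system"},
		{Role: "user"},
		{Role: "assistant"}, // issues two tool calls
		{Role: "tool"},
		{Role: "tool"},
		{Role: "assistant"},
	}
	// A naive boundary at index 4 would orphan the second tool result.
	fmt.Println(alignTailStart(msgs, 4)) // 2
	fmt.Println(alignTailStart(msgs, 5)) // 5, already on a safe boundary
}
```

The trade-off is that the tail can grow slightly beyond the configured minimum, which is usually cheaper than handing the model a tool result with no matching call.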
Compression Is Iterative, Not One-Shot
AIClaw also keeps the previous compression summary in memory during the active run. On the next compression pass, it does not start from zero. It sends:
- the previous summary
- the newly accumulated conversation slice
Then it asks the model to merge them into an updated structured summary.
This makes repeated compression cheaper and more stable in long tasks. Instead of re-summarizing the entire old middle history every time, AIClaw incrementally rolls forward the important state.
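The rolling update can be sketched as a small stateful compressor. Everything here is illustrative: the `Compressor` and `Summarizer` names, the prompt wording, and the choice to keep the old summary when the call fails are all assumptions.

```go
package main

import "fmt"

// Summarizer is a hypothetical hook for the LLM call that writes a summary.
type Summarizer func(prompt string) (string, error)

// Compressor remembers the last summary for the duration of a run.
type Compressor struct {
	prevSummary string
	summarize   Summarizer
}

// buildPrompt switches between a first-pass prompt and a merge prompt.
func buildPrompt(prev, newSlice string) string {
	if prev == "" {
		return "Summarize this conversation using the structured template:\n" + newSlice
	}
	return "Merge the previous summary and the new conversation into one " +
		"updated structured summary.\n\nPrevious summary:\n" + prev +
		"\n\nNew conversation:\n" + newSlice
}

// Compress summarizes only the newly accumulated slice, folding in the
// previous summary instead of re-reading the whole old history.
func (c *Compressor) Compress(newSlice string) (string, error) {
	s, err := c.summarize(buildPrompt(c.prevSummary, newSlice))
	if err != nil {
		return "", err // keep the old summary so a later pass can retry
	}
	c.prevSummary = s
	return s, nil
}

func main() {
	calls := 0
	fake := func(prompt string) (string, error) {
		calls++
		return fmt.Sprintf("summary v%d", calls), nil
	}
	c := &Compressor{summarize: fake}
	first, _ := c.Compress("messages 1..40")
	second, _ := c.Compress("messages 41..70")
	fmt.Println(first)
	fmt.Println(second)
}
```

The second pass sees "summary v1" plus only messages 41 to 70, so the cost of each compression depends on the new slice rather than on the total length of the session.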
Which Model Handles Compression
The main execution loop prefers the agent's FastModelName for compression when one is configured; otherwise it falls back to the primary model.
That is a good default for a local-first agent platform:
- the expensive or premium model stays focused on the real task
- the cheaper or faster model can handle summarization work
- prompt size stays under control during long sessions
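The fallback itself is a one-liner. In this sketch `FastModelName` is the field named above, while `AgentConfig`, `ModelName`, and the example model names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// AgentConfig is a hypothetical, trimmed-down agent configuration.
type AgentConfig struct {
	ModelName     string // primary model (field name assumed)
	FastModelName string // optional cheaper or faster model
}

// compressionModel prefers the fast model when one is configured.
func compressionModel(cfg AgentConfig) string {
	if strings.TrimSpace(cfg.FastModelName) != "" {
		return cfg.FastModelName
	}
	return cfg.ModelName
}

func main() {
	fmt.Println(compressionModel(AgentConfig{ModelName: "primary", FastModelName: "fast"}))
	fmt.Println(compressionModel(AgentConfig{ModelName: "primary"}))
}
```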
A Practical Example
Imagine a debugging session where an AIClaw agent:
- reads several Go files
- runs tests
- inspects logs
- edits code
- reruns tests
- asks a sub-agent to inspect a failing subsystem
- returns to the parent run for the final fix
Without compression, the conversation history gradually becomes a pile of stale tool output. With compression, AIClaw can keep the current tail intact while rolling earlier work into a structured checkpoint that still remembers:
- which files were already inspected
- which commands succeeded or failed
- what the user asked for
- which constraints matter
- what remains unresolved
That is the difference between “shorter prompt” and “runtime continuity.”
Why This Feature Matters
AIClaw is opinionated about execution state. It already treats plan state, generated files, execution steps, memory, and conversation history as first-class runtime data. Context compression fits the same design philosophy.
The goal is not to make the transcript prettier. The goal is to keep an agent useful after a long stretch of real work.
If you are building agents that mostly answer in one turn, this feature is easy to ignore. If you are building agents that browse, edit, run commands, and recover from failure across many rounds, it becomes part of the reliability story.
AIClaw keeps that logic in the runtime rather than pushing the entire burden onto prompt engineering.
Where To Look In The Code
- internal/agent/context_compressor.go: compression thresholds, protected windows, summary prompt, iterative summary logic
- internal/agent/run.go: where compression is triggered in the execution loop
- internal/agent/context_compressor_test.go: tests for summary injection, iterative updates, tool-group preservation, and duplicate-note prevention
- README.md: product-level runtime compression description
AIClaw is open source, self-hosted, and built for agents that do more than chat. Context compression is one of the small runtime details that makes that practical over longer sessions.