chowyu

Posted on Jun 19

How AIClaw Compresses Long Agent Conversations Without Losing the Important Parts

#agents #ai #llm #opensource

Long-running agent sessions eventually hit the same problem: the model keeps accumulating chat history, tool outputs, intermediate decisions, and execution traces until the prompt becomes expensive or unstable. AIClaw has a built-in answer for that problem. It does not simply drop old messages. It compresses the middle of the conversation into a structured summary and keeps the parts that still matter for the next step.

This is not a new release post. It is a deeper look at one existing AIClaw runtime feature: context compression.

The Problem

AIClaw is designed for tool-using work, not short chatbot replies. A single task can include:

multiple rounds of shell or browser tool calls
long tool outputs
plan-state progress updates
follow-up fixes after the first attempt
sub-agent results flowing back into the parent run

That is useful context, but it also means the prompt grows fast. If the runtime sends everything back to the model forever, cost increases and the model starts paying attention to the wrong parts of the history.

The README describes this capability briefly as:

Runtime compression: Long middle context can be summarized during execution.

The implementation behind that line is more specific than it sounds.

When AIClaw Decides To Compress

The decision lives in internal/agent/context_compressor.go and is wired into the main execution loop in internal/agent/run.go.

Before each LLM round, AIClaw checks whether the current prompt is too large relative to the model context window.

The current defaults are straightforward:

compress when prompt usage reaches 50% of the model context window
keep the system message at the head
keep at least the latest 20 messages at the tail
require at least 5 middle messages before compression is worth doing

If the model provider reports real prompt-token usage, AIClaw uses that. Otherwise it falls back to an internal estimate. That matters because the trigger is based on actual prompt pressure, not just message count.
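The trigger can be sketched in Go roughly as follows. This is a minimal illustration, not AIClaw's code. `CompressionConfig`, `shouldCompress`, and the 4-bytes-per-token fallback estimate are assumptions made for the example. Only the default values and the "prefer reported usage, else estimate" rule come from the description above.

```go
package main

import "fmt"

// CompressionConfig holds the defaults described above. The type and field
// names are hypothetical; only the values come from the article.
type CompressionConfig struct {
	TriggerRatio    float64 // compress at this fraction of the context window
	MinTailMessages int     // messages always kept at the tail
	MinMiddle       int     // smallest middle worth summarizing
}

var DefaultConfig = CompressionConfig{TriggerRatio: 0.5, MinTailMessages: 20, MinMiddle: 5}

// estimateTokens is a hypothetical fallback estimate: roughly 4 bytes of
// text per token. AIClaw's own estimator may work differently.
func estimateTokens(messages []string) int {
	total := 0
	for _, m := range messages {
		total += len(m)
	}
	return total / 4
}

// shouldCompress uses the provider-reported prompt token count when it is
// available (> 0) and falls back to the estimate otherwise.
func shouldCompress(cfg CompressionConfig, reportedTokens, contextWindow int, messages []string) bool {
	used := reportedTokens
	if used <= 0 {
		used = estimateTokens(messages)
	}
	return float64(used) >= cfg.TriggerRatio*float64(contextWindow)
}

func main() {
	history := []string{"system prompt", "short tool output"}
	fmt.Println(shouldCompress(DefaultConfig, 70000, 128000, history)) // true: 70k >= 64k
	fmt.Println(shouldCompress(DefaultConfig, 0, 128000, history))     // false: the estimate is tiny
}
```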

What Gets Compressed, And What Stays Intact

AIClaw uses a four-phase flow.

1. Prune old tool output first

Before asking the model to summarize, AIClaw trims older tool messages outside the protected tail window. Tool outputs in that middle region are truncated to 200 runes (Go's term for Unicode code points). That keeps huge logs from dominating the summary prompt.

This is an important design choice. The runtime does not try to summarize raw noise at full size first. It reduces obviously low-value bulk before paying for the summarization call.

2. Protect the head and the tail

The compressor preserves:

the head of the conversation, especially the system prompt
the latest tail of the conversation, where the current working state lives

The part in the middle becomes the candidate for compression.
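One way to compute the three regions, using the defaults listed earlier (keep the system message, keep at least 20 tail messages, require at least 5 middle messages). The function name and signature are hypothetical, and this sketch treats the head as just the system message:

```go
package main

import "fmt"

// Message is a hypothetical, simplified chat message.
type Message struct {
	Role    string
	Content string
}

// splitForCompression returns where the middle starts and where the
// protected tail starts. ok is false when the middle is too small to be
// worth summarizing.
func splitForCompression(msgs []Message, minTail, minMiddle int) (middleStart, tailStart int, ok bool) {
	if len(msgs) > 0 && msgs[0].Role == "system" {
		middleStart = 1 // protect the system prompt at the head
	}
	tailStart = len(msgs) - minTail
	if tailStart < middleStart {
		tailStart = middleStart // history too short: nothing in the middle
	}
	ok = tailStart-middleStart >= minMiddle
	return middleStart, tailStart, ok
}

func main() {
	msgs := make([]Message, 30)
	msgs[0] = Message{Role: "system", Content: "You are an agent."}
	for i := 1; i < len(msgs); i++ {
		msgs[i] = Message{Role: "user", Content: fmt.Sprintf("turn %d", i)}
	}
	start, tail, ok := splitForCompression(msgs, 20, 5)
	fmt.Println(start, tail, ok) // 1 10 true: messages 1..9 form the middle
}
```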

3. Ask an LLM for a structured summary

Instead of generating a vague paragraph, AIClaw asks for a strict template with sections like:

Goal
Constraints And Preferences
Progress
Key Decisions
Relevant Files
Next Steps
Critical Context

This is a practical choice for agent continuity. The summary is meant to preserve execution state, not produce pretty prose.

4. Rebuild the conversation with a summary message

After summarization, AIClaw inserts a [Context Compression Summary] message and appends a note to the system prompt stating that the earlier conversation has been compressed.

The result is smaller than the original history, but still carries forward the task objective, decisions, blockers, touched files, and next action.

Tool Calls Are Not Split Apart

A subtle detail in the implementation is that AIClaw does not cut through an assistant/tool-call group. The compressor aligns the preserved tail boundary backward so a tool call and its tool results stay together.

That matters because broken tool-call sequences are confusing for the next model round. If an assistant message says it called a tool but the corresponding tool results are missing from the preserved tail, the reconstructed context becomes misleading.

There are tests for this behavior in internal/agent/context_compressor_test.go.
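A minimal sketch of the backward alignment, assuming an OpenAI-style history where an assistant message that calls tools is immediately followed by its `tool` result messages. `alignTailStart` is a hypothetical name; it steps back over tool results until it reaches the message that issued them, without crossing into the protected head.

```go
package main

import "fmt"

// Message is a hypothetical, simplified chat message.
type Message struct {
	Role string
}

// alignTailStart moves the tail boundary backward while it points at a tool
// result, so the assistant call and all of its results land in the tail
// together. It never moves below minStart (the end of the head).
func alignTailStart(msgs []Message, tailStart, minStart int) int {
	i := tailStart
	for i > minStart && i < len(msgs) && msgs[i].Role == "tool" {
		i--
	}
	return i
}

func main() {
	msgs := []Message{
		{Role: "system"},
		{Role: "user"},
		{Role: "assistant"}, // issues two tool calls
		{Role: "tool"},      // result of call 1
		{Role: "tool"},      // result of call 2
		{Role: "user"},
	}
	fmt.Println(alignTailStart(msgs, 4, 1)) // 2: boundary moves back to the assistant call
	fmt.Println(alignTailStart(msgs, 5, 1)) // 5: already a clean boundary
}
```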

Compression Is Iterative, Not One-Shot

AIClaw also keeps the previous compression summary in memory during the active run. On the next compression pass, it does not start from zero. It sends:

the previous summary
the newly accumulated conversation slice

Then it asks the model to merge them into an updated structured summary.

This makes repeated compression cheaper and more stable in long tasks. Instead of re-summarizing the entire old middle history every time, AIClaw incrementally rolls forward the important state.

Which Model Handles Compression

The main execution loop prefers the agent's FastModelName for compression when one is configured; otherwise it falls back to the primary model.

That is a good default for a local-first agent platform:

the expensive or premium model stays focused on the real task
the cheaper or faster model can handle summarization work
prompt size stays under control during long sessions
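The selection rule is small enough to show directly. `AgentConfig` and `ModelName` are hypothetical; `FastModelName` is the field named above, and an empty string is assumed to mean "not configured":

```go
package main

import "fmt"

// AgentConfig is hypothetical; ModelName stands in for the primary model.
type AgentConfig struct {
	ModelName     string
	FastModelName string
}

// compressionModel prefers the fast model and falls back to the primary one.
func compressionModel(cfg AgentConfig) string {
	if cfg.FastModelName != "" {
		return cfg.FastModelName
	}
	return cfg.ModelName
}

func main() {
	fmt.Println(compressionModel(AgentConfig{ModelName: "primary-model", FastModelName: "fast-model"})) // fast-model
	fmt.Println(compressionModel(AgentConfig{ModelName: "primary-model"}))                             // primary-model
}
```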

A Practical Example

Imagine a debugging session where an AIClaw agent:

reads several Go files
runs tests
inspects logs
edits code
reruns tests
asks a sub-agent to inspect a failing subsystem
returns to the parent run for the final fix

Without compression, the conversation history gradually becomes a pile of stale tool output. With compression, AIClaw can keep the current tail intact while rolling earlier work into a structured checkpoint that still remembers:

which files were already inspected
which commands succeeded or failed
what the user asked for
which constraints matter
what remains unresolved

That is the difference between a shorter prompt and runtime continuity.

Why This Feature Matters

AIClaw is opinionated about execution state. It already treats plan state, generated files, execution steps, memory, and conversation history as first-class runtime data. Context compression fits the same design philosophy.

The goal is not to make the transcript prettier. The goal is to keep an agent useful after a long stretch of real work.

If you are building agents that mostly answer in one turn, this feature is easy to ignore. If you are building agents that browse, edit, run commands, and recover from failure across many rounds, it becomes part of the reliability story.

AIClaw keeps that logic in the runtime rather than pushing the entire burden onto prompt engineering.

Where To Look In The Code

internal/agent/context_compressor.go: compression thresholds, protected windows, summary prompt, iterative summary logic
internal/agent/run.go: where compression is triggered in the execution loop
internal/agent/context_compressor_test.go: tests for summary injection, iterative updates, tool-group preservation, and duplicate-note prevention
README.md: product-level runtime compression description

AIClaw is open source, self-hosted, and built for agents that do more than chat. Context compression is one of the small runtime details that makes that practical over longer sessions.

Top comments (1)

xulingfeng • Jun 19

We ran into this building our own memory system. The "tool call + result staying together" constraint is the one that hurts most when it breaks — seen too many traces where the output got compressed but the call got dropped. Makes the whole thing undebuggable. How do you handle the case where a tool call's output spans multiple messages? Followed you 👀