Long-running agent sessions get expensive in a boring way. Not because the user asked a profound question, but because the transcript quietly filled up with shell output, giant file reads, search dumps, and tool chatter that nobody actually needs on the next turn.
OpenClaw has a built-in answer for that: session pruning. It trims old tool results out of the in-memory prompt before the next model call, while leaving normal conversation text alone and preserving the on-disk transcript. That distinction matters. You get a leaner working context without pretending the history never happened.
If you already care about how the Gateway manages sessions or you are tuning reasoning budget for real operator work, pruning is one of the highest-leverage settings to understand. It is not flashy, but it is exactly the kind of feature that makes an agent cheaper and steadier over long sessions.
What session pruning actually does
Per the docs, session pruning trims old tool results from the in-memory context right before an LLM call. It does not rewrite the on-disk session history, and it does not modify user or assistant messages.
That means OpenClaw is not deleting your conversation. It is deciding that the model probably does not need to re-read a massive old exec output or an oversized read result every time the session wakes back up.
The docs are explicit about the boundary:
- Prunable: `toolResult` messages only.
- Protected: user messages and assistant messages.
- Persistent history: unchanged on disk.
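That boundary is simple enough to express directly. Here is a minimal sketch of the candidate selection, using illustrative message shapes (not OpenClaw's internal types):

```python
# Illustrative message shapes; OpenClaw's internal types will differ.
PRUNABLE_ROLES = {"toolResult"}
PROTECTED_ROLES = {"user", "assistant"}

def prune_candidates(messages):
    """Return the in-memory messages eligible for pruning.

    Only toolResult messages qualify. User and assistant messages are
    never touched, and the on-disk transcript is not involved at all.
    """
    return [m for m in messages if m["role"] in PRUNABLE_ROLES]
```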
That is the design I want from an operator tool. Trim the expensive noise, not the actual conversation.
Why OpenClaw added it
The main reason is prompt-cache efficiency, especially for Anthropic paths. The docs say pruning is only active for Anthropic API calls, including OpenRouter Anthropic models, and it is built around cache TTL behavior.
Here is the real problem. Anthropic prompt caching helps when several requests hit the same prompt prefix inside the cache window. But once that TTL expires, the next request has to write the prompt back into cache again. If the session has accumulated a lot of old tool output, that cache write gets bloated and more expensive than it needs to be.
Session pruning cuts that waste. The documented flow is simple:
- Wait until the cache TTL is old enough.
- Inspect old tool results in the in-memory context.
- Soft-trim oversized results first.
- Hard-clear older tool results if more reduction is needed.
- Reset the TTL window so follow-up turns reuse the fresher cache.
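The first step of that flow, the TTL gate, is the part most worth internalizing: pruning waits until the cache write is already lost. A sketch of that check, assuming the documented 5-minute default:

```python
import time

TTL_SECONDS = 5 * 60  # documented default ttl: "5m"

def should_prune(last_anthropic_call_ts, now=None):
    """Cache-TTL gate: prune only once the last Anthropic call for the
    session is older than the TTL, i.e. the prompt-cache entry has
    already expired and the next request must rewrite it anyway."""
    now = time.time() if now is None else now
    return (now - last_anthropic_call_ts) > TTL_SECONDS
```

While the gate returns false, OpenClaw leaves the context alone so the warm cache keeps paying off.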
That is why the feature is more than generic “context cleanup.” It is directly tied to cost control and cache behavior.
When pruning runs, and when it does not
The main mode documented today is cache-ttl. In that mode, pruning only runs when the last Anthropic call for the session is older than the configured TTL. The default TTL is 5m.
```json5
{
  agents: {
    defaults: {
      contextPruning: {
        mode: "cache-ttl",
        ttl: "5m"
      }
    }
  }
}
```
If the session is still within that TTL window, OpenClaw keeps the cache warm and does not prune yet. If the TTL has expired, pruning can run before the next request. That is an important nuance. This is not an every-turn mutilation pass. It is a targeted cleanup for the expensive post-idle request.
The docs also call out a few skip conditions that matter in practice:
- If there are not enough assistant messages to establish the protection cutoff, pruning is skipped.
- Tool results containing image blocks are skipped and never trimmed or cleared.
- On non-Anthropic model paths, pruning is off by default.
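The skip conditions above can be sketched as two small checks. The exact semantics of "enough assistant messages" are my reading of the docs (fewer than `keepLastAssistants`), so treat this as an assumption:

```python
def should_skip(messages, keep_last_assistants=3, provider="anthropic"):
    """Documented skip conditions, sketched: non-Anthropic paths are off
    by default, and without enough assistant messages there is no
    protection cutoff to anchor pruning against (assumed threshold)."""
    if provider != "anthropic":
        return True
    assistants = [m for m in messages if m["role"] == "assistant"]
    return len(assistants) < keep_last_assistants

def result_has_images(tool_result):
    """Tool results containing image blocks are never trimmed or cleared."""
    return any(block.get("type") == "image"
               for block in tool_result.get("content", []))
```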
So if you were picturing OpenClaw aggressively chopping up every session on every provider, that is not what the documented behavior says.
Want the practical operator settings, not just the feature list?
ClawKit shows you how to tune pruning, compaction, memory, model defaults, and safety rails so long-running agents stay cheap and useful.
Soft-trim versus hard-clear
OpenClaw does pruning in two layers.
Soft-trim
Soft-trim is for oversized tool results. The docs say OpenClaw keeps the head and tail, inserts `...` in the middle, and appends a note about the original size. By default, soft-trim uses `maxChars: 4000` with `headChars: 1500` and `tailChars: 1500`.
This is the polite version of pruning. You still preserve the opening context and the ending outcome, which is often enough for follow-up reasoning.
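A soft-trim of this shape is easy to picture. This sketch uses the documented defaults; the note format is my own invention, not OpenClaw's exact wording:

```python
def soft_trim(text, max_chars=4000, head_chars=1500, tail_chars=1500):
    """Keep the head and tail of an oversized tool result, marking the
    cut with a note about the original size (format is illustrative)."""
    if len(text) <= max_chars:
        return text  # small results pass through untouched
    note = f"\n... [trimmed, original size: {len(text)} chars] ...\n"
    return text[:head_chars] + note + text[-tail_chars:]
```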
Hard-clear
Hard-clear is the blunter tool. It replaces the entire tool result with the placeholder [Old tool result content cleared] when that is needed to reduce context further. The default hard-clear behavior is enabled.
That sounds harsh until you remember what is being targeted: stale tool output, not the human conversation. If an old shell dump is no longer helping the current turn, a placeholder is fine.
What stays protected
This is the part operators should understand before touching the config. Pruning is not random. The docs say the last keepLastAssistants assistant messages are protected, and tool results after that cutoff are not pruned. The default is 3.
In plain English, OpenClaw tries to keep the recent working set stable. It does not aggressively erase the tools surrounding the freshest assistant turns, because those are the ones most likely to matter for the next reply.
That protection layer is why pruning and compaction can coexist without feeling destructive. You keep the nearby local context, while older bulky outputs get trimmed first.
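One plausible way to compute that protection cutoff, assuming the documented `keepLastAssistants: 3` default (the index arithmetic is my sketch, not OpenClaw's code):

```python
def protection_cutoff(messages, keep_last_assistants=3):
    """Index of the oldest protected assistant message. Everything from
    that point onward, including nearby tool results, is left alone."""
    assistant_indexes = [i for i, m in enumerate(messages)
                         if m["role"] == "assistant"]
    if len(assistant_indexes) < keep_last_assistants:
        return None  # not enough assistants: pruning is skipped entirely
    return assistant_indexes[-keep_last_assistants]
```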
Pruning is not compaction, and that is a good thing
People often mix pruning and compaction together. The docs do not.
- Pruning trims old tool results in memory for a request.
- Compaction summarizes older conversation and persists that summary into session history.
That means pruning is transient, while compaction becomes part of the durable session record. OpenClaw's compaction docs are very clear about this: compaction persists in JSONL history, pruning does not.
I like this split because the two features solve different problems:
- Pruning keeps idle-session wakeups cheap.
- Compaction keeps very long sessions alive when the total context gets tight.
You should not think of pruning as a replacement for compaction. It is the lighter, cheaper cleanup that helps you avoid dragging a pile of stale tool output into every post-idle request.
The default numbers are worth knowing
If you are tuning this feature, the docs list these defaults for enabled pruning:
- `ttl: "5m"`
- `keepLastAssistants: 3`
- `softTrimRatio: 0.3`
- `hardClearRatio: 0.5`
- `minPrunableToolChars: 50000`
- `softTrim.maxChars: 4000`
- `hardClear.placeholder: "[Old tool result content cleared]"`
You do not need to memorize all of them, but you should understand the spirit. OpenClaw is not trimming tiny harmless outputs. It is targeting tool-result bloat that is large enough to matter.
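The `minPrunableToolChars` default captures that spirit well: the pass does not even engage unless there is enough prunable tool output to be worth reclaiming. A sketch of that gate:

```python
MIN_PRUNABLE_TOOL_CHARS = 50_000  # documented default minPrunableToolChars

def worth_pruning(tool_result_texts):
    """Pruning targets real bloat: skip the pass entirely unless the
    prunable tool output adds up to something worth reclaiming."""
    total = sum(len(t) for t in tool_result_texts)
    return total >= MIN_PRUNABLE_TOOL_CHARS
```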
Smart defaults for Anthropic profiles
OpenClaw also applies smart defaults for Anthropic profiles. The session pruning docs say OAuth or setup-token Anthropic profiles enable cache-TTL pruning and set heartbeat to one hour, while Anthropic API key profiles enable cache-TTL pruning and set heartbeat to 30 minutes.
That is a subtle but smart operational choice. API key users usually care more directly about recurring token cost, so a shorter heartbeat plus pruning makes sense. If you set your own values explicitly, OpenClaw does not override them.
This is one of those places where the product is acting like a real operator tool instead of a toy. It is trying to make the cheaper path the default path.
An advanced edge case: legacy image cleanup
The docs describe a separate, idempotent cleanup path for older legacy sessions that persisted raw image blocks in history. That cleanup preserves the three most recent completed turns byte-for-byte, while allowing older already-processed image blocks in user or toolResult history to be replaced with a placeholder.
This is separate from normal cache-TTL pruning. The point is to stop repeated old image payloads from busting prompt caches on later turns.
You do not need to tune this every day, but it is a good example of the OpenClaw philosophy: keep what matters for recent follow-ups, clean up what only burns tokens.
A practical config pattern
If you run long Anthropic-backed sessions with lots of file or shell work, I would start simple.
```json5
{
  agents: {
    defaults: {
      contextPruning: {
        mode: "cache-ttl",
        ttl: "5m",
        tools: {
          deny: ["browser", "canvas"]
        }
      }
    }
  }
}
```
The docs support tool selection rules with tools.allow and tools.deny, including wildcards, and deny rules win. That gives you a clean way to keep pruning focused on the most likely offenders.
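Those selection rules behave like ordinary glob matching with deny taking precedence. A hypothetical matcher, assuming shell-style wildcards (the pattern syntax is my assumption, not a spec quote):

```python
from fnmatch import fnmatch

def tool_is_prunable(tool_name, allow=("*",), deny=()):
    """Sketch of tools.allow / tools.deny resolution: deny rules win,
    and patterns may use wildcards."""
    if any(fnmatch(tool_name, pat) for pat in deny):
        return False  # deny always wins, even over an explicit allow
    return any(fnmatch(tool_name, pat) for pat in allow)
```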
I would not get fancy until you see an actual problem. Start with the documented defaults, then tune only if your sessions are still bloated or your follow-up context feels too aggressively cleaned.
When session pruning is most valuable
Agent sessions full of file reads
If your workflow involves lots of read outputs, logs, or generated text blobs, pruning is a straightforward win. The model rarely needs the full older payload again.
Ops sessions with shell output
Long exec results are classic context bloat. They are useful when they arrive. They are often useless 20 minutes later.
Idle sessions that wake back up
This is the sweet spot the docs are really targeting. After the cache TTL expires, pruning lowers the cost of the first request that has to rebuild cache state.
Anthropic-heavy operator setups
Because the documented pruning path is tied to Anthropic caching behavior, operators on those model paths get the clearest payoff.
Final take: this is the kind of feature serious operators should love
Session pruning is not a marketing feature. It is an operator feature. It exists because long-lived agents accumulate junk, and junk costs money.
What I like about OpenClaw's implementation is the restraint. It trims only tool results, skips image blocks, protects recent assistant context, leaves the durable transcript alone, and keeps compaction as a separate mechanism. That is a thoughtful boundary, not a blunt hack.
If your agent sessions keep getting slower, pricier, or harder to manage after idle periods, do not only look at model choice. Look at whether you are making the model re-ingest a pile of stale tool output it no longer needs.
That is exactly the mess session pruning was built to clean up.
Want the complete guide? Get ClawKit — $9.99
Originally published at https://www.openclawplaybook.ai/blog/openclaw-session-pruning-reduce-context-bloat/
Get The OpenClaw Playbook → https://www.openclawplaybook.ai?utm_source=devto&utm_medium=article&utm_campaign=parasite-seo