ContextClaw: The OpenClaw Plugin That Cut My Token Bill 55%

This is a submission for the OpenClaw Challenge.

Every agent system eventually hits the same wall: the model is not forgetting because it is dumb. It is forgetting because you are feeding it a landfill.

Old tool output. Half-fixed errors. File reads from a task you abandoned twenty minutes ago. Five versions of the same plan. Then you ask the model to be precise while its context window is full of stale evidence.

ContextClaw is my attempt to fix that inside OpenClaw.

What I Built

ContextClaw is a context management layer for OpenClaw. It sits between the workspace and the model, classifies each message, attaches a task-bucket sticker, and evicts context by task boundary instead of raw recency. The goal is simple: keep the intent, decisions, and active working state; drop the tool spam and dead branches.

On my own multi-turn working sessions, that pattern cuts token load by roughly 55% versus dumping the whole rolling transcript back into the model. The important part is not just compression. It is inventory. The agent knows what each piece of context is, what task it belongs to, and whether it should still be in the room.

raw session -> [classifier] -> typed messages
            -> [stickerer]  -> task-bucketed messages
            -> [evictor]    -> task-scoped context -> model

Bigger context windows help. They do not solve the core problem. If your workflow keeps stuffing irrelevant state into the prompt, a bigger window just gives you a larger junk drawer.

How I Used OpenClaw

OpenClaw is the right place to build this because it already treats agent work like a real system: tools, skills, files, providers, sessions, and workspace state. ContextClaw plugs into its turn lifecycle and changes what reaches the model.

The rough shape is:

~/.openclaw/plugins/contextclaw/
  plugin.json
  classifier.js
  stickers.js
  evictor.js

I am not going to pretend the install command is cleaner than it is. The safe version is: wire it through OpenClaw's plugin registry, then route each turn's message list through ContextClaw before the provider call. That is the hook. Do not patch random config by hand. Do not rely on a prompt that says "please ignore old context." Make the context layer enforce it.
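A minimal sketch of that hook, written in JavaScript to match the plugin's `.js` files. The post does not show OpenClaw's actual plugin API, so `withContextClaw`, `callProvider`, and `transform` are hypothetical names; the point is only that the context layer wraps the provider call, so no turn can reach the model unscoped.

```javascript
// Hypothetical sketch: OpenClaw's real plugin hook is not shown in this post,
// so `withContextClaw`, `callProvider`, and `transform` are placeholder names.
// Wrapping the provider call means every turn passes through the context layer.
function withContextClaw(callProvider, transform) {
  return function contextManagedCall(messages, turnState) {
    const scoped = transform(messages, turnState); // e.g. classify -> sticker -> evict
    return callProvider(scoped); // works for sync or promise-returning providers
  };
}

// Usage with a fake provider that reports what it was given.
const fakeProvider = (messages) => messages.map((m) => m.content);
const keepActiveBucket = (messages, { activeBucket }) =>
  messages.filter((m) => m.bucket === activeBucket);

const call = withContextClaw(fakeProvider, keepActiveBucket);
console.log(
  call(
    [
      { bucket: "DEV-A", content: "Draft the post" },
      { bucket: "DSB-3", content: "Twilio auth failure" },
    ],
    { activeBucket: "DEV-A" },
  ),
); // → [ 'Draft the post' ]
```

Wrapping the call, rather than adding a prompt instruction, is what makes the scoping enforced instead of requested.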

The classifier gives each message a job. A user request is not the same thing as a tool result. A decision is not the same thing as a stack trace. A sub-agent artifact is not the same thing as a planning note. Representative types look like:

user_intent
tool_call
tool_result
file_read
error_trace
plan
summary
decision
sub_agent_output
system_note
noise
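One cheap way to assign those types is a handful of rules over each message's role and content. This is a hypothetical sketch: the message shape, the role names, the `read_file` tool name, and the `DECISION:`/`PLAN:`/`SUMMARY:` markers are assumptions, not ContextClaw's published heuristics.

```javascript
// A minimal rule-based classifier sketch. The message shape
// ({ role, content, toolName }), the role names, the "read_file" tool name,
// and the DECISION:/PLAN:/SUMMARY: markers are all assumptions for illustration.
const TYPES = Object.freeze([
  "user_intent", "tool_call", "tool_result", "file_read", "error_trace",
  "plan", "summary", "decision", "sub_agent_output", "system_note", "noise",
]);

// Matches "Error: ...", "TypeError: ...", "Traceback ...", or JS stack frames.
const ERROR_RE = /^\s*(\w*Error\b|Traceback)|^\s+at .+:\d+:\d+\)?$/m;

function classifyMessage(msg) {
  const text = msg.content ?? "";
  switch (msg.role) {
    case "system": return "system_note";
    case "user": return "user_intent";
    case "subagent": return "sub_agent_output";
    case "tool_call": return "tool_call";
    case "tool":
      if (msg.toolName === "read_file") return "file_read";
      if (ERROR_RE.test(text)) return "error_trace";
      return text.trim() === "" ? "noise" : "tool_result";
  }
  // Assistant prose: rely on explicit markers the agent is prompted to emit.
  if (/^DECISION:/m.test(text)) return "decision";
  if (/^PLAN:/m.test(text)) return "plan";
  if (/^SUMMARY:/m.test(text)) return "summary";
  return "noise"; // unmarked chatter is treated as evictable in this sketch
}

const samples = [
  { role: "user", content: "Write the OpenClaw Challenge post" },
  { role: "tool", toolName: "read_file", content: "# POST_A_SPEC" },
  { role: "tool", toolName: "shell", content: "Error: Twilio auth failure\n    at send (sms.js:42:7)" },
  { role: "assistant", content: "DECISION: ContextClaw is the Prompt A project angle" },
];
console.log(samples.map(classifyMessage));
// → [ 'user_intent', 'file_read', 'error_trace', 'decision' ]
```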

The exact enum matters less than the principle: recency is the wrong axis.

A 100-token decision from turn 3 can be more important than 8,000 tokens of file output from turn 19. Sliding windows do not understand that. Type-aware eviction can.
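The difference is easy to see with a toy session under a fixed token budget. Everything here is illustrative: the `PRIORITY` table, the 8,200-token budget, and the per-message token counts are made up, and the type-aware policy is a simple greedy fill by priority rather than ContextClaw's actual evictor.

```javascript
// Illustrative only: the PRIORITY table, the budget, and the token counts are
// made-up numbers, not ContextClaw's configuration.
const PRIORITY = {
  user_intent: 5, decision: 5, plan: 4, summary: 4, error_trace: 3,
  sub_agent_output: 3, tool_call: 2, system_note: 2, tool_result: 1,
  file_read: 1, noise: 0,
};

// Sliding window: walk back from the newest message until the budget runs out.
function slidingWindow(messages, budget) {
  const kept = [];
  let used = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    if (used + messages[i].tokens > budget) break;
    used += messages[i].tokens;
    kept.unshift(messages[i]);
  }
  return kept;
}

// Type-aware: fill the budget greedily by priority (newest first within a tier),
// skipping anything that does not fit, then restore chronological order.
function typeAware(messages, budget) {
  const ranked = messages
    .map((m, i) => ({ m, i }))
    .sort((a, b) => PRIORITY[b.m.type] - PRIORITY[a.m.type] || b.i - a.i);
  const keep = new Set();
  let used = 0;
  for (const { m, i } of ranked) {
    if (used + m.tokens > budget) continue;
    used += m.tokens;
    keep.add(i);
  }
  return messages.filter((_, i) => keep.has(i));
}

const session = [
  { turn: 1, type: "user_intent", tokens: 60 },
  { turn: 3, type: "decision", tokens: 100 },
  { turn: 12, type: "tool_result", tokens: 1200 },
  { turn: 19, type: "file_read", tokens: 8000 },
  { turn: 20, type: "user_intent", tokens: 80 },
];
const budget = 8200;
console.log(slidingWindow(session, budget).map((m) => m.turn)); // → [ 19, 20 ]
console.log(typeAware(session, budget).map((m) => m.turn)); // → [ 1, 3, 12, 20 ]
```

The sliding window keeps the 8,000-token file read and drops the turn-3 decision; the type-aware fill does the opposite.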

Then ContextClaw adds stickers. A sticker is a small label that says what task a message belongs to and what kind of context it is. Representative stickers might look like:

[DEV-A] tool-file-read: POST_A_SPEC.md
[DEV-A] decision: ContextClaw is the Prompt A project angle
[DSB-3] error_trace: Twilio auth failure
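Because stickers are plain strings, they are cheap to write and easy to parse back. A sketch, assuming the `[BUCKET] kind: label` layout shown above; `formatSticker`, `parseSticker`, and the field names are hypothetical.

```javascript
// Hypothetical helpers for the "[BUCKET] kind: label" layout used above.
const STICKER_RE = /^\[([A-Za-z0-9-]+)\] ([\w-]+): (.+)$/;

function formatSticker({ bucket, kind, label }) {
  return `[${bucket}] ${kind}: ${label}`;
}

function parseSticker(line) {
  const match = STICKER_RE.exec(line);
  if (!match) return null; // not a sticker; the caller decides how to treat it
  const [, bucket, kind, label] = match;
  return { bucket, kind, label };
}

const s = formatSticker({ bucket: "DSB-3", kind: "error_trace", label: "Twilio auth failure" });
console.log(s); // → [DSB-3] error_trace: Twilio auth failure
console.log(parseSticker("[DEV-A] tool-file-read: POST_A_SPEC.md"));
// → { bucket: 'DEV-A', kind: 'tool-file-read', label: 'POST_A_SPEC.md' }
```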

Now the evictor has a useful signal. When I am writing the OpenClaw Challenge post, I need [DEV-A]. I do not need a stale [DSB-3] SMS debugging trace, even if it happened more recently.
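With stickers in place, the eviction rule itself can be very small. This sketch assumes each message already carries a parsed sticker; the rule that `system_note` messages survive every bucket switch is an illustrative assumption, not documented ContextClaw behavior.

```javascript
// Hypothetical evictor: keep the active bucket, drop everything else. Keeping
// `system_note` across buckets is an illustrative assumption.
function evictByBucket(messages, activeBucket) {
  return messages.filter(
    (m) => m.sticker.bucket === activeBucket || m.sticker.kind === "system_note",
  );
}

const history = [
  { turn: 2, sticker: { bucket: "DEV-A", kind: "tool-file-read" }, content: "POST_A_SPEC.md" },
  { turn: 3, sticker: { bucket: "DEV-A", kind: "decision" }, content: "ContextClaw is the Prompt A project angle" },
  { turn: 9, sticker: { bucket: "DSB-3", kind: "error_trace" }, content: "Twilio auth failure" },
];
console.log(evictByBucket(history, "DEV-A").map((m) => m.turn)); // → [ 2, 3 ]
```

Turn 9 is the most recent message and is still dropped, because it belongs to the wrong bucket.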

This connects directly to my file-as-interface workflow. In my OpenClaw workspace, files like AGENTS.md, NEXT_TICKET.md, STATUS.md, TASKS.md, and BLOCKER.md are not decoration. They are the control plane. NEXT_TICKET.md says what the active task is. STATUS.md says what changed. BLOCKER.md means a human gate exists.
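Reading that control plane takes only a few lines of Node. The file names come from the workspace described above; treating the first non-empty line of NEXT_TICKET.md as the ticket ID, and the bare existence of BLOCKER.md as the gate, are assumptions about formats the post does not spell out. `readControlPlane` is a hypothetical helper.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper. File names come from the post; the formats assumed here
// (ticket ID on the first non-empty line, BLOCKER.md presence = human gate)
// are illustrative assumptions.
function readControlPlane(workspaceDir) {
  const read = (name) => {
    const file = path.join(workspaceDir, name);
    return fs.existsSync(file) ? fs.readFileSync(file, "utf8") : null;
  };
  const nextTicket = read("NEXT_TICKET.md");
  const activeTicket =
    nextTicket?.split("\n").map((l) => l.trim()).find((l) => l !== "") ?? null;
  return {
    activeTicket, // which task bucket should be live
    blocked: read("BLOCKER.md") !== null, // a human gate exists
    status: read("STATUS.md"), // what changed, if the file exists
  };
}

// Usage against a throwaway workspace directory.
const ws = fs.mkdtempSync(path.join(os.tmpdir(), "contextclaw-"));
fs.writeFileSync(path.join(ws, "NEXT_TICKET.md"), "\nDEV-A\nWrite the challenge post\n");
console.log(readControlPlane(ws));
// → { activeTicket: 'DEV-A', blocked: false, status: null }
```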

ContextClaw reads those workspace signals and uses them to decide bucket boundaries. When NEXT_TICKET.md changes, the active bucket rolls. The model does not need to be begged to forget. The filesystem already made the task switch explicit.
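Detecting the task switch can be as simple as hashing NEXT_TICKET.md once per turn and rolling the bucket when the hash changes. A sketch under those assumptions: `BucketTracker` is a hypothetical name, polling once per turn is a deliberately simple choice (a file watcher would also work), and the first non-empty line is assumed to name the bucket.

```javascript
const crypto = require("crypto");
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical tracker: roll the active bucket whenever NEXT_TICKET.md changes.
class BucketTracker {
  constructor(workspaceDir) {
    this.file = path.join(workspaceDir, "NEXT_TICKET.md");
    this.lastHash = null;
    this.activeBucket = null;
  }

  // Call once per turn. Returns true when the task switched.
  refresh() {
    const text = fs.existsSync(this.file) ? fs.readFileSync(this.file, "utf8") : "";
    const hash = crypto.createHash("sha256").update(text).digest("hex");
    if (hash === this.lastHash) return false;
    this.lastHash = hash;
    // Assumed format: the first non-empty line names the bucket.
    this.activeBucket = text.split("\n").map((l) => l.trim()).find(Boolean) ?? null;
    return true;
  }
}

const ws = fs.mkdtempSync(path.join(os.tmpdir(), "contextclaw-"));
const ticket = path.join(ws, "NEXT_TICKET.md");
const tracker = new BucketTracker(ws);

fs.writeFileSync(ticket, "DSB-3\n");
console.log(tracker.refresh(), tracker.activeBucket); // → true DSB-3
console.log(tracker.refresh(), tracker.activeBucket); // → false DSB-3
fs.writeFileSync(ticket, "DEV-A\n");
console.log(tracker.refresh(), tracker.activeBucket); // → true DEV-A
```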

That is the whole trick. Do not ask the agent to infer workflow state from vibes. Put the workflow state somewhere durable, then make the context layer obey it.

I also filed OpenClaw issues around the places where this should become more visible and reliable. Issue #64085 is about provider circuit breakers: if a provider starts returning quota or rate-limit errors, OpenClaw should stop hammering it and route around it. Issue #64086 is about exposing plugin status in the TUI footer. ContextClaw should be able to show a live tokens-saved counter where the user can actually see it.
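The circuit breaker in issue #64085 is a standard pattern. This is a hypothetical sketch of it, not OpenClaw code: `ProviderBreaker`, the threshold, the cooldown, and counting only HTTP 429 (Too Many Requests) responses are all assumptions. The clock is injected so the behavior can be checked without waiting.

```javascript
// Hypothetical provider circuit breaker. Threshold, cooldown, and the
// 429-only failure rule are assumptions for illustration.
class ProviderBreaker {
  constructor({ threshold = 3, cooldownMs = 60_000, now = Date.now } = {}) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
    this.state = new Map(); // provider -> { failures, openedAt }
  }

  isOpen(provider) {
    const s = this.state.get(provider);
    if (!s || s.openedAt === null) return false;
    if (this.now() - s.openedAt >= this.cooldownMs) {
      // Cooldown elapsed: half-open, let one trial request through.
      s.openedAt = null;
      s.failures = this.threshold - 1; // one more failure reopens immediately
      return false;
    }
    return true;
  }

  recordFailure(provider, status) {
    if (status !== 429) return; // only rate-limit / quota responses count here
    const s = this.state.get(provider) ?? { failures: 0, openedAt: null };
    s.failures += 1;
    if (s.failures >= this.threshold) s.openedAt = this.now();
    this.state.set(provider, s);
  }

  recordSuccess(provider) {
    this.state.delete(provider);
  }

  // Route around open circuits: first provider whose breaker is closed.
  pick(providers) {
    return providers.find((p) => !this.isOpen(p)) ?? null;
  }
}

let t = 0;
const breaker = new ProviderBreaker({ threshold: 2, cooldownMs: 1000, now: () => t });
breaker.recordFailure("primary", 429);
breaker.recordFailure("primary", 429);
console.log(breaker.pick(["primary", "fallback"])); // → fallback
t = 1500;
console.log(breaker.pick(["primary", "fallback"])); // → primary
```

After the cooldown the breaker goes half-open: the next request is allowed through, and a single further rate-limit error reopens it.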

The footer counter matters because context management should not be mystical. If a plugin says it saved 55%, I want the footer to show the before and after. Tokens before. Tokens after. Decision made.
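A footer counter like the one issue #64086 asks for needs only two numbers per turn. In this sketch the token counts use a rough four-characters-per-token estimate, which is an assumption; a real counter should use the provider's tokenizer or its reported usage. `footerLine` and its output format are hypothetical, and the messages are synthetic.

```javascript
// Hypothetical footer counter. The 4-chars-per-token estimate is a rough
// assumption, not a real tokenizer.
const estimateTokens = (messages) =>
  messages.reduce((sum, m) => sum + Math.ceil(m.content.length / 4), 0);

function footerLine(before, after) {
  const b = estimateTokens(before);
  const a = estimateTokens(after);
  const savedPct = b === 0 ? 0 : Math.round(((b - a) / b) * 100);
  return `ctx ${b} -> ${a} tokens (${savedPct}% saved)`;
}

// Synthetic messages: a big tool dump plus a short note, versus the note plus a summary.
const raw = [{ content: "x".repeat(4000) }, { content: "y".repeat(400) }];
const scoped = [{ content: "y".repeat(400) }, { content: "z".repeat(1000) }];
console.log(footerLine(raw, scoped)); // → ctx 1100 -> 350 tokens (68% saved)
```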

Demo

The demo target is a normal OpenClaw work session: same model, same workspace, same prompt, first with raw transcript context and then with ContextClaw enabled.

The shape of what I see in practice:

baseline context:  full rolling transcript + tool spam
with ContextClaw:  typed, bucketed, task-scoped context
observed ratio:    roughly 55% fewer tokens per turn on multi-turn work

I am not going to post a faked screenshot to hit the "Demo" header. The honest version is: the savings compound on long sessions with lots of tool output, and they mostly disappear on 2–3 turn toy tasks. The measurement that matters is stable output quality at lower token cost, not a single pretty number. A live tokens-saved counter in the TUI footer, which is what issue #64086 asks for, is the artifact I want before I publish benchmark-style numbers.

Repo: work in progress. I'll link it from an update once it's in a state I'd want someone else to read.

What I Learned

Classification beats recency. Most context systems treat the newest thing as the most important thing. That is wrong for agent work. The newest thing is often a giant tool result that only mattered for one local decision.
Task boundaries are the real eviction signal. NEXT_TICKET.md changing is stronger than a semantic guess. It says: the job changed. Old bucket out, new bucket in. Cheap. Explicit. Easy to audit.
ContextClaw loses on tiny tasks. If the whole job is two turns, classification overhead can be more machinery than you need. The payoff starts when the task has enough turns, file reads, tool output, and course corrections for context rot to appear. Roughly: real work, not a toy prompt.
Files beat embeddings for basic agent state. I like knowledge graphs. I like retrieval. But most of the win here came from stickers plus eviction, not from trying to make memory magical. The filesystem already knows more about the workflow than the prompt does.
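A guard for the tiny-task case could simply skip the pipeline below some size. Both thresholds here are hypothetical placeholders; the post only says the payoff starts once a task accumulates enough turns and tool output for context rot to appear.

```javascript
// Hypothetical bypass for tiny tasks. Both thresholds are placeholders,
// not measured values.
const MIN_TURNS = 4;
const MIN_CONTEXT_TOKENS = 6000;

function shouldManageContext({ turns, contextTokens }) {
  // Pass short, small sessions straight through: classification would add
  // machinery without much to evict.
  return turns >= MIN_TURNS || contextTokens >= MIN_CONTEXT_TOKENS;
}

console.log(shouldManageContext({ turns: 2, contextTokens: 900 })); // → false
console.log(shouldManageContext({ turns: 12, contextTokens: 40000 })); // → true
```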

The broader lesson is uncomfortable: a lot of "agent memory" work is compensating for workflows that never made state explicit in the first place.

OpenClaw made the fix obvious because the workspace is already there. Root files. Tools. Sessions. Plugins. Providers. It is close enough to an operating system for agents that context can become infrastructure, not a paragraph in the system prompt.

If your context window feels crowded, your agent does not need a bigger model. It needs an inventory system.