chowyu

Posted on Jun 13

How AIClaw Keeps Agent Plans Out of Chat History with Runtime Plan State

#ai #opensource #agents #webdev

Most agent products eventually hit the same UX problem: complex tasks need planning, but users do not want the final answer buried under noisy TODO updates.

AIClaw handles that with an existing core feature called Runtime Plan State. Instead of storing planning as ordinary assistant text, AIClaw treats the plan as runtime state owned by the executor. The model can propose or revise a plan, but the harness validates it, persists it, streams it live, and links the final snapshot to the assistant message after execution finishes.

This post is not announcing a brand-new feature. It is a deeper look at how AIClaw already implements planning in a way that stays useful during execution without polluting the conversation itself.

The problem with chat-visible plans

If an agent writes plans directly into chat, several issues show up quickly:

- progress updates become part of the permanent conversational transcript
- repeated plan rewrites waste tokens and distract from the real answer
- tool errors and retries are hard to map back to the current step
- users can see planning noise, but still cannot reliably inspect execution state

AIClaw's approach is to separate these concerns:

- the user sees the final answer as an answer
- the executor keeps the plan as structured runtime state
- the log timeline shows both plan progress and tool activity

The repository README describes this directly: AIClaw uses Plan State instead of chat-visible TODO blocks, and streaming chat plus execution logs show the plan separately from the assistant answer.

What Runtime Plan State means in AIClaw

At a high level, AIClaw's execution loop does this:

1. Load the agent, tools, memory, files, and conversation history.
2. Inject compact runtime context, including the current plan state.
3. Call the model with tools.
4. Execute tool calls and track progress.
5. Let the harness advance the plan after success or failure.
6. Save the final assistant response and the plan snapshot.
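
The loop above can be sketched in Go roughly as follows. This is a minimal illustration under stated assumptions, not AIClaw's actual code: `Step`, `Reply`, `Model`, `Tool`, `Result`, and `Run` are hypothetical names, the model is a scripted stand-in, and the sketch assumes one tool call per plan step to stay short.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical types for illustration; none of these names come from AIClaw.
type Step struct{ Title, Status string }

type Reply struct {
	ToolCall string // empty means the model returned a final answer
	Text     string
}

type Model func(planContext string) Reply
type Tool func() error

type Result struct {
	Answer string
	Plan   []Step // snapshot stored next to the answer, never inside it
}

func running(plan []Step) int {
	for i, s := range plan {
		if s.Status == "running" {
			return i
		}
	}
	return -1
}

// closeRunning sets the status of the running step, if there is one.
func closeRunning(plan []Step, status string) {
	if i := running(plan); i >= 0 {
		plan[i].Status = status
	}
}

// startNext promotes the first pending step when nothing is running.
func startNext(plan []Step) {
	if running(plan) >= 0 {
		return
	}
	for i := range plan {
		if plan[i].Status == "pending" {
			plan[i].Status = "running"
			return
		}
	}
}

func Run(model Model, tools map[string]Tool, plan []Step, maxRounds int) Result {
	for round := 0; round < maxRounds; round++ {
		startNext(plan)
		ctx := "no step running"
		if i := running(plan); i >= 0 {
			ctx = "current step: " + plan[i].Title // compact context, not the full history
		}
		reply := model(ctx)
		if reply.ToolCall == "" { // final answer: close the plan and snapshot it
			closeRunning(plan, "completed")
			return Result{Answer: reply.Text, Plan: append([]Step(nil), plan...)}
		}
		err := errors.New("unknown tool: " + reply.ToolCall)
		if tool, ok := tools[reply.ToolCall]; ok {
			err = tool()
		}
		if err != nil { // the harness, not the model, records the outcome
			closeRunning(plan, "failed")
		} else {
			closeRunning(plan, "completed")
		}
	}
	return Result{Answer: "stopped: round limit reached", Plan: append([]Step(nil), plan...)}
}

func main() {
	replies := []Reply{{ToolCall: "read_file"}, {Text: "The bug is in the parser."}}
	round := 0
	model := func(string) Reply {
		r := replies[round]
		round++
		return r
	}
	tools := map[string]Tool{"read_file": func() error { return nil }}
	plan := []Step{{"inspect the codebase", "pending"}, {"summarize the result", "pending"}}

	res := Run(model, tools, plan, 4)
	fmt.Println(res.Answer)
	for _, s := range res.Plan {
		fmt.Println(s.Title, "=>", s.Status)
	}
}
```

The point of the sketch is the shape of `Result`: the answer text and the plan snapshot travel together but stay in separate fields, so nothing about the plan has to be written into the message itself.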

The plan has a small lifecycle instead of being treated like free-form prose:

pending -> running -> completed
                  -> failed
                  -> blocked
pending -> skipped

That lifecycle matters because the harness can enforce behavior the model should not be trusted to enforce by itself.
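
One way to make that enforcement concrete is a transition table that mirrors the diagram above. The sketch below is an assumption about how such a check could look, not a copy of AIClaw's code: `Status`, `CanTransition`, and `Transition` are hypothetical names, and it treats completed, failed, blocked, and skipped as terminal, which the diagram suggests but the post does not state outright.

```go
package main

import "fmt"

type Status string

const (
	Pending   Status = "pending"
	Running   Status = "running"
	Completed Status = "completed"
	Failed    Status = "failed"
	Blocked   Status = "blocked"
	Skipped   Status = "skipped"
)

// allowed mirrors the lifecycle diagram: each key lists the statuses it may move to.
// Statuses without an entry are treated as terminal (an assumption of this sketch).
var allowed = map[Status][]Status{
	Pending: {Running, Skipped},
	Running: {Completed, Failed, Blocked},
}

// CanTransition reports whether the diagram permits moving from one status to another.
func CanTransition(from, to Status) bool {
	for _, next := range allowed[from] {
		if next == to {
			return true
		}
	}
	return false
}

// Transition applies the change only if it is allowed, leaving cur untouched otherwise.
func Transition(cur *Status, to Status) error {
	if !CanTransition(*cur, to) {
		return fmt.Errorf("invalid plan transition %s -> %s", *cur, to)
	}
	*cur = to
	return nil
}

func main() {
	s := Pending
	fmt.Println(Transition(&s, Running))   // <nil>
	fmt.Println(Transition(&s, Completed)) // <nil>
	fmt.Println(Transition(&s, Running))   // invalid plan transition completed -> running
}
```

With a table like this, a model that tries to jump a step straight from pending to completed gets an error back instead of silently corrupting the plan.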

The design choice: model proposes, harness owns

In internal/agent/plan.go, the internal plan control tool supports actions such as set, update, revise, and read. The tool surface matters less than who owns the state:

- the model proposes plan changes
- PlanManager normalizes and validates state
- the store persists the active run and items
- the executor refreshes the compact plan block before each LLM round

That split keeps the model flexible without giving it full control over task state.

For example, AIClaw enforces that only one plan item can be running at a time. The tests in internal/agent/plan_test.go explicitly verify that if multiple items are proposed as running, the plan is normalized back to a single running item.
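
A normalization pass of that kind might look like the sketch below. The post only says the result has a single running item; which item wins is not specified, so keeping the first and demoting the rest to pending is an assumption of this example. `Item` and `NormalizeRunning` are hypothetical names, not identifiers from internal/agent/plan.go.

```go
package main

import "fmt"

// Item is a hypothetical plan item used only for this illustration.
type Item struct{ Title, Status string }

// NormalizeRunning returns a copy in which at most one item is "running".
// Assumption: the first running item is kept and later ones fall back to "pending".
func NormalizeRunning(items []Item) []Item {
	out := make([]Item, len(items))
	copy(out, items)
	seen := false
	for i := range out {
		if out[i].Status != "running" {
			continue
		}
		if seen {
			out[i].Status = "pending"
		}
		seen = true
	}
	return out
}

func main() {
	proposed := []Item{
		{"inspect the codebase", "running"},
		{"patch the code", "running"},
		{"run tests", "pending"},
	}
	for _, it := range NormalizeRunning(proposed) {
		fmt.Println(it.Title, "=>", it.Status)
	}
}
```

Returning a copy keeps the model's raw proposal available for logging while the harness persists only the normalized version.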

Why the prompt stays compact

One subtle but important implementation detail is that AIClaw does not inject the full plan history into every model call.

The PromptBlock path in internal/agent/plan.go builds a compact <plan_state> block with:

- the goal
- the current running step
- a short pending-step summary
- the latest revision reason
- numeric progress

The design notes in docs/design/agent-improvements.md call this out clearly: only the goal, current running item, remaining pending summary, and recent revision reason are injected each round so the full history does not consume context budget.

This is a practical design choice. Planning helps the model stay oriented, but dumping the whole plan transcript back into the prompt every round would work against that goal.
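
To make the idea tangible, here is a rough sketch of building such a block. Only the `<plan_state>` tag and the five kinds of content come from the post; the field labels, the `Plan` and `Item` types, the `maxPending` parameter, and the completed-over-total progress format are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Item and Plan are hypothetical types for this sketch.
type Item struct{ Title, Status string }

type Plan struct {
	Goal           string
	Items          []Item
	RevisionReason string
}

// PromptBlock renders a compact summary: completed steps are counted, never listed,
// and at most maxPending pending titles are shown.
func PromptBlock(p Plan, maxPending int) string {
	if maxPending < 0 {
		maxPending = 0
	}
	var running string
	var pending []string
	done := 0
	for _, it := range p.Items {
		switch it.Status {
		case "running":
			running = it.Title
		case "pending":
			pending = append(pending, it.Title)
		case "completed":
			done++
		}
	}
	var b strings.Builder
	b.WriteString("<plan_state>\n")
	fmt.Fprintf(&b, "goal: %s\n", p.Goal)
	if running != "" {
		fmt.Fprintf(&b, "current: %s\n", running)
	}
	if len(pending) > 0 {
		shown := pending
		if len(shown) > maxPending {
			shown = shown[:maxPending]
		}
		fmt.Fprintf(&b, "pending (%d): %s\n", len(pending), strings.Join(shown, "; "))
	}
	if p.RevisionReason != "" {
		fmt.Fprintf(&b, "last revision: %s\n", p.RevisionReason)
	}
	fmt.Fprintf(&b, "progress: %d/%d\n", done, len(p.Items))
	b.WriteString("</plan_state>")
	return b.String()
}

func main() {
	p := Plan{
		Goal: "fix the failing behavior",
		Items: []Item{
			{"inspect the codebase", "completed"},
			{"find the cause", "completed"},
			{"patch the code", "running"},
			{"run tests", "pending"},
			{"summarize the result", "pending"},
		},
		RevisionReason: "narrowed the scope after inspection",
	}
	fmt.Println(PromptBlock(p, 2))
}
```

The size of the block depends on the cap on pending titles rather than on how many steps have already finished, which is what keeps the per-round prompt cost roughly flat as a run gets longer.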

How execution advances the plan

The main run loop in internal/agent/run.go refreshes plan state before each model call. When a tool call or an LLM call fails, the harness can mark the current step as failed. When a step succeeds, the harness can complete it and advance to the next pending one.

That behavior is also covered by tests:

- a failed running step advances the next pending step into running
- linking the final assistant message marks the last running step as completed
- failed plans stay failed when the final message is attached

This is the difference between "the model wrote a checklist" and "the system is actually operating a task state machine."
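
The three tested behaviors can be expressed as two small methods on a run object. This is a sketch under assumptions: `PlanRun`, `FailCurrent`, and `LinkFinalMessage` are hypothetical names, and because the post does not describe how AIClaw decides that a whole plan has failed, the run-level `Status` here is simply a field the caller sets.

```go
package main

import "fmt"

// Item and PlanRun are hypothetical types for this sketch.
type Item struct{ Title, Status string }

type PlanRun struct {
	Status    string // "active", "completed", or "failed"
	Items     []Item
	MessageID string // set once the final assistant message is linked
}

func (r *PlanRun) runningIndex() int {
	for i, it := range r.Items {
		if it.Status == "running" {
			return i
		}
	}
	return -1
}

// FailCurrent marks the running step failed, then promotes the next pending step.
func (r *PlanRun) FailCurrent() {
	if i := r.runningIndex(); i >= 0 {
		r.Items[i].Status = "failed"
	}
	for i := range r.Items {
		if r.Items[i].Status == "pending" {
			r.Items[i].Status = "running"
			return
		}
	}
}

// LinkFinalMessage attaches the assistant message to the run.
// A failed run keeps its status; otherwise the last running step is completed.
func (r *PlanRun) LinkFinalMessage(id string) {
	r.MessageID = id
	if r.Status == "failed" {
		return
	}
	if i := r.runningIndex(); i >= 0 {
		r.Items[i].Status = "completed"
	}
	r.Status = "completed"
}

func main() {
	run := &PlanRun{Status: "active", Items: []Item{
		{"patch the code", "completed"},
		{"run tests", "running"},
		{"summarize the result", "pending"},
	}}
	run.FailCurrent()             // "run tests" fails, "summarize the result" starts
	run.LinkFinalMessage("msg-1") // the last running step is completed
	fmt.Println("run:", run.Status)
	for _, it := range run.Items {
		fmt.Println(it.Title, "=>", it.Status)
	}
}
```

Because both transitions live in harness code, the model never has to remember to close out a step, and a failed run cannot be flipped to completed just because a final message arrived.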

What the user sees

From the product side, Runtime Plan State gives AIClaw a cleaner split between response and observability:

- the final answer is not cluttered by plan chatter
- streaming progress can still expose the live plan
- execution logs keep the plan snapshot, assistant response, and tool timeline separate

That matters for real tool-using agents. If an agent reads files, runs commands, searches the web, or delegates to sub-agents, users need to inspect progress and failures without turning the final answer into a debug trace.
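
On the consuming side, that split could be modeled as a single stream of typed events that the client routes to different views. The event kinds and the `Event`, `AnswerText`, and `Timeline` names below are hypothetical; AIClaw's actual stream format may look quite different.

```go
package main

import (
	"fmt"
	"strings"
)

// Event is a hypothetical stream event; Kind is "answer", "plan", or "tool".
type Event struct {
	Kind string
	Data string
}

// AnswerText joins only the answer chunks, so plan and tool events
// never end up in the message body.
func AnswerText(events []Event) string {
	var b strings.Builder
	for _, e := range events {
		if e.Kind == "answer" {
			b.WriteString(e.Data)
		}
	}
	return b.String()
}

// Timeline returns plan and tool events in arrival order for the log view.
func Timeline(events []Event) []Event {
	var out []Event
	for _, e := range events {
		if e.Kind == "plan" || e.Kind == "tool" {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	stream := []Event{
		{"plan", "running: inspect the codebase"},
		{"tool", "read_file ok"},
		{"plan", "running: summarize the result"},
		{"answer", "Fixed the "},
		{"answer", "parser."},
	}
	fmt.Println("answer:", AnswerText(stream))
	for _, e := range Timeline(stream) {
		fmt.Println("log:", e.Kind, "-", e.Data)
	}
}
```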

A practical example

Imagine an AIClaw agent is asked to:

- inspect a codebase
- find the cause of a failing behavior
- patch the code
- run tests
- summarize the result

With Runtime Plan State, the plan exists as structured execution state while the tool timeline records the underlying work. If the test step fails, AIClaw can mark that step as failed and move the next pending step into running instead of leaving the plan stuck. If the work completes, the final answer can stay focused on the outcome, not internal bookkeeping.

That is a better fit for production-style agent work than chat-visible TODO spam.

Why I think this is the right abstraction

AIClaw's design makes a strong distinction:

- planning is operational state
- answers are user-facing output
- logs are for inspection

Those should not all be the same thing.

A lot of agent systems blur the line between them. AIClaw's Runtime Plan State is interesting precisely because it does not.

If you are building self-hosted agents and want both cleaner chat UX and better execution observability, this is one of the AIClaw features worth studying in the codebase.

AIClaw is open source here: github.com/chowyu12/aiclaw

Top comments (2)

Mehmet Can Farsak • Jun 13

Really interesting take on the 'model proposes, harness owns' pattern. I ran into a similar separation problem — agents don't have a 'thinking mode' vs 'action mode', so they blur ideation and execution. Put together Brainstorm-Mode (mehmetcanfarsak on GitHub) that adds three modes (divergent, actionable, academic) via PreToolUse hooks so the harness owns the mode while the model proposes ideas. Same philosophy: infrastructure enforces structure the model shouldn't be trusted to self-regulate.

Mehmet Can Farsak • Jun 13

Love the "model proposes, harness owns" design choice — that is exactly the pattern I landed on too. I built Brainstorm-Mode (mehmetcanfarsak/Brainstorm-Mode on GitHub) using the same philosophy: the model proposes tool calls, but PreToolUse hooks enforce whether they should execute based on the current mode. Same idea of not trusting the model to self-regulate. Three modes (divergent, actionable, academic) act as that plan lifecycle for ideation vs execution. The harness enforces the boundary, the model stays flexible.