Most agent products eventually hit the same UX problem: complex tasks need planning, but users do not want the final answer buried under noisy TODO updates.
AIClaw handles that with an existing core feature called Runtime Plan State. Instead of storing planning as ordinary assistant text, AIClaw treats the plan as runtime state owned by the executor. The model can propose or revise a plan, but the harness validates it, persists it, streams it live, and links the final snapshot to the assistant message after execution finishes.
This post is not announcing a brand-new feature. It is a deeper look at how AIClaw already implements planning in a way that stays useful during execution without polluting the conversation itself.
The problem with chat-visible plans
If an agent writes plans directly into chat, several issues show up quickly:
- progress updates become part of the permanent conversational transcript
- repeated plan rewrites waste tokens and distract from the real answer
- tool errors and retries are hard to map back to the current step
- users can see planning noise, but still cannot reliably inspect execution state
AIClaw's approach is to separate these concerns:
- the user sees the final answer as an answer
- the executor keeps the plan as structured runtime state
- the log timeline shows both plan progress and tool activity
The repository README describes this directly: AIClaw uses Plan State instead of chat-visible TODO blocks, and streaming chat plus execution logs show the plan separately from the assistant answer.
What Runtime Plan State means in AIClaw
At a high level, AIClaw's execution loop does this:
- Load the agent, tools, memory, files, and conversation history.
- Inject compact runtime context, including the current plan state.
- Call the model with tools.
- Execute tool calls and track progress.
- Let the harness advance the plan after success or failure.
- Save the final assistant response and the plan snapshot.
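The loop above can be sketched in a few dozen lines of Go. This is a minimal illustration, not AIClaw's actual code: Decision, Model, Tool, Result, and Run are hypothetical names, the model is a scripted stub, and the plan is reduced to an ordered list of step titles with one step consumed per tool call.

```go
package main

import (
	"errors"
	"fmt"
)

// Decision is what the stubbed model returns each round: a tool to call,
// or a final answer when Tool is empty. (Hypothetical type.)
type Decision struct {
	Tool   string
	Answer string
}

// Model and Tool are hypothetical stand-ins for the LLM client and tool registry.
type (
	Model func(planBlock string) Decision
	Tool  func() error
)

// Result keeps the user-facing answer apart from plan state and the tool timeline.
type Result struct {
	Answer   string
	Statuses []string // final status per plan step
	Timeline []string // one entry per tool call
}

// Run injects compact plan state, calls the model, executes the requested tool,
// and lets the harness (not the model) advance the plan.
func Run(model Model, tools map[string]Tool, steps []string, maxRounds int) Result {
	res := Result{Statuses: make([]string, len(steps))}
	for i := range res.Statuses {
		res.Statuses[i] = "pending"
	}
	cur := 0
	for round := 0; round < maxRounds; round++ {
		block := "plan: no steps left"
		if cur < len(steps) {
			res.Statuses[cur] = "running"
			block = fmt.Sprintf("plan: step %d/%d running: %s", cur+1, len(steps), steps[cur])
		}
		d := model(block)
		if d.Tool == "" {
			// Final answer: close the running step and stop.
			res.Answer = d.Answer
			if cur < len(steps) {
				res.Statuses[cur] = "completed"
			}
			break
		}
		err := errors.New("unknown tool")
		if tool, ok := tools[d.Tool]; ok {
			err = tool()
		}
		res.Timeline = append(res.Timeline, fmt.Sprintf("%s err=%v", d.Tool, err))
		if cur < len(steps) {
			if err != nil {
				res.Statuses[cur] = "failed"
			} else {
				res.Statuses[cur] = "completed"
			}
			cur++
		}
	}
	return res
}

func main() {
	script := []Decision{{Tool: "read_file"}, {Tool: "run_tests"}, {Answer: "Patched; one test still fails."}}
	i := 0
	model := func(string) Decision { d := script[i]; i++; return d }
	tools := map[string]Tool{
		"read_file": func() error { return nil },
		"run_tests": func() error { return errors.New("exit status 1") },
	}
	res := Run(model, tools, []string{"inspect", "test", "summarize"}, 10)
	fmt.Println(res.Answer)
	fmt.Println(res.Statuses)
}
```

The point of the sketch is the shape of Result: the answer, the plan statuses, and the tool timeline come back as three separate things.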
The plan has a small lifecycle instead of being treated like free-form prose:
pending -> running -> completed
running -> failed
running -> blocked
pending -> skipped
That lifecycle matters because the harness can enforce behavior the model should not be trusted to enforce by itself.
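One way to make that enforcement concrete is a transition table the harness checks before accepting a status change. The sketch below encodes only the arrows in the lifecycle above; the Status type and CanTransition function are hypothetical names, and the real implementation may allow transitions this table does not.

```go
package main

import "fmt"

// Status is a hypothetical name for a plan item's lifecycle state.
type Status string

const (
	Pending   Status = "pending"
	Running   Status = "running"
	Completed Status = "completed"
	Failed    Status = "failed"
	Blocked   Status = "blocked"
	Skipped   Status = "skipped"
)

// allowed lists only the transitions drawn in the lifecycle.
// Any pair not listed here is rejected.
var allowed = map[Status][]Status{
	Pending: {Running, Skipped},
	Running: {Completed, Failed, Blocked},
}

// CanTransition reports whether moving from one status to another is legal.
func CanTransition(from, to Status) bool {
	for _, next := range allowed[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(CanTransition(Pending, Running))   // true
	fmt.Println(CanTransition(Running, Failed))    // true
	fmt.Println(CanTransition(Completed, Running)) // false
	fmt.Println(CanTransition(Pending, Completed)) // false
}
```

With a table like this, a model that tries to jump a step straight from pending to completed is simply refused rather than trusted.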
The design choice: model proposes, harness owns
In internal/agent/plan.go, the internal plan control tool supports actions such as set, update, revise, and read. The tool surface matters less than who owns the state:
- the model proposes plan changes
- PlanManager normalizes and validates state
- the store persists the active run and items
- the executor refreshes the compact plan block before each LLM round
That split keeps the model flexible without giving it full control over task state.
For example, AIClaw enforces that only one plan item can be running at a time. The tests in internal/agent/plan_test.go explicitly verify that if multiple items are proposed as running, the plan is normalized back to a single running item.
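A normalization pass like that could look roughly like the following. This is a sketch under assumptions: Item and NormalizeRunning are hypothetical names, and keeping the first running item while demoting the rest to pending is my guess at the rule, not something the tests are quoted as specifying.

```go
package main

import "fmt"

// Item is a hypothetical plan item with a title and a status string.
type Item struct {
	Title  string
	Status string
}

// NormalizeRunning returns a copy of items in which at most one item is
// "running". Assumption: the first running item wins and later ones are
// demoted to "pending".
func NormalizeRunning(items []Item) []Item {
	out := make([]Item, len(items))
	copy(out, items)
	seen := false
	for i := range out {
		if out[i].Status != "running" {
			continue
		}
		if seen {
			out[i].Status = "pending"
		}
		seen = true
	}
	return out
}

func main() {
	proposed := []Item{
		{Title: "inspect codebase", Status: "running"},
		{Title: "patch code", Status: "running"},
		{Title: "run tests", Status: "pending"},
	}
	for _, it := range NormalizeRunning(proposed) {
		fmt.Printf("%-16s %s\n", it.Title, it.Status)
	}
}
```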
Why the prompt stays compact
One subtle but important implementation detail is that AIClaw does not inject the full plan history into every model call.
The PromptBlock path in internal/agent/plan.go builds a compact <plan_state> block with:
- the goal
- the current running step
- a short pending-step summary
- the latest revision reason
- numeric progress
The design notes in docs/design/agent-improvements.md call this out clearly: only the goal, current running item, remaining pending summary, and recent revision reason are injected each round so the full history does not consume context budget.
This is a practical design choice. Planning helps the model stay oriented, but dumping the whole plan transcript back into the prompt every round would work against that goal.
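To show how small such a block can be, here is a sketch of a renderer for it. Only the <plan_state> tag and the five ingredients come from the source; the Plan and Item types, the field labels, the two-item cap on the pending summary, and reading "numeric progress" as completed/total are all assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Item and Plan are hypothetical types, not AIClaw's real ones.
type Item struct {
	Title  string
	Status string
}

type Plan struct {
	Goal           string
	Items          []Item
	RevisionReason string // latest revision reason, empty if none
}

// maxPending caps how many pending titles are spelled out (assumed value).
const maxPending = 2

// summarize shortens the pending list to at most maxPending titles.
func summarize(pending []string) string {
	if len(pending) <= maxPending {
		return strings.Join(pending, "; ")
	}
	return fmt.Sprintf("%s; (+%d more)",
		strings.Join(pending[:maxPending], "; "), len(pending)-maxPending)
}

// PromptBlock renders a compact block: goal, running step, pending summary,
// latest revision reason, and completed/total progress. Completed steps are
// counted but never listed, so history does not grow the prompt.
func PromptBlock(p Plan) string {
	var current string
	var pending []string
	done := 0
	for _, it := range p.Items {
		switch it.Status {
		case "running":
			current = it.Title
		case "pending":
			pending = append(pending, it.Title)
		case "completed":
			done++
		}
	}
	var b strings.Builder
	b.WriteString("<plan_state>\n")
	fmt.Fprintf(&b, "goal: %s\n", p.Goal)
	fmt.Fprintf(&b, "current: %s\n", current)
	fmt.Fprintf(&b, "pending: %s\n", summarize(pending))
	if p.RevisionReason != "" {
		fmt.Fprintf(&b, "revised: %s\n", p.RevisionReason)
	}
	fmt.Fprintf(&b, "progress: %d/%d\n", done, len(p.Items))
	b.WriteString("</plan_state>")
	return b.String()
}

func main() {
	fmt.Println(PromptBlock(Plan{
		Goal: "fix failing login test",
		Items: []Item{
			{Title: "inspect codebase", Status: "completed"},
			{Title: "find cause", Status: "running"},
			{Title: "patch code", Status: "pending"},
			{Title: "run tests", Status: "pending"},
			{Title: "summarize", Status: "pending"},
		},
		RevisionReason: "tests need a fixture",
	}))
}
```

However long the run gets, the block stays at six lines or fewer, because finished work collapses into the progress counter.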
How execution advances the plan
The main run loop in internal/agent/run.go refreshes plan state before each model call. When a tool call or a model call fails, the harness can mark the current step as failed. When a step succeeds, the harness can complete it and advance to the next pending one.
That behavior is also covered by tests:
- a failed running step advances the next pending step into running
- linking the final assistant message marks the last running step as completed
- failed plans stay failed when the final message is attached
This is the difference between "the model wrote a checklist" and "the system is actually operating a task state machine."
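The three tested behaviors can be expressed as a tiny state machine. The sketch below is an approximation written from the descriptions above, not a copy of AIClaw's code: Plan, FailStep, CompleteStep, and LinkFinalMessage are hypothetical names, and the plan-level Status field is an assumption about how "failed plans" are represented.

```go
package main

import "fmt"

// Item and Plan are hypothetical types for illustration.
type Item struct {
	Title  string
	Status string
}

type Plan struct {
	Status    string // plan-level status: "running", "completed", or "failed"
	MessageID string // final assistant message linked to this plan
	Items     []Item
}

// finishRunning closes the running item with the given status and promotes
// the next pending item after it, if any. It does nothing without a running item.
func (p *Plan) finishRunning(status string) {
	for i := range p.Items {
		if p.Items[i].Status != "running" {
			continue
		}
		p.Items[i].Status = status
		for j := i + 1; j < len(p.Items); j++ {
			if p.Items[j].Status == "pending" {
				p.Items[j].Status = "running"
				break
			}
		}
		return
	}
}

// CompleteStep and FailStep are called by the harness, never by the model.
func (p *Plan) CompleteStep() { p.finishRunning("completed") }
func (p *Plan) FailStep()     { p.finishRunning("failed") }

// LinkFinalMessage attaches the assistant message to the plan snapshot.
// A failed plan stays failed; otherwise the last running step is completed.
func (p *Plan) LinkFinalMessage(id string) {
	p.MessageID = id
	if p.Status == "failed" {
		return
	}
	for i := range p.Items {
		if p.Items[i].Status == "running" {
			p.Items[i].Status = "completed"
		}
	}
	p.Status = "completed"
}

func main() {
	p := &Plan{Status: "running", Items: []Item{
		{Title: "run tests", Status: "running"},
		{Title: "summarize", Status: "pending"},
	}}
	p.FailStep()
	fmt.Println(p.Items) // [{run tests failed} {summarize running}]
	p.LinkFinalMessage("msg-1")
	fmt.Println(p.Status, p.Items[1].Status) // completed completed
}
```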
What the user sees
From the product side, Runtime Plan State gives AIClaw a cleaner split between response and observability:
- the final answer is not cluttered by plan chatter
- streaming progress can still expose the live plan
- execution logs keep the plan snapshot, assistant response, and tool timeline separate
That matters for real tool-using agents. If an agent reads files, runs commands, searches the web, or delegates to sub-agents, users need to inspect progress and failures without turning the final answer into a debug trace.
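One simple way to get that split on the client side is to tag each streamed event with a kind and route it accordingly. The following is a sketch of the idea only: the Event and View types, the kind names, and the Route function are hypothetical and say nothing about AIClaw's actual streaming protocol.

```go
package main

import (
	"fmt"
	"strings"
)

// Event is a hypothetical stream event; Kind tells the client where it belongs.
type Event struct {
	Kind string // "answer", "plan", or "tool" (assumed kinds)
	Text string
}

// View is what a client would render: the chat answer plus a separate timeline.
type View struct {
	Answer   string
	Timeline []string
}

// Route folds a stream of events into a View. Only "answer" events reach the
// chat transcript; plan and tool events go to the timeline.
func Route(events []Event) View {
	var answer strings.Builder
	var v View
	for _, e := range events {
		if e.Kind == "answer" {
			answer.WriteString(e.Text)
			continue
		}
		v.Timeline = append(v.Timeline, e.Kind+": "+e.Text)
	}
	v.Answer = answer.String()
	return v
}

func main() {
	v := Route([]Event{
		{Kind: "plan", Text: "step 1 running"},
		{Kind: "tool", Text: "read_file main.go"},
		{Kind: "answer", Text: "The bug is "},
		{Kind: "plan", Text: "step 1 completed"},
		{Kind: "answer", Text: "in the retry loop."},
	})
	fmt.Println(v.Answer)
	for _, line := range v.Timeline {
		fmt.Println(line)
	}
}
```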
A practical example
Imagine an AIClaw agent is asked to:
- inspect a codebase
- find the cause of a failing behavior
- patch the code
- run tests
- summarize the result
With Runtime Plan State, the plan can exist as structured execution state while the tool timeline records the underlying work. If the test step fails, the harness can mark that step as failed and move the next pending step (here, the summary) into running. If the work completes, the final answer can stay focused on the outcome, not internal bookkeeping.
That is a better fit for production-style agent work than chat-visible TODO spam.
Why I think this is the right abstraction
AIClaw's design makes a strong distinction:
- planning is operational state
- answers are user-facing output
- logs are for inspection
Those should not all be the same thing.
A lot of agent systems blur the line between them. AIClaw's Runtime Plan State is interesting precisely because it does not.
If you are building self-hosted agents and want both cleaner chat UX and better execution observability, this is one of the AIClaw features worth studying in the codebase.
AIClaw is open source here: github.com/chowyu12/aiclaw