DEV Community: chowyu

How AIClaw Adds Durable Memory Without Turning Prompt Context Into a Global Text File

chowyu — Tue, 21 Jul 2026 10:50:21 +0000

AI agents need memory, but most "memory" implementations are either too weak to be useful or too loose to be safe. A flat MEMORY.md file is simple, but it quickly becomes hard to audit, hard to scope, and easy to over-inject into prompts.

AIClaw's latest memory work takes a different approach: memory is now stored as application data with ownership, scope, review status, search, evidence, and UI controls. The goal is not to make the model blindly obey old notes. The goal is to give an agent durable context that stays inspectable and bounded.

The problem with file-style memory

The old mental model for agent memory is straightforward: save some useful facts, load them into the next run, and hope the assistant gets better over time.

That breaks down in real use:

Not every remembered fact should apply to every agent.
User-specific context should not leak across users.
Candidate notes should not be injected as if they were trusted truth.
Sensitive content needs review before it becomes active context.
Operators need to know what memory was used in a given answer.

AIClaw now treats memory as structured state instead of a shared free-form prompt file.

What changed in AIClaw

The new Durable Memory section in the README describes a database-backed Memory System where each memory record carries:

an owner
a scope
a kind
a stable key
importance and confidence
status
revision history
evidence links

Two scopes define visibility:

user: available to every agent for the same user
agent_user: available only to one user and one specific agent

That is a practical design choice. Some facts are general preferences like response style. Others are agent-specific procedures that should stay bound to one agent.

Memory is reviewable, not magical

One of the most useful parts of this design is the lifecycle:

candidate -> active -> superseded / dismissed / deleted

This means agent-written memory does not have to become active prompt context immediately.

In the new service layer, propose writes candidate memory, while upsert can create active memory. Sensitive writes are forced back to candidate status even if something tries to mark them active. That gives operators a real review boundary instead of a best-effort convention.

The new web Memory page exposes that workflow directly. You can:

browse active memories
inspect candidates separately
search and filter by scope and kind
approve or dismiss candidates
edit, pin, expire, or delete records
inspect revision history and evidence

That is much closer to how teams actually need agent memory to work.

Retrieval is scoped before it reaches the prompt

The most important runtime behavior is not storage. It is retrieval.

According to docs/design/memory-system.md, AIClaw retrieves only:

active, unexpired records
for the current user
for the current agent when scope is agent_user

Candidate records, foreign-user records, and other-agent records are excluded from prompt injection.

The memory service then adds pinned records, de-duplicates the set, compacts it to a prompt budget, and renders a bounded <memory_context> block with a safety preamble: retained memory is context, not instruction, and it loses to the current user request and verified tool results.

That distinction matters. A lot of memory systems fail because old notes become pseudo-system prompts. AIClaw is explicitly avoiding that.

Evidence and auditability are first-class

Another strong design choice is the evidence model.

Each memory can accumulate:

source evidence showing where it came from
usage evidence linking it to a run and final assistant message

So when a memory influences an answer, there is a trail. The README and design doc both emphasize that the final memory snapshot can be shown beside chat and execution logs without mixing memory content into the assistant response body itself.

This is the kind of detail that makes durable memory operationally useful instead of just conceptually interesting.

The tests cover the failure modes you actually worry about

The new memory tests are worth reading because they target the right risks.

TestBuildContextIsolatedAndCandidateSafe verifies that:

only the current user's records are retrieved
agent-specific memory does not leak from another agent
candidate memory is excluded
the prompt includes the safety boundary text

TestSensitiveMemoryRequiresReview verifies that sensitive memory cannot stay active and is pushed back to candidate review status.

TestToolProposalCreatesCandidate verifies that tool-driven proposals create candidate memories rather than silently activating them.

Those are the right invariants for a production memory feature.

What this looks like in practice

Here is a realistic workflow:

A user repeatedly asks one agent for terse release notes and detailed test evidence.
The agent proposes that as an agent_user preference.
The operator reviews it in the Memory page and approves it.
Future runs for that same user-agent pair can retrieve it automatically.
Another agent for the same user does not inherit that preference unless the scope is intentionally broader.

This keeps memory useful without making it global by accident.

Why I picked this feature for today's article

This is a concrete new feature from the July 19, 2026 commit feat(memory): add scoped durable memory system. It also fills a different slot than the earlier AIClaw article about memory/session search: that earlier topic focused on finding prior conversations, while this one is about durable, scoped, reviewable context that can be injected into future runs.

For AI agent systems, memory is one of the easiest places to be vague. AIClaw's newer implementation is specific about scope, review, retrieval, and evidence. That makes it worth documenting on its own.

How AIClaw's Code Interpreter Turns Scripts Into Downloadable Agent Outputs

chowyu — Sun, 19 Jul 2026 10:03:46 +0000

AI agents often need a place to do small but real computation: clean a CSV, convert a JSON payload, generate a chart, inspect an API response, or produce a one-off report. The problem is that many agent products either stop at “here is some code you can run yourself” or run code in a way that is hard to inspect afterward.

AIClaw takes a more practical route. Its built-in code_interpreter tool runs Python, JavaScript, or shell snippets inside the agent workspace, returns structured execution results, and can surface generated files back into the chat flow.

This is not a brand-new July 2026 launch. It is an existing AIClaw feature that is worth a closer look because it connects a few product surfaces that matter in day-to-day use: sandboxed execution, structured tool output, downloadable artifacts, and reusable workflow skills.

The problem: agents need computation, not just text

A useful agent should be able to do more than describe a transformation. It should be able to execute it.

Common examples:

parse a CSV and calculate grouped statistics;
convert JSON into Markdown or HTML;
generate a small script to validate API output;
produce a PNG, CSV, or report file for the user to download.

Without an execution tool, the model either hallucinates results or pushes the work back to the operator. With a raw shell tool alone, the agent can run commands, but the product still needs a consistent way to track stdout, stderr, exit status, duration, and any generated files.

That is the gap AIClaw's code_interpreter fills.

What the tool actually does

In the current repository, AIClaw registers code_interpreter as a built-in tool alongside read, write, exec, browser, cron, memory, and sub_agent.

The tool contract is intentionally small:

language: python, javascript, or shell
code: the source to execute
timeout: optional, with a default of 60 seconds and a hard cap of 120 seconds

The handler then:

Validates the input.
Resolves the runtime binary with exec.LookPath:
- python3 for Python
- node for JavaScript
- sh for shell
Applies lightweight safety checks against obviously dangerous code patterns.
Writes the snippet into the agent sandbox as a real file such as exec_ab12cd34.py.
Executes that file with the sandbox directory as the working directory.
Returns structured JSON with:
- ok
- language
- file
- exit_code
- stdout
- stderr
- error
- duration_ms

That structure matters because the result is machine-friendly for the runtime and still readable in logs and chat progress.

Why writing a real file is the important design choice

One subtle but useful implementation detail is that AIClaw does not treat the snippet as an opaque in-memory eval. It writes the code to a real file inside the workspace sandbox and runs that file from there.

That gives the agent a cleaner execution model:

file-relative reads and writes behave normally;
generated artifacts land in the same workspace the rest of the agent can inspect;
operators can reason about what happened from a concrete script filename;
execution traces line up better with file outputs and follow-up tool calls.

For an agent platform, this is much more practical than a hidden REPL with no durable execution context.

How downloadable outputs are recovered

The interesting part is not only that code runs. It is that AIClaw can turn the result into something a user can actually open.

AIClaw's result parsing layer looks for file outputs in a few ways:

a direct JSON file result payload;
an absolute file path printed as the entire output;
an absolute file path embedded in stdout, such as Saved to /tmp/report.csv.

If the referenced file exists, AIClaw converts it into a normalized file result with MIME type detection by extension. That is what allows generated files to move from “the script said it wrote something” to a proper attachment flow in chat and the UI.

This complements the generated-file work already visible elsewhere in the product. The code interpreter is one of the fastest ways for an agent to produce those artifacts on demand.

Safety and limits

AIClaw is not pretending this tool is a full security boundary, but it does enforce a few useful guardrails in the current implementation.

The handler blocks some obviously dangerous patterns, for example:

destructive shell commands like rm -rf /;
Python subprocess or direct system-call patterns;
JavaScript process and filesystem patterns associated with unsafe execution.

It also:

requires the runtime binary to exist before execution starts;
limits output to 10,000 characters;
caps timeout at 120 seconds;
fails cleanly when the workspace sandbox is not initialized.

That is a pragmatic posture for a self-hosted platform: useful by default, explicit about constraints, and easy to reason about from code.

Where this fits in real AIClaw workflows

The feature becomes more valuable when combined with the rest of AIClaw.

For example:

read can inspect a source file;
code_interpreter can transform or analyze it;
write can save a cleaned result;
the result layer can expose generated files back to the user;
sub_agent can parallelize independent analysis tasks;
skills can standardize the workflow for repeatable use cases.

The repository already reflects this pattern. AIClaw's built-in data-pipeline skill explicitly tells agents to use code_interpreter for CSV, JSON, and Excel-style processing tasks, including validation and export.

That is the right level of abstraction. The skill captures the workflow, while the interpreter provides the execution primitive.

A concrete example

Suppose a user uploads a CSV and asks:

Group sales by month, calculate totals, and give me a downloadable result.

An AIClaw agent can:

inspect the file with read;
write a short Python script in code_interpreter;
save monthly_sales.csv inside the sandbox;
print the absolute output path;
return both a summary and the generated file attachment.

The same pattern works for:

HTML report generation;
JSON normalization;
log summarization;
quick chart creation;
one-off format conversion.

Why this feature is worth covering now

A lot of agent demos focus on planning, tools, or model routing. Those matter, but the product becomes much more useful when an agent can produce concrete outputs that survive beyond the answer text.

AIClaw's code_interpreter is a good example of that product thinking:

simple tool contract;
explicit runtime selection;
workspace-local execution;
structured results;
attachment-friendly file detection;
natural integration with skills and other tools.

If you are building a self-hosted agent platform, this is a feature worth studying. It is not only about “running code.” It is about making execution auditable, composable, and actually useful to the person operating the system.

How AIClaw Hardens Local Agent Runtimes on Your Machine

chowyu — Sat, 18 Jul 2026 08:37:03 +0000

AiClaw’s latest local-runtime work fixes a practical problem that shows up fast in self-hosted agent systems: the server process can be running fine, but the agent CLI you already installed is invisible or unreliable when the runtime tries to execute it.

In the July 18, 2026 fix: harden local runtime and retries commit, AIClaw tightened the local runtime path and command-resolution layer so local agent execution behaves more like your real login shell, while still keeping execution explicit and auditable.

The problem: service PATHs are usually worse than your terminal PATH

AIClaw is designed to run local agent CLIs directly on the machine that hosts the server. The README now makes that contract explicit:

the built-in Local runtime is created automatically at startup;
AIClaw recovers the current user’s login-shell PATH;
it adds standard package-manager locations;
it executes detected CLIs in-process without an extra daemon or shell hop.

That matters because service managers often start processes with a narrow PATH. On macOS, launchd is a classic example. A CLI may work in your terminal, but the server process may not see the same executable path when it tries to:

detect available runtimes;
launch a selected CLI later;
reconnect after restart;
run from a background service instead of an interactive shell.

The result is the worst kind of reliability bug: detection says one thing, execution does another.

What changed in the code

The new internal/runtimeclient/environment.go centralizes environment handling for local runtimes.

Instead of trusting the raw service environment, AIClaw now:

Reads the current environment as a base.
Recovers the login-shell PATH with shell -ilc.
Merges that with common install locations such as:
- /opt/homebrew/bin
- /usr/local/bin
- ~/.local/bin
- ~/go/bin
- ~/.volta/bin
- ~/Library/pnpm
- ~/.asdf/shims
- ~/.mise/shims
Reuses that same resolved environment for both CLI detection and actual execution.

That “same environment for detection and execution” point is the real fix. AIClaw no longer reports a CLI as available using one lookup path and then launches it with a different process environment later.

Why this is better than a shell wrapper

AIClaw still executes runtime commands directly as command + args. It does not pass them through a shell.

That preserves a few useful properties:

argument handling stays predictable;
quoting bugs are avoided;
the runtime surface is smaller;
the execution model is easier to inspect in logs and tests.

So the change is not “use a shell to run everything.” The change is “recover the useful parts of the login environment, then keep execution direct.”

A small but important macOS improvement

There is also a practical macOS-specific rule: when the requested command is codex, AIClaw prefers the signed Codex CLI bundled inside the ChatGPT app at:

/Applications/ChatGPT.app/Contents/Resources/codex

That matters on machines where an older third-party install may still be on PATH. If you explicitly configure an absolute CLI path, that still wins. But for the common default case, AIClaw now prefers the bundled, app-managed binary.

What the tests verify

The new runtime environment tests cover the important failure modes:

merged login-shell and fallback paths are preserved in the final PATH;
non-PATH environment variables stay intact;
command resolution uses the supplied environment consistently;
missing CLIs produce an actionable error that points users to their login-shell PATH;
on macOS, bundled Codex is preferred when available.

This is the right kind of test coverage for runtime plumbing: it checks behavior users actually feel, not just helper internals.

What this means for AIClaw users

If you run AIClaw locally and want to use Codex, Claude Code, Cursor, CodeBuddy, Hermes, or OpenClaw through the built-in runtime, the workflow is now simpler:

Install and authenticate the CLI normally under your own user account.
Start AIClaw.
Open the Runtimes page and verify the detected CLIs.
Create an agent with execution mode Local.
Run tasks without adding a separate runtime daemon on the same machine.

The recent UI copy was updated to match the behavior too. Instead of saying AIClaw only scans the process PATH, the form now explains that it restores the current user’s login-shell PATH and scans from there.

Why I think this feature matters

Local-first agent products usually fail at the boundary between “works in my terminal” and “works under a service manager.” AIClaw’s recent runtime hardening closes that gap in a concrete way:

better environment recovery;
consistent detection and execution;
safer direct command execution;
clearer operator-facing errors;
better defaults on macOS.

That is not a flashy UI feature, but it is exactly the kind of systems work that makes a self-hosted agent platform dependable in day-to-day use.

If you are building a local-first agent stack, this is the right lesson to steal: treat the runtime environment as part of the product surface, not just a deployment detail.

How AIClaw's Harness Runtime Stops Premature "Done" Answers

chowyu — Tue, 14 Jul 2026 08:36:00 +0000

AI agents are good at sounding finished before they have actually finished. They can say a file was generated when no file exists, claim a task is blocked without showing the failed tool call, or return a progress update as if it were a final answer.

AIClaw's recent harness runtime work addresses that gap by moving completion from "the model said it is done" to "the execution layer has enough contract and evidence to allow completion."

This article is based on the current AIClaw repository, especially the harness runtime design and the July 6 and July 9 agent changes that added the verifier layer, execution evidence ledger, explicit finish tool, plan bootstrap, and final-answer gates.

The Problem: LLM Confidence Is Not Execution Truth

In a tool-using agent, the model can produce a polished answer even when:

a required tool was never called
a tool failed and the final answer ignores the failure
a promised attachment was never created
the task plan is still incomplete
the assistant is only narrating progress instead of delivering a result

AIClaw already had runtime plan state, tool execution, generated files, and execution logs. The new harness runtime ties those pieces together with a validation layer that decides whether a run can proceed, needs correction, or is allowed to finish.

The Runtime Model

The core pipeline in pkg/harness is:

Contract -> Evidence -> Validate -> Correct

In practice that means:

TaskContract derives what the run is expected to do.
EvidenceLedger records what the run actually did.
validators check whether the current state satisfies the contract.
correction prompts push the model back into execution when the answer is incomplete.

This is not a separate executor. The main loop still lives in the agent runtime. The harness adds staged validation and correction around that loop.

What The Contract Captures

The contract is derived from the user objective and current execution context. According to the current design and code, it can incorporate:

the user goal
runtime plan requirements
file-delivery intent
tool evidence expectations
attachment context
sub-agent context
correction budget

That matters because "done" means different things for different tasks. A question-answering turn might only need a grounded text answer. A file-generation turn needs actual artifact evidence. A multi-step task may require an active plan and a terminal plan outcome before the final response is allowed.

What Counts As Evidence

AIClaw's verifier collects structured execution evidence rather than relying on the final assistant message alone. The evidence ledger includes:

business tool calls, excluding internal control tools like plan, tool_search, and finish
tool events with tool name, argument summary, output summary, status, duration, files, and failure classification
generated file artifacts persisted as AIClaw files
plan snapshots from the active plan manager
validation and correction events
blocker evidence such as permission, auth, policy, timeout, rate limit, or not-found failures

This gives the final gate something concrete to inspect. If a tool failed, the harness can tell whether it is a recoverable timeout or a terminal auth problem. If a file was promised, the harness can check that file evidence actually exists.

The Four Validation Gates

AIClaw now validates at four different points in the run:

Stage	What it checks
`pre_tool`	Whether a tool call is allowed before execution
`post_tool`	Whether the current tool round produced the evidence the run now has
`pre_final`	Whether the candidate final answer is truly ready to finish
`pre_save`	Whether the final content is still valid after attachment links are rendered

This staged approach is the important design choice.

Many agent systems only validate at the end. AIClaw also validates before tool execution and between tool rounds, so it can keep the model inside the task until it either gathers enough evidence or reaches a real blocker.

What Gets Rejected Before Final Answer

The current runtime can stop finalization when the candidate answer is:

empty
only a progress update
missing required successful evidence
ignoring a terminal blocker
missing promised artifacts
trying to finish while the plan is still open

The logic is visible in the harness design and outcome classifier. Candidate answers are assessed as success, blocked, partial, progress_only, or unknown, then compared with the evidence ledger.

That makes the final gate more than a string check. It is deciding whether the answer matches what the run actually accomplished.

An Explicit `finish` Tool Instead Of Implicit Guessing

One practical addition is the built-in finish tool.

Instead of relying only on "no more tool calls means we must be done," the model can explicitly submit the final user-facing answer through finish. AIClaw captures that answer, runs the same final gate, and only closes the turn if the harness allows it.

This is a clean separation:

the model proposes the final answer
the harness decides whether the answer is allowed to end the run

The finish tool is also excluded from normal business-tool evidence, which prevents completion signaling from being counted as actual execution work.

Plan State And Harness Runtime Work Together

AIClaw already uses runtime Plan State instead of chat-visible todo lists. The recent harness work tightens that integration.

If a contract requires a plan and there is no active plan, the runtime can bootstrap one from an initial template. During execution:

only one plan item is allowed to be running
successful rounds can advance the active item
tool or LLM failures can mark the running item as failed
the final answer is linked to the final plan snapshot

This is useful because the harness is not only validating text quality. It is validating whether the run reached a terminal execution state.

Better Failure Semantics

The runtime now classifies blocker types such as:

permission denied
auth failed
policy blocked
not found
rate limited
timeout
generic tool error

It also distinguishes recoverable from non-recoverable failures. A timeout can lead to another attempt or a correction loop. Missing permissions or invalid authentication should be surfaced clearly as blockers instead of being buried under a generic apology.

That makes the execution log more operationally useful, especially when you are debugging real tool-using agents.

Observable By Design

When validation or correction fails, AIClaw records harness-specific execution steps such as:

validate_pre_tool
validate_post_tool
validate_pre_final
validate_pre_save
correct_pre_final
correct_post_tool
continue_execution
recover_llm_round

The metadata includes the stage, violation codes, required actions, evidence summary, and correction outcome. Successful validation without violations stays quiet to avoid polluting the trace.

That is a good tradeoff: rich debugging data when something goes wrong, low noise when the run is healthy.

Why This Matters In Daily Use

For users, this changes the feel of an agent in a practical way:

file-delivery tasks are less likely to end without actual files
long tasks are less likely to stop at a narrative progress update
tool failures are more likely to be explained with actionable blocker context
execution logs become a better source of truth than the assistant's confidence

For developers building on AIClaw, the bigger value is architectural. The stable pkg/harness package gives you a place to evolve contracts, evidence, validation, and correction rules without rewriting the whole agent loop.

A Small But Important Shift

The key idea behind this feature is simple:

an agent should not be considered done because it sounds done. It should be considered done because the runtime can prove it did the work, produced the promised artifacts, or reached a well-explained blocker.

That is the shift AIClaw's harness runtime is making.

If you are building self-hosted agents that need to do more than chat, this is one of the most important layers to get right.

How AIClaw's Harness Runtime Stops Agents From Pretending They're Done

chowyu — Thu, 09 Jul 2026 12:59:17 +0000

AIClaw has been pushing a new execution layer into origin/master across three recent commits:

2026-07-02 feat: add harness execution protocol
2026-07-06 refactor(agent): integrate harness runtime verifier
2026-07-09 feat(agent): align harness execution runtime

The feature is called the Harness Runtime. The short version is simple: AIClaw no longer treats "the model says it's finished" as enough evidence that the task is actually complete.

Instead, the executor now runs a contract-and-evidence loop that decides whether an answer is allowed to finish.

The Problem

Most agent systems have an annoying failure mode:

The model makes a plan.
It calls a few tools.
It writes a confident answer.
The answer is missing evidence, missing files, or is only a progress update.

If the runtime accepts that answer, the execution trace looks successful even when the work is incomplete.

AIClaw's new harness runtime is built to close that gap. The README now describes it as a layer that turns the user objective into a task contract, records evidence, validates tool and final-answer stages, and injects correction prompts when the work is not actually ready to ship.

The Core Model

The design doc centers the runtime around four parts:

Contract -> Evidence -> Validate -> Correct

Each turn now gets a TaskContract inferred from the user objective, agent profile, tools, files, and plan state. That contract decides things such as:

whether a plan is required
whether tool evidence is required
whether artifacts are required
whether the output should be treated as text, file, mixed, or JSON

Execution evidence is collected in an EvidenceLedger. It records:

execution tools that actually ran
tool events and their status
generated file artifacts
plan snapshots
validation and correction events
blocker evidence such as permission, auth, policy, not-found, rate-limit, and timeout failures

That gives AIClaw a stable internal record of what happened, not just what the model claimed happened.

Four Validation Gates

The verifier now checks four stages:

Stage	What it checks
`pre_tool`	Whether a requested tool call is allowed by policy
`post_tool`	Whether the current round collected usable evidence
`pre_final`	Whether the candidate final answer is complete enough to finish
`pre_save`	Whether the final content still satisfies artifact and attachment requirements before persistence

This matters because different failures show up at different times.

For example:

A tool may be policy-blocked before it runs.
A tool may run but fail with auth or permission errors.
A final answer may be non-empty but still be only a progress message.
A file-delivery task may claim success but omit the generated file link.

The harness catches each of these at the correct stage instead of waiting for the user to notice later.

What Counts as a Real Final Answer

The runtime now classifies final outcomes instead of treating every non-empty answer the same way.

In pkg/harness/outcome.go, AIClaw distinguishes between:

success
blocked
partial
progress_only
unknown

That sounds small, but it changes agent behavior a lot.

If the model says things like "I'm continuing", "please wait", or "next I will generate the file", the candidate can be treated as progress_only and rejected at the final gate.

If tools failed because of permission or authentication issues, the harness can require the final answer to explicitly explain that blocker instead of letting the agent drift into a vague summary.

Corrections Instead of Silent Failure

When validation fails, AIClaw does not immediately give up. The verifier can append a structured correction prompt and continue the turn, but only within a bounded retry budget.

The new runtime standardizes these self-correction nudges so that:

final-gate rejection
post-tool evidence rejection
truncated or interrupted model rounds

all consume the same correction budget.

If the agent still cannot produce a valid answer, AIClaw closes the turn with a concrete failure message rather than pretending the run succeeded.

Better Alignment With Plan State

One detail I like is how this feature works with AIClaw's Runtime Plan State instead of bypassing it.

The harness can bootstrap an initial plan when a task contract requires one. The plan template is intentionally simple:

Understand the goal and dependencies
Execute tools and collect evidence
Validate evidence and output requirements
Deliver the final answer

That keeps planning inside the runtime, while the verifier checks that the plan reaches a terminal state before the answer is accepted.

A New `finish` Tool

The July 9 alignment work also adds a built-in finish tool. The model can explicitly submit a final answer through a structured completion signal, and the harness then runs the explicit final gate before closing the turn.

This is a useful pattern for agent runtimes:

the model can say "this is my final answer"
the executor still decides whether the answer is actually allowed to finish

That separation makes the system easier to reason about and easier to debug in execution logs.

Why This Matters In Practice

This feature is not about making the agent sound smarter. It is about making the runtime more honest.

If a task needs tool-backed evidence, generated files, or a finished plan, AIClaw now has a concrete mechanism to enforce that. The result is a better execution trace for both developers and operators:

fewer fake-complete answers
clearer blocker reporting
better attachment delivery checks
visible harness validation steps in logs when something goes wrong

The README now positions AIClaw as a platform that favors explicit execution traces and durable runtime state over invisible agent magic. The harness runtime is a good example of that philosophy becoming real code.

Where To Look In The Repo

If you want to inspect the implementation, the most relevant files are:

docs/design/agent-harness-runtime.md
internal/agent/harness.go
internal/agent/harness_verifier.go
internal/agent/finish_tool.go
pkg/harness/contract.go
pkg/harness/runtime.go
pkg/harness/outcome.go

The interesting part is not just the validator package. It is the way the verifier, plan state, tool execution, and final message persistence now fit together as one execution contract.

AIClaw is open source and self-hosted, so if you're building agents that need real execution guarantees instead of optimistic chat output, this is one of the areas worth studying next.

AIClaw Now Returns Tool Output Attachments You Can Actually Download

chowyu — Sun, 28 Jun 2026 11:48:42 +0000

AIClaw already let agents run tools, generate files, and continue working across steps. The weak point was the handoff back to the user: a tool might create a report, image, or data file, but the final answer did not always make that output obvious or directly downloadable.

Recent AIClaw changes tightened that workflow. Tool-generated files are now collected, deduplicated, exposed in API responses and streaming completion chunks, linked into the final assistant message, and rendered by the chat UI through stable /public/files/<uuid> download URLs.

This is not a cosmetic change. It closes the loop between tool execution and user-visible results.

The problem

In a tool-using agent system, “I created the file” is not enough. Users need to:

know that a file was produced
see which final answer it belongs to
download it without searching through logs
keep that file associated with the conversation history

Without that, generated artifacts are easy to lose, especially when an agent uses multiple tools or delegates work to sub-agents.

What changed in AIClaw

The recent attachment work is visible across the backend and frontend.

On the execution side, AIClaw now:

collects file outputs returned by tools
scans sandbox directories for newly created files when a tool does not explicitly return a file result
pulls generated files back out of sub_agent results
deduplicates attachments before the final response is stored or streamed

You can see that in the tool execution path:

The final assistant response now appends an attachment section with Markdown links such as:

Attachment List:
- [report.csv](/public/files/<uuid>)
- [chart.png](/public/files/<uuid>)

In the current Chinese codebase revision, that section is rendered as 附件列表 in the saved final content.

The response payload also includes files directly:

blocking chat responses now return files
streaming done chunks now include files

That path is visible in:

On the frontend, the chat view turns those file objects into clickable downloads through /public/files/${file.uuid}:

/Users/yu/go/src/github.com/chowyu12/aiclaw/web/src/views/chat/Index.vue

Why this matters in practice

This feature is useful anywhere an AIClaw agent produces artifacts instead of pure text.

Examples:

A code_interpreter task generates a CSV and a PNG chart.
A browser automation flow saves a screenshot or extracted document.
A sub-agent writes a research summary file and passes it back to the parent run.
A shell command creates a log bundle or transformed dataset in the sandbox.

Before this improvement, users could still end up asking, “Where did the file go?”

Now the expected workflow is much cleaner:

The tool runs.
AIClaw captures the output files.
The final answer includes explicit download links.
The API and streamed completion both carry the file metadata.
The UI renders the attachments as part of the conversation result.

That means AIClaw behaves more like a practical work system and less like a text-only chatbot that happens to call tools.

A small detail that matters: sub-agent outputs

One useful part of this change is that file outputs are not limited to the top-level agent.

The executor now extracts file references from sub_agent results and folds them back into the parent response. That matters because many real AIClaw workflows split research, scraping, or data prep into delegated tasks. If child artifacts disappear at the parent boundary, the system feels unreliable.

Bringing those files back into the parent answer makes nested agent execution much easier to trust.

Another practical improvement: better streaming behavior

The same change set also adds SSE ping support in chat streaming handlers. That is separate from attachments, but it helps long-running tool workflows stay alive while the agent is still working.

For attachment-heavy runs, that pairing is useful:

streaming stays active during long tool work
the final done chunk can carry the generated files

Why I picked this feature

This is a good example of AIClaw’s local-first, tool-oriented design philosophy. The platform is not just trying to produce a nice paragraph. It is trying to complete work and return the artifacts that work produces.

Generated files are often the real output. Making them first-class in the response path is the right move.

If you are building with AIClaw, this is the kind of feature that improves daily usability more than another abstract prompt tweak.

AIClaw Separates Persistent Memory From Session Search

chowyu — Thu, 25 Jun 2026 11:55:44 +0000

One of the fastest ways to make an agent messy is to treat every kind of memory as the same thing.

User preferences, project conventions, environment facts, old debugging sessions, and last week's conversation snippets do not belong in one undifferentiated bucket. AIClaw's current design avoids that by splitting long-term memory into two explicit mechanisms:

persistent memory for durable facts
session search for historical conversation recall

That split is worth looking at because it solves real operating problems for self-hosted agents.

The design goal

There are three failure modes that show up quickly in agent systems:

everything is stuffed into the prompt, so context grows until it becomes noisy and expensive
everything is hidden in a database blob, so operators cannot inspect or edit it cleanly
old conversations are either forgotten completely or reloaded too aggressively

AIClaw addresses those problems with two different tools that do two different jobs.

Persistent memory is stored as editable files

The repository documents persistent memory in README.md and implements it in internal/tools/memorytool/memory.go.

Instead of hiding long-term memory in opaque storage, AIClaw keeps it in plain text files:

MEMORY.md for durable agent notes, environment facts, and conventions
USER.md for user preferences and communication style

This is a practical choice. Operators can inspect, review, and edit those files directly when needed, which is much easier than debugging memory behavior through serialized blobs.

The current session gets a frozen snapshot

AIClaw does not continuously mutate the active system prompt every time memory changes.

At session start, it loads a snapshot of MEMORY.md and USER.md and injects that into the prompt. The loader in internal/agent/prompt_loaders.go calls memorytool.LoadSnapshot(...) and joins the resulting memory blocks into the runtime prompt.

That means memory writes during a session affect future sessions, not the already-running one.

This is a strong constraint, and I think it is the right one. It keeps prompt behavior stable inside a run and avoids a hard-to-debug class of issues where the agent silently changes its own instructions halfway through execution.

Memory growth is bounded

The memory tool also has explicit limits:

MEMORY.md is capped at 2200 characters
USER.md is capped at 1375 characters

When usage gets high enough, AIClaw switches the injected snapshot into an index mode. Instead of pushing full entry bodies into the prompt, it injects a compact list of entry IDs, tags, and short summaries.

If the agent later needs one of those full entries, it can fetch it explicitly with memory(action=recall, ids=[...]).

That is a smart tradeoff. Long-term memory stays available, but the prompt does not have to pay the full token cost every time.

Memory writes are treated as a security surface

This part is easy to miss, but important.

The memory tool scans content before saving it. The implementation rejects suspicious injection-style patterns and invisible Unicode characters that could be used to smuggle prompt instructions into future sessions.

That makes sense because memory is not just stored data in AIClaw. It is future system-prompt material.

Session search solves a different problem

Persistent memory is for stable facts. Historical conversation lookup is different.

Sometimes the agent does not need a durable note. It needs to answer questions like:

What did we discuss in the earlier deployment thread?
Which error message showed up last time?
What did the user say about this workflow in a prior session?

That is what internal/tools/sessionsearch/session_search.go is for.

The tool has two modes:

no query: return recent conversations with previews
query provided: search prior messages and return matching snippets

SQLite gets FTS5, other databases still work

The search implementation in internal/store/gormstore/fts.go adds a practical optimization.

When AIClaw runs on SQLite, it initializes an FTS5 virtual table plus triggers to keep the search index synchronized with new and deleted messages. It also backfills existing data on startup.

When FTS5 is unavailable, AIClaw falls back to a SQL LIKE search path.

That is exactly the kind of graceful degradation I want in a local-first system. SQLite gets a fast search path, but the feature does not disappear on other databases.

Why the split matters operationally

This is the real value of the feature:

durable facts go into editable memory files
old conversation details stay in searchable archives
the active prompt remains more compact
and both behaviors are explicit enough to inspect

That is better than trying to solve all recall problems with one giant "memory" abstraction.

A practical workflow

If I were operating AIClaw day to day, I would use it like this:

Save stable rules, preferences, and environment facts with the memory tool.
Use session_search when you need to retrieve something from older conversations.
Let the snapshot and index-mode design keep the active prompt lean.

That is a more durable model for long-running agents than simply loading more chat history and hoping the model figures out what still matters.

AIClaw's memory layer is not flashy, but it is disciplined. For self-hosted agent systems, that discipline is usually what keeps advanced features usable after the first demo.

How AIClaw Extends Agents with Custom Tools and MCP Servers

chowyu — Tue, 23 Jun 2026 11:54:53 +0000

Most agent demos look flexible until you need them to talk to your own systems.

That is where AIClaw takes a practical route. The project ships with a broad built-in toolset, but it also lets you extend agents through custom HTTP tools, script tools, and MCP servers without changing the core runtime.

This is not a new feature announcement. It is an existing part of the current repository that is worth a deeper look because it changes what an AIClaw deployment can actually do in day-to-day work.

The problem: built-ins are not enough

Built-in tools cover the common cases well: files, shell, browser automation, web search, web fetch, memory, session search, scheduled jobs, code interpretation, and sub-agents.

But real deployments usually need one more layer:

Call an internal HTTP API.
Wrap a small script around an existing operational workflow.
Reuse external MCP tools from another server process.

Without that layer, the agent can reason, but it cannot reach the systems that matter in your environment.

AIClaw's extension model

The current README describes AIClaw's tool system as a combination of:

built-in tools
custom HTTP tools
custom command tools
MCP server tools
skill-defined tools

In the runtime, those paths are normalized into the same execution flow.

internal/agent/tool.go builds tracked tools from the configured definitions. Built-ins get their native handlers, HTTP tools are wrapped with NewHTTPHandler, command tools use NewCommandHandler, script tools use NewScriptHandler, and MCP tools are loaded from connected servers before the model call.

That matters because extension points do not become second-class behavior. They still participate in the same execution loop, step tracking, and agent prompt assembly.

Custom tools from the web console

The current tool management UI gives you a practical authoring flow instead of requiring hand-written runtime config.

On the Tools page you can:

Create a tool with a name, description, timeout, and enabled state.
Choose a handler type.
Define the function schema that the model will see.
Bind the tool to agents that should be allowed to call it.

The form currently exposes two handler types directly in the UI:

HTTP Callback
Script

For HTTP tools, AIClaw lets you define a request URL and method, with {param_name} placeholders that are substituted from model-supplied arguments at runtime.

For script tools, the form supports:

Python
JavaScript
Shell
Go

The same page also lets you describe the function in a structured way or switch to raw JSON mode for the full function definition.

That is a useful design choice. It keeps the operational handler config and the model-visible function schema separate, which makes tools easier to tune without rewriting everything.

MCP servers as first-class runtime inputs

AIClaw also has a dedicated MCP management page.

The current MCP settings flow supports:

stdio transport
sse transport
endpoint configuration
argument lists
enabled or disabled state

The runtime API behind that page is straightforward: GET /api/v1/runtime/mcp reads the workspace MCP list, and PUT /api/v1/runtime/mcp replaces it.

The runtime side is more interesting.

internal/tools/mcp/client.go connects to each enabled MCP server, initializes the client, lists available tools, and then exposes them to the agent executor. In internal/agent/executor.go, MCP tools are connected before execution so they can be merged into the available tool set for the request.

This gives AIClaw two useful extension paths:

create narrow custom tools inside AIClaw when the integration is simple
mount an external MCP server when the tool suite already exists elsewhere

Safety and operability details that matter

The implementation is more pragmatic than flashy, and that is a good thing.

A few concrete examples from the current codebase:

HTTP tool calls use a shared HTTP client with connection pooling instead of rebuilding a client on every invocation.
Command tools apply additional safety checks on top of the normal shell safeguards, including blocking patterns like outbound shell access and risky recursive deletes.
Tool calls are wrapped as tracked steps, so custom and MCP-backed calls show up in the same execution timeline as built-ins.
The MCP manager is reused at the executor level instead of being torn down after every request.

These are not marketing details. They are the difference between "extensible in theory" and "usable under load".

A practical workflow

If you want to extend an AIClaw agent today, the workflow is roughly:

Start with the smallest possible integration.
If the system is just one API call, create a custom HTTP tool.
If the integration needs local logic, wrapping, or formatting, use a script tool.
If you already have a larger tool surface implemented elsewhere, register it as an MCP server.
Attach only the needed tools to the target agent.
Inspect the execution log after real runs to verify the tool contract is clear enough for the model.

A good example is an internal support workflow:

one HTTP tool for looking up a ticket by ID
one script tool for normalizing the raw result into a concise summary
one MCP server for a broader internal knowledge or operations toolkit

That combination keeps the default agent prompt small while still giving the agent a path into real systems.

Why this matters

The strongest agent products are not the ones with the longest built-in tool list. They are the ones that let you adapt the tool surface to your own environment without fighting the runtime.

AIClaw's current design gets that mostly right:

built-ins for common work
custom tools for narrow integrations
MCP for external tool ecosystems
shared execution logging so everything stays inspectable

If you are building self-hosted agents, that is the layer that turns a chat UI into an actual operations surface.

Repo: AIClaw on GitHub

AIClaw Adds Configurable Web Search Without Hiding Execution Details

chowyu — Sat, 20 Jun 2026 08:36:03 +0000

Most AI agent products say they can "search the web," but they often collapse several very different behaviors into one checkbox.

In the current AIClaw repository, web search is modeled more explicitly:

some models can use built-in provider-side search
other agents can call an external web_search tool through a configured search engine
both paths are visible in runtime behavior instead of being hidden behind vague magic

That separation matters if you run agents in production and want control over cost, provider choice, prompts, and auditability.

What changed in AIClaw

Recent AIClaw changes added a full search engine configuration surface and the agent-side wiring needed to use it:

a new Search Engine page in the web console
persisted search engine configs in the backend
external search support for Tavily, SerpAPI, and Aliyun IQS
agent settings for builtin versus external web search mode
runtime handling that enables model-native search only when that mode is selected
a follow-up UI fix that turns web search on automatically when an external engine is chosen

This is not just a README claim. The repository now includes:

internal/handler/search_engine.go for CRUD and test endpoints
internal/tools/websearch/search.go for provider-specific search execution
web/src/views/search-engine/Index.vue for the console UI
web/src/views/agent/Form.vue for per-agent mode and engine selection
tests covering built-in and external search behavior

Why two search modes are better than one

AIClaw now documents two distinct modes in README.md:

built-in mode: AIClaw sets extra_body: {"enable_search": true} for models that support provider-native search
external mode: AIClaw exposes a web_search tool and routes calls through a saved search engine config

That design solves a real operational problem.

Built-in model search is convenient when the provider already supports it well. In AIClaw, the prompt layer explicitly tells the model that recent and time-sensitive questions can use built-in search. The runtime also records a web_search execution step that shows the request configuration used for that model call.

External search is different. Sometimes you do not want your search behavior tied to one model vendor. Sometimes you want to standardize on a search provider across multiple model backends. Sometimes you want to rotate providers, test them independently, or keep search visible as a normal tool call with structured input and output.

AIClaw supports that split instead of forcing one compromise.

How the flow works

The user workflow is straightforward:

Open the Search Engine page in the AIClaw admin console.
Create one or more search configs.
Pick the provider: Tavily, SerpAPI, or Aliyun IQS.
Save the API key and optional base URL.
Test the config before using it live.
Open an agent.
Enable web search.
Choose external mode if you want tool-based search.
Select one enabled search engine for that agent.

On the backend, AIClaw validates that external mode is only accepted when a real enabled search engine is selected. That check lives in validateExternalWebSearch inside internal/handler/agent.go.

The frontend follow-up fix is also important. In web/src/views/agent/Form.vue, selecting external mode now auto-selects the first enabled engine when possible, and selecting an engine automatically enables web search. That sounds small, but it removes a common configuration footgun: "I picked an engine, why is search still off?"

Provider behavior is implemented, not hand-waved

AIClaw does not treat external search as a generic proxy blob. The repository includes explicit provider logic:

Tavily uses https://api.tavily.com/search
SerpAPI uses https://serpapi.com/search.json
Aliyun IQS uses https://cloud-iqs.aliyuncs.com/search/unified

The tool normalizes limits, validates config state, trims oversized snippets, and returns structured results with title, URL, and snippet fields.

That means the agent can use web results in a predictable format regardless of provider, while operators still choose the engine that fits their environment.

The observability part is the real strength

My favorite part is that AIClaw keeps search behavior observable.

For built-in search, the runtime records that model-native search was enabled and stores the request configuration in a web_search step.

For external search, the agent uses the regular web_search tool path, so tool input, output, duration, and errors stay visible in chat progress and execution logs.

This fits the broader AIClaw design direction: runtime plan state, generated files, execution steps, memory, and now search behavior are treated as first-class runtime objects instead of hidden implementation details.

If you are operating agents for real work, that is a better tradeoff than a black-box "browse the web" toggle.

A practical example

Imagine two agents:

a news-monitoring agent using a model with strong built-in search support
a research or compliance agent that must use a specific external search provider

In AIClaw, those do not need the same search path.

The first agent can stay in built-in mode for a tighter provider-integrated experience.

The second agent can switch to external mode, bind to a chosen engine, and keep the search action visible as a tool call with inspectable results.

That is a more production-friendly model than pretending every search use case is identical.

Why this feature stood out

I picked this topic because it is a concrete feature with recent implementation depth across backend, runtime, tests, and UI:

feature commit: configurable search engines
follow-up fix: external search becomes active as soon as an engine is selected
documented behavior in the README
explicit tests for built-in versus external execution

That combination makes it a good example of how AIClaw is evolving: not just adding another capability, but making the capability configurable and inspectable.

AIClaw is open source and self-hosted, so these details matter. When you control the stack, you also need to control how the agent reaches the outside world.

How AIClaw Compresses Long Agent Conversations Without Losing the Important Parts

chowyu — Fri, 19 Jun 2026 08:57:54 +0000

Long-running agent sessions eventually hit the same problem: the model keeps accumulating chat history, tool outputs, intermediate decisions, and execution traces until the prompt becomes expensive or unstable. AIClaw has a built-in answer for that problem. It does not simply drop old messages. It compresses the middle of the conversation into a structured summary and keeps the parts that still matter for the next step.

This is not a new release post. It is a deeper look at one existing AIClaw runtime feature: context compression.

The Problem

AIClaw is designed for tool-using work, not short chatbot replies. A single task can include:

multiple rounds of shell or browser tool calls
long tool outputs
plan-state progress updates
follow-up fixes after the first attempt
sub-agent results flowing back into the parent run

That is useful context, but it also means the prompt grows fast. If the runtime sends everything back to the model forever, cost increases and the model starts paying attention to the wrong parts of the history.

The README describes this capability briefly as:

Runtime compression: Long middle context can be summarized during execution.

The implementation behind that line is more specific than it sounds.

When AIClaw Decides To Compress

The decision lives in internal/agent/context_compressor.go and is wired into the main execution loop in internal/agent/run.go.

Before each LLM round, AIClaw checks whether the current prompt is too large relative to the model context window.

The current defaults are straightforward:

compress when prompt usage reaches 50% of the model context window
keep the system message at the head
keep at least the latest 20 messages at the tail
require at least 5 middle messages before compression is worth doing

If the model provider reports real prompt-token usage, AIClaw uses that. Otherwise it falls back to an internal estimate. That matters because the trigger is based on actual prompt pressure, not just message count.

What Gets Compressed, And What Stays Intact

AIClaw uses a four-phase flow.

1. Prune old tool output first

Before asking the model to summarize, AIClaw trims older tool messages outside the protected tail window. Tool outputs in that middle region are truncated to 200 runes. That keeps huge logs from dominating the summary prompt.

This is an important design choice. The runtime does not try to summarize raw noise at full size first. It reduces obviously low-value bulk before paying for the summarization call.

2. Protect the head and the tail

The compressor preserves:

the head of the conversation, especially the system prompt
the latest tail of the conversation, where the current working state lives

The part in the middle becomes the candidate for compression.

3. Ask an LLM for a structured summary

Instead of generating a vague paragraph, AIClaw asks for a strict template with sections like:

Goal
Constraints And Preferences
Progress
Key Decisions
Relevant Files
Next Steps
Critical Context

This is a practical choice for agent continuity. The summary is meant to preserve execution state, not produce pretty prose.

4. Rebuild the conversation with a summary message

After summarization, AIClaw inserts a [Context Compression Summary] message and appends a note to the system prompt that earlier conversation has been compressed.

The result is smaller than the original history, but still carries forward the task objective, decisions, blockers, touched files, and next action.

Tool Calls Are Not Split Apart

A subtle detail in the implementation is that AIClaw does not cut through an assistant/tool-call group. The compressor aligns the preserved tail boundary backward so a tool call and its tool results stay together.

That matters because broken tool-call sequences are confusing for the next model round. If an assistant message says it called a tool but the corresponding tool results are missing from the preserved tail, the reconstructed context becomes misleading.

There are tests for this behavior in internal/agent/context_compressor_test.go.

Compression Is Iterative, Not One-Shot

AIClaw also keeps the previous compression summary in memory during the active run. On the next compression pass, it does not start from zero. It sends:

the previous summary
the newly accumulated conversation slice

Then it asks the model to merge them into an updated structured summary.

This makes repeated compression cheaper and more stable in long tasks. Instead of re-summarizing the entire old middle history every time, AIClaw incrementally rolls forward the important state.

Which Model Handles Compression

The main execution loop prefers the agent's FastModelName for compression when one is configured; otherwise it falls back to the primary model.

That is a good default for a local-first agent platform:

the expensive or premium model stays focused on the real task
the cheaper or faster model can handle summarization work
prompt size stays under control during long sessions

A Practical Example

Imagine a debugging session where an AIClaw agent:

reads several Go files
runs tests
inspects logs
edits code
reruns tests
asks a sub-agent to inspect a failing subsystem
returns to the parent run for the final fix

Without compression, the conversation history gradually becomes a pile of stale tool output. With compression, AIClaw can keep the current tail intact while rolling earlier work into a structured checkpoint that still remembers:

which files were already inspected
which commands succeeded or failed
what the user asked for
which constraints matter
what remains unresolved

That is the difference between “shorter prompt” and “runtime continuity.”

Why This Feature Matters

AIClaw is opinionated about execution state. It already treats plan state, generated files, execution steps, memory, and conversation history as first-class runtime data. Context compression fits the same design philosophy.

The goal is not to make the transcript prettier. The goal is to keep an agent useful after a long stretch of real work.

If you are building agents that mostly answer in one turn, this feature is easy to ignore. If you are building agents that browse, edit, run commands, and recover from failure across many rounds, it becomes part of the reliability story.

AIClaw keeps that logic in the runtime rather than pushing the entire burden onto prompt engineering.

Where To Look In The Code

internal/agent/context_compressor.go: compression thresholds, protected windows, summary prompt, iterative summary logic
internal/agent/run.go: where compression is triggered in the execution loop
internal/agent/context_compressor_test.go: tests for summary injection, iterative updates, tool-group preservation, and duplicate-note prevention
README.md: product-level runtime compression description

AIClaw is open source, self-hosted, and built for agents that do more than chat. Context compression is one of the small runtime details that makes that practical over longer sessions.

How AIClaw Splits Main and Fast Models for Sub-Agent Work

chowyu — Thu, 18 Jun 2026 11:04:09 +0000

AIClaw is not just a chat wrapper around one model. In the current repository, it lets you configure multiple providers, choose a primary model per agent, and optionally assign a separate fast model for lightweight sub-agent work.

That matters because real agent runs are uneven. Some steps need a stronger model for planning or synthesis. Other steps are small delegated tasks: inspect a few files, try a shell command, summarize one branch of research, or explore a URL. Running every delegated step on the same expensive model is the simple design, but usually not the practical one.

The Problem

When an agent can call sub_agent, the main task and the delegated task often have different cost and latency requirements.

The parent agent may need the best reasoning model you have.
The child task may only need a cheaper and faster model.
You still want both to stay inside the same agent definition and provider setup.

AIClaw addresses that by separating:

the primary model used by the agent, and
an optional fast model reserved for lightweight sub-agent execution.

The repository states this directly in the README:

Each provider can define its base URL, API key, and model list. Agents choose a provider model and can optionally define a fast model for lightweight sub-agent tasks.

How The Provider Layer Is Structured

In the current codebase, AIClaw supports multiple provider types, including:

OpenAI
OpenAI-compatible APIs
Qwen
Kimi / Moonshot
OpenRouter
Claude
Gemini

Those provider definitions live in the backend model layer, and each provider stores:

a name
a provider type
a base URL
an API key
a model list

This is not a hardcoded single-vendor path. It is meant to let one deployment route different agents through different model backends while preserving a unified agent workflow.

What The Web Console Exposes

The admin console has a dedicated Providers page for managing model backends, and the Agent form consumes those provider definitions.

Two parts of the UI are especially useful here:

The Providers page can fetch remote model lists from the configured provider endpoint.
The Agent form lets you choose both a primary model and an optional fast model from the same provider.

The current provider UI merges remote and local model lists, and the agent editor exposes:

主模型 / main model
快速模型 / fast model

The tooltip for the fast model is explicit: it is the lightweight model used when a sub-agent is invoked with model=fast, and if left empty it falls back to the main model.

How The Runtime Uses The Fast Model

The interesting part is that this is not only a configuration field. The runtime actually switches models during delegated execution.

In internal/agent/subagent.go, AIClaw checks the sub-agent model hint before running the delegated task:

if modelHint == "fast" && ec.ag.FastModelName != "" {
    ec.ag.ModelName = ec.ag.FastModelName
}

So the parent agent can keep its primary model, while the child execution swaps to the fast model only for that delegated run.

That keeps the behavior predictable:

no extra provider objects are needed just to optimize sub-agent cost
the parent and child stay within the same agent configuration
the optimization is explicit instead of hidden behind heuristics

Why This Design Is Practical

This design works well for common agent patterns:

Deep reasoning in the parent, fast exploration in children
Expensive final synthesis, cheap parallel research branches
One provider account, multiple model tiers

For example:

Set the main model to gpt-4.1, claude-sonnet, or another stronger general model.
Set the fast model to something like gpt-4o-mini or another lower-latency option from the same provider.
Let delegated sub_agent tasks use model=fast when they are mostly collecting facts or doing narrow execution.

That gives you a more controllable tradeoff than “always use the big model” or “force every agent to be cheap.”

Why I Picked This Feature

Today’s article is not based on a brand-new release commit. Instead, it is a deeper look at an existing AIClaw capability that is already concrete in the repository and easy to miss if you only skim the README.

Recent drafts already covered runtime plan state, execution logs, skills, and channel session continuity. This provider-and-fast-model path is a different product surface and a useful one if you are designing multi-step agents that need cost control without losing structure.

Practical Workflow In AIClaw

A clean setup looks like this:

Add a provider in the Providers page with base URL, API key, and model list.
Fetch remote models from that provider when available.
Create or edit an agent.
Pick the main model for the parent task.
Optionally pick a fast model for lightweight sub-agent tasks.
Let the agent delegate narrow work through sub_agent.

Because AIClaw keeps sub-agent traces visible in the parent timeline, this remains observable instead of becoming a black box.

Closing

Many agent products talk about orchestration, but the useful details are usually in the runtime tradeoffs. AIClaw’s provider setup plus fast-model override is a good example of a small design choice that improves real-world agent operation: better latency and cost control without splitting your workflow across multiple disconnected agents.

If you are building with AIClaw, this is one of the settings worth using early instead of treating every subtask like it deserves your heaviest model.

How AIClaw Keeps Messaging-Channel Chats Stateful with `/new` and `/continue`

chowyu — Wed, 17 Jun 2026 11:54:21 +0000

External chat channels are convenient for AI agents, but they create a session problem fast.

In a web UI, users can see a sidebar of conversations and click back into old work. In WeCom, Feishu, Telegram, or WhatsApp, that structure usually does not exist. You get a thread, a sender, and a stream of incoming messages. If the agent cannot map that external thread back to an internal conversation reliably, the result is either context loss or messy session sprawl.

AIClaw solves that with a channel bridge that does more than relay messages. It binds external thread keys to internal conversation UUIDs, keeps archive-backed history available, and exposes a few small slash commands that let users control session continuity from inside the channel itself.

The practical problem

A channel integration usually has three jobs:

accept inbound messages from an external system
route them to the right agent
send the reply back to the same place

That is enough for basic request/response behavior, but not enough for long-running agent work.

Real usage needs session control:

start a fresh conversation without deleting the old one
continue an earlier conversation from inside the channel
keep one external thread attached to one internal conversation
avoid duplicate conversations when the first inbound messages arrive concurrently

AIClaw handles those cases in its channel runtime instead of pushing the problem into prompts.

The core design: thread binding

When a channel message arrives, AIClaw derives one or more lookup keys from the inbound event. That can include the native thread key, alias keys, or the sender ID as a fallback.

Those keys are stored in a channel_threads mapping table that links:

channel_id
thread_key
conversation_uuid

If a mapping already exists, AIClaw reuses the conversation. If not, it creates a new conversation and binds every relevant lookup key to it.

That sounds simple, but it matters a lot operationally:

the same external thread stays attached to the same internal conversation
aliases can converge onto the same conversation
channel messages become inspectable in the normal AIClaw conversation model

There is also a concurrency guard. AIClaw uses a singleflight gate for the same (channel, thread) key set so that concurrent first messages do not create duplicate conversations or duplicate bindings.

Slash commands that work inside the channel

The interesting part is that AIClaw does not make users jump back to the admin panel just to manage sessions.

It intercepts a few slash commands before the LLM call:

/new
/reset
/continue
/continue N
/archives
/help

That gives channel users lightweight session control with no extra UI.

`/new`

/new creates a fresh conversation for the current channel thread.

If the thread was already bound to an older conversation, AIClaw removes that binding, creates a new conversation record, and rebinds the current thread keys to the new conversation. The older conversation is not deleted. It can still be recovered later.

This is the right behavior for channel-based agent usage because “start over” should not destroy prior work.

`/continue`

/continue lists recent archived conversations for the same agent and channel user. The archive list is scoped by user identity, so one person does not accidentally browse another person’s channel history.

/continue N switches the current thread binding to the selected archived conversation. After that, the next message in the same external thread continues that older context directly.

This is the part I like most: AIClaw turns a plain messaging thread into a recoverable working session without inventing a separate UI surface.

Why archives matter here

The /continue workflow depends on session archives instead of raw database rows alone.

AIClaw periodically regenerates a Markdown archive for a conversation after enough messages accumulate. The archive is generated without another model call and summarizes things such as:

user goals
tool usage counts
recent assistant decisions

Those archive files live under the agent workspace and are sorted by update time when AIClaw builds the /continue list.

That design gives channel users a workable “recent sessions” experience while keeping the implementation cheap and inspectable.

A realistic channel workflow

Here is a simple example:

A WeCom user starts a deployment investigation in one channel thread.
AIClaw binds that thread to a conversation and keeps all later messages in the same context.
After the task is done, the user sends /new to start a clean debugging session without mixing contexts.
A day later, the user wants the original deployment investigation back.
They send /continue, pick the archived session number, and the same thread is rebound to that old conversation.
The next message continues from the earlier session rather than starting cold.

That is a small feature on paper, but it removes a lot of friction from channel-native agent usage.

Why I think this design is good

There are a few design choices worth calling out:

session control is implemented in the runtime, not hidden in prompts
channel threads map onto first-class conversation records
archives are cheap to generate and easy to inspect
user-facing commands are small, memorable, and channel-friendly

Most importantly, the system acknowledges that messaging channels are not just notification sinks. For many teams, they are the primary operating surface.

If an agent platform wants to work there, it needs explicit session semantics, not only transport adapters.

AIClaw’s channel bridge takes that seriously.

AIClaw is an open-source, self-hosted AI agent platform written in Go with a Vue admin console. It supports built-in tools, custom tools, MCP servers, multi-provider models, runtime planning, sub-agents, persistent memory, execution logs, and messaging-channel integrations in one deployable binary.

DEV Community: chowyu

How AIClaw Adds Durable Memory Without Turning Prompt Context Into a Global Text File

The problem with file-style memory

What changed in AIClaw

Memory is reviewable, not magical

Retrieval is scoped before it reaches the prompt

Evidence and auditability are first-class

The tests cover the failure modes you actually worry about

What this looks like in practice

Why I picked this feature for today's article

How AIClaw's Code Interpreter Turns Scripts Into Downloadable Agent Outputs

The problem: agents need computation, not just text

What the tool actually does

Why writing a real file is the important design choice

How downloadable outputs are recovered

Safety and limits

Where this fits in real AIClaw workflows

A concrete example

Why this feature is worth covering now

How AIClaw Hardens Local Agent Runtimes on Your Machine

The problem: service PATHs are usually worse than your terminal PATH

What changed in the code

Why this is better than a shell wrapper

A small but important macOS improvement

What the tests verify

What this means for AIClaw users

Why I think this feature matters

How AIClaw's Harness Runtime Stops Premature "Done" Answers

The Problem: LLM Confidence Is Not Execution Truth

The Runtime Model

What The Contract Captures

What Counts As Evidence

The Four Validation Gates

What Gets Rejected Before Final Answer

An Explicit finish Tool Instead Of Implicit Guessing

Plan State And Harness Runtime Work Together

Better Failure Semantics

Observable By Design

Why This Matters In Daily Use

A Small But Important Shift

How AIClaw's Harness Runtime Stops Agents From Pretending They're Done

The Problem

The Core Model

Four Validation Gates

What Counts as a Real Final Answer

Corrections Instead of Silent Failure

Better Alignment With Plan State

A New finish Tool

Why This Matters In Practice

Where To Look In The Repo

AIClaw Now Returns Tool Output Attachments You Can Actually Download

The problem

What changed in AIClaw

Why this matters in practice

A small detail that matters: sub-agent outputs

Another practical improvement: better streaming behavior

Why I picked this feature

AIClaw Separates Persistent Memory From Session Search

The design goal

Persistent memory is stored as editable files

The current session gets a frozen snapshot

Memory growth is bounded

Memory writes are treated as a security surface

Session search solves a different problem

SQLite gets FTS5, other databases still work

Why the split matters operationally

A practical workflow

How AIClaw Extends Agents with Custom Tools and MCP Servers

The problem: built-ins are not enough

AIClaw's extension model

Custom tools from the web console

MCP servers as first-class runtime inputs

Safety and operability details that matter

A practical workflow

Why this matters

AIClaw Adds Configurable Web Search Without Hiding Execution Details

What changed in AIClaw

Why two search modes are better than one

How the flow works

Provider behavior is implemented, not hand-waved

An Explicit `finish` Tool Instead Of Implicit Guessing

A New `finish` Tool

`/new`

`/continue`