A comprehensive, actionable guide to the principles, techniques, and architecture behind sipeed/picoclaw, written so you can build a similar system from scratch.
Table of Contents
- What PicoClaw Is and Why It Matters
- Design Philosophy
- High-Level Architecture
- Core Concept #1 - The Agent Loop & Pipeline
- Core Concept #2 - Steering (Mid-Loop Message Injection)
- Core Concept #3 - SubTurn (Hierarchical Sub-Agents)
- Core Concept #4 - Sessions & JSONL Persistence
- Core Concept #5 - Rule-Based Model Routing
- Core Concept #6 - The Hook System
- Core Concept #7 - Channel Abstraction (18+ chat platforms)
- Core Concept #8 - Provider Abstraction (30+ LLMs)
- Core Concept #9 - Tools, Skills, and MCP
- Resource-Efficiency Techniques (the <10 MB secret)
- Cross-Compilation & Single-Binary Deployment
- Reference Configuration Schema
- Step-by-Step: Build Your Own PicoClaw-Style Agent
- Common Pitfalls & Lessons Learned
- Recommended Reading Path Through the PicoClaw Source
1. What PicoClaw Is and Why It Matters
PicoClaw is a single-binary, Go-based personal AI agent that runs in under 10 MB of RAM on $10-class hardware (RISC-V SBCs, Raspberry Pi Zero, MIPS routers, Android via Termux, even old NanoKVM boards). It is heavily inspired by NanoBot, but rewritten "self-bootstrapped" in Go, with ~95% of the code generated by an agent under human review.
What makes it remarkable is not that it talks to LLMs (that's easy) but that it does so while being:
| Property | PicoClaw | Typical Python AI stack |
|---|---|---|
| Memory footprint | < 10 MB | 200 MB – 2 GB |
| Boot time | < 1 s on 0.6 GHz CPU | 5–30 s |
| Distribution | One static binary | venv + dozens of wheels |
| Architectures | x86_64, ARM, ARM64, RISC-V, MIPS, LoongArch | mostly x86_64/ARM64 |
| Channels | 18+ (Telegram, Discord, WeChat, Slack, ...) | 1–2 typically |
| LLM providers | 30+ via unified interface | 1–3 SDK-locked |
The product is not "a chatbot." It is a portable agent runtime with first-class support for tools, MCP, sub-agents, multi-channel messaging, and provider routing.
2. Design Philosophy
These are the principles that drive every design decision. Internalize these first; the code will then make sense.
2.1 Lean by default, extensible by interface
Choose Go because it produces small, statically linked binaries with tiny runtime overhead, no GIL, and predictable memory. Wrap every variable subsystem (LLM, channel, tool, hook, registry) behind an interface so a feature can be added without touching the core loop.
2.2 One binary, every architecture
A user deploying to a $10 RISC-V board should not have to think about Docker, Python versions, or shared libraries. `make build-all` produces binaries for Linux/amd64, ARM, ARM64, RISC-V, MIPS LE, LoongArch, Darwin ARM64, Windows, and NetBSD from one tree.
2.3 Append-first persistence (JSONL)
Sessions and memories are stored as JSON Lines files with a sidecar `.meta.json`. Append-only is crash-safe, debug-friendly (`tail -f`), and trivially shippable. Schema migration happens lazily on read.
2.4 Promote routing data to first-class fields
Channels do not bury `chatId`, `senderId`, and `messageId` inside generic metadata maps. Those are typed fields on `InboundMessage`. Routing, sessions, and hooks all rely on this contract.
2.5 Capabilities are discovered, not hardcoded
Each channel optionally implements `MediaSender`, `TypingCapable`, `ReactionCapable`, `MessageEditor`, `WebhookHandler`, `HealthChecker`. The manager probes via type assertions. Adding a new platform never touches the manager.
2.6 Cheap-first, escalate when necessary
A rule-based classifier scores each turn 0..1 (token count, code blocks, recent tool calls, attachments, depth). Below the threshold, the request goes to a cheap "light" model; above it, to the heavy model. This alone cuts API spend dramatically for chatty workloads.
2.7 Observe everything, intercept rarely
Five synchronous hook points (`before_llm`, `after_llm`, `before_tool`, `after_tool`, `approve_tool`) are enough. Everything else is read-only event observation through an `EventBus`. Hooks can be in-process Go code or external processes speaking JSON-RPC over stdio.
2.8 The user can change their mind mid-run
Users issue corrections. The agent loop polls a per-session steering queue after every tool call. New messages are injected before the next LLM turn; remaining queued tools are skipped with a "Skipped due to queued user message" result so the model knows what didn't run.
3. High-Level Architecture
```plaintext
                     +---------------------------------------------+
18+ Chat Channels -->| pkg/channels (per-platform sub-packages)    |
(Telegram,           |  - BaseChannel, capability interfaces       |
 Discord, ...)       |  - Manager: rate-limit, split, retry        |
                     +----------------------+----------------------+
                                            | InboundMessage
                                            v
                     +---------------------------------------------+
                     | pkg/bus (typed event bus, in/out ctx)       |
                     +----------------------+----------------------+
                                            v
                     +---------------------------------------------+
                     | pkg/routing                                 |
                     |  - Dispatch: which agent handles this?      |
                     |  - Classifier: complexity score 0..1        |
                     |  - Light/Heavy model decision               |
                     +----------------------+----------------------+
                                            v
                     +---------------------------------------------+
                     | pkg/session                                 |
                     |  - SessionScope (agent/channel/account/dim) |
                     |  - JSONL backend + .meta sidecar            |
                     |  - Canonical key sk_v1_<sha256> + aliases   |
                     +----------------------+----------------------+
                                            v
                     +---------------------------------------------+
                     | pkg/agent (the loop)                        |
                     |                                             |
                     |  pipeline_setup -> pipeline_llm ->          |
                     |  pipeline_execute (tools) ->                |
                     |  pipeline_finalize                          |
                     |                                             |
                     |  +----------+  +----------+  +----------+   |
                     |  | steering |  | subturn  |  | hooks    |   |
                     |  +----------+  +----------+  +----------+   |
                     |                                             |
                     |     ^ tools                 ^ MCP           |
                     +-----+-----------------------+---------------+
                           |                       |
                  +--------+--------+     +--------+--------+
                  | pkg/tools       |     | pkg/mcp         |
                  | fs / shell /    |     | isolated        |
                  | hardware /      |     | command         |
                  | search ...      |     | transport       |
                  +-----------------+     +-----------------+
                     +---------------------------------------------+
                     | pkg/providers (factory + facades)           |
                     |  anthropic / openai_compat / azure /        |
                     |  bedrock / oauth / cli ...                  |
                     |  cooldown · ratelimiter · fallback ·        |
                     |  error_classifier                           |
                     +---------------------------------------------+
```
Three top-level binaries are produced from `cmd/`:
- `picoclaw` - the agent itself (CLI + headless server)
- `picoclaw-launcher-tui` - terminal UI launcher
- `membench` - internal memory benchmark used to keep the <10 MB promise honest
4. Core Concept #1 - The Agent Loop & Pipeline
The `pkg/agent` package is where everything converges. The loop is split into four pipeline stages, each in its own file:
| File | Stage | Job |
|---|---|---|
| `pipeline_setup.go` | Setup | Build prompt, load session history, resolve model, mount hooks |
| `pipeline_llm.go` | LLM Call | Call provider, stream tokens, parse tool calls and thinking blocks |
| `pipeline_execute.go` | Tool Execution | Run tool calls (possibly in parallel), enforce approvals, record results |
| `pipeline_finalize.go` | Finalize | Persist session, emit events, send outbound message, close turn |
Around the pipeline are cross-cutting modules:
- `turn_coord.go` - owns the per-turn state machine, decides light vs. heavy model, chooses provider candidates.
- `turn_state.go` / `turn_context.go` - typed turn-scoped state.
- `context_manager.go` / `context_budget.go` / `context_usage.go` - keep the message window inside the model's token limit; trim oldest, summarize, or drop based on budget.
- `prompt.go` / `prompt_contributors.go` / `prompt_turn.go` - composable prompt builders. Each contributor adds a slice (system identity, tool list, memory, time, channel context).
- `eventbus.go` / `events.go` - fan-out of every meaningful event (`tool_exec_start`, `llm_request`, `turn_finished`, ...) to observers.
- `registry.go` - agent registry; `definition.go` describes one agent (name, system prompt, tool set, models, light candidates).
Actionable patterns to copy
- Make the loop a strict state machine, not a callback web. Each pipeline file exports a single function that takes and returns a turn state. Easier to test, to add tracing, and to inject hooks.
- Have the agent definition be plain data. A `Definition` struct (`pkg/agent/definition.go`) is a name + system prompt + tool allow-list + provider candidates + light candidates. Loading from YAML/JSON becomes trivial.
- Separate "what to send to the LLM" from "how to send it." Prompt contributors build the abstract message list; the provider facade (next section) maps it to vendor-specific JSON.
- Track usage at the turn level. `context_usage.go` keeps token-in/token-out per turn so you can enforce per-turn budget caps and emit metering events without parsing logs.
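The strict-state-machine idea can be sketched in a few lines. Type and function names here are illustrative, not PicoClaw's actual API:

```go
package main

import "fmt"

// TurnState is the single value threaded through the pipeline.
// Field names are illustrative, not PicoClaw's actual types.
type TurnState struct {
	Stage    string
	Messages []string
	Reply    string
}

// Each stage is a plain function: state in, state out, error out.
// No callbacks, no shared globals; trivial to test and to trace.
type Stage func(TurnState) (TurnState, error)

func setup(s TurnState) (TurnState, error) {
	s.Messages = append(s.Messages, "system: you are helpful")
	s.Stage = "setup"
	return s, nil
}

func llm(s TurnState) (TurnState, error) {
	s.Reply = "stub reply" // a real stage would call the provider here
	s.Stage = "llm"
	return s, nil
}

func finalize(s TurnState) (TurnState, error) {
	s.Stage = "finalize"
	return s, nil
}

// RunTurn walks the stages in order; a hook or tracer can wrap each one.
func RunTurn(s TurnState, stages ...Stage) (TurnState, error) {
	for _, st := range stages {
		var err error
		if s, err = st(s); err != nil {
			return s, err
		}
	}
	return s, nil
}

func main() {
	out, err := RunTurn(TurnState{}, setup, llm, finalize)
	fmt.Println(out.Stage, out.Reply, err)
}
```

Because each stage is a value of type `Stage`, inserting a tracing or hook wrapper is just function composition.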
5. Core Concept #2 - Steering (Mid-Loop Message Injection)
"The user can correct the agent at any moment. Make that a first-class concern."
`pkg/agent/steering.go` (and `agent_steering.go`) implements a per-session FIFO queue that the loop polls at four checkpoints:
- Loop initialization (before first LLM call)
- After each tool completes
- After each non-tool LLM response
- Before turn finalization
If a queued message exists at any of those points:
- Any remaining tool calls in the current LLM response are skipped, each receiving the synthetic result `"Skipped due to queued user message."` so the model still understands what did/didn't run.
- The queued message is appended to the conversation as a new `user` turn.
- The loop re-enters the LLM stage.
Why this matters
- Side-effect safety. A user yelling "don't send that email" actually stops the email if the previous tool was something else.
- Compute savings. A planned batch of three 3–4 s tool calls is ~10 s of work avoided.
- Model awareness. Skipping is announced via a tool-result message so the model can adapt instead of repeating the same plan.
Modes & limits
```go
agentLoop.SetSteeringMode(agent.SteeringOneAtATime) // default: pop one per check
agentLoop.SetSteeringMode(agent.SteeringAll)        // drain whole queue at once
```
Hard cap: `MaxQueueSize = 10` messages per session. Overflow returns an error on manual `Steer()` and a warning when an inbound channel-bus drain triggers it.
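A minimal sketch of such a bounded steering queue. Names mirror the description above, but the code is illustrative, not PicoClaw's:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// MaxQueueSize caps queued corrections per session.
const MaxQueueSize = 10

var ErrQueueFull = errors.New("steering queue full")

// SteeringQueue is a bounded, per-session FIFO.
type SteeringQueue struct {
	mu    sync.Mutex
	items []string
}

// Steer enqueues a correction; overflow is an error, not silent growth.
func (q *SteeringQueue) Steer(msg string) error {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.items) >= MaxQueueSize {
		return ErrQueueFull
	}
	q.items = append(q.items, msg)
	return nil
}

// Poll pops one message ("one-at-a-time" mode); ok=false means none queued.
func (q *SteeringQueue) Poll() (string, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.items) == 0 {
		return "", false
	}
	msg := q.items[0]
	q.items = q.items[1:]
	return msg, true
}

func main() {
	q := &SteeringQueue{}
	_ = q.Steer("actually, focus on X instead")
	msg, ok := q.Poll()
	fmt.Println(msg, ok)
}
```

The agent loop calls `Poll` at each of the four checkpoints listed above; "drain all" mode is just a loop over `Poll` until `ok` is false.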
Public API to copy
```go
// External: inject a correction
err := agentLoop.Steer(providers.Message{
    Role:    "user",
    Content: "actually, focus on X instead",
})

// External: nudge an idle session to continue
resp, err := agentLoop.Continue(ctx, sessionKey, channel, chatID)
```
Implementation notes
- The queue is scoped by canonical session key. Different chats never bleed into each other.
- Media references (`media://...`) survive steering; they're resolved in the normal pipeline before the provider call.
- Inbound messages for a session that already has an active turn are automatically enqueued as steering rather than starting a competing turn.
6. Core Concept #3 - SubTurn (Hierarchical Sub-Agents)
Sub-agents are isolated nested loops spawned by a parent turn. Defined in `pkg/agent/subturn.go`.
Properties
| Property | Value |
|---|---|
| Max nesting depth | 3 |
| Max concurrent per parent | 5 (semaphore-guarded, 30 s timeout) |
| Default timeout | 5 min (parent and child have independent timeouts) |
| Message buffer | 50 messages per sub-turn (does not contaminate parent history) |
| Result delivery | async via `pendingResults` channel (16-message buffer) |
| Cancellation | hard abort cascades to children & grandchildren |
| `Critical: true` | survives parent completion and continues in background |
When the parent polls results
Same checkpoints as steering: before every LLM call, after every tool call, before finalize. This keeps result handling deterministic without polling threads.
Why context derives from `context.Background()`, not the parent's ctx
So that a child's independent timeout is not cut short just because the parent finished early. If you want cascading cancellation for a particular sub-turn, the parent calls `cancel()` explicitly.
Pattern to copy
```go
// inside parent agent loop
result, err := agent.SpawnSubTurn(ctx, agent.SubTurnSpec{
    AgentDef:   "researcher",
    Goal:       "Find primary sources for claim X",
    Critical:   false,
    Timeout:    2 * time.Minute,
    MaxHistory: 50,
})
```
Pitfalls
- Orphan results. If the parent finishes before the child, the result is dropped (with a telemetry event). Either mark the child `Critical: true` or await it explicitly.
- Buffer overflow. With 5 concurrent subs and a 16-slot result buffer, bursty completions can overflow; design subs to emit a single final result, not progress updates.
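The concurrency guards above (a semaphore capping concurrent children, a bounded result buffer drained at checkpoints, drop-on-overflow) can be sketched like this. Names are illustrative, not PicoClaw's actual types, and the 30 s semaphore timeout is omitted for brevity:

```go
package main

import (
	"fmt"
	"sync"
)

// SubTurnRunner guards child sub-turns with a semaphore and delivers
// their results through a bounded buffer.
type SubTurnRunner struct {
	sem            chan struct{} // capacity 5: max concurrent per parent
	pendingResults chan string   // capacity 16: result buffer
	wg             sync.WaitGroup
}

func NewSubTurnRunner() *SubTurnRunner {
	return &SubTurnRunner{
		sem:            make(chan struct{}, 5),
		pendingResults: make(chan string, 16),
	}
}

// Spawn runs goal() as a child; blocks if 5 children are already running.
func (r *SubTurnRunner) Spawn(goal func() string) {
	r.sem <- struct{}{}
	r.wg.Add(1)
	go func() {
		defer func() { <-r.sem; r.wg.Done() }()
		select {
		case r.pendingResults <- goal():
		default: // buffer full: result is dropped (emit telemetry here)
		}
	}()
}

// Drain collects whatever results are ready; called at loop checkpoints.
func (r *SubTurnRunner) Drain() []string {
	var out []string
	for {
		select {
		case res := <-r.pendingResults:
			out = append(out, res)
		default:
			return out
		}
	}
}

func main() {
	r := NewSubTurnRunner()
	r.Spawn(func() string { return "sources found" })
	r.wg.Wait()
	fmt.Println(r.Drain())
}
```

The non-blocking send in `Spawn` is exactly where the "bursty completions can overflow" pitfall lives.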
7. Core Concept #4 - Sessions & JSONL Persistence
`pkg/session` answers two questions: which messages share a conversation? And how is that conversation stored durably?
7.1 SessionScope - the structured identity of a conversation
```go
type SessionScope struct {
    Version    string            // ScopeVersionV1
    AgentID    string            // routed agent
    Channel    string            // normalized channel name ("telegram")
    Account    string            // bot/account identifier
    Dimensions []string          // active partition dims, e.g. ["chat"]
    Values     map[string]string // concrete dim values
}
```
The default dimension set is `["chat"]`: "one shared conversation per chat unless a dispatch rule overrides it." A dispatch rule can promote topic or sender into the dimension set to split or merge conversations.
7.2 Two key formats
| Format | Example | Purpose |
|---|---|---|
| Canonical | `sk_v1_<sha256>` | Stable, opaque, the source of truth |
| Legacy | `agent:main:direct:user123` | Backward compat, resolved transparently |
The JSONL backend resolves legacy aliases to canonical keys during reads and writes, so you can rename schemes without losing history.
7.3 JSONL on disk
Per session:
- `<key>.jsonl` - one `providers.Message` per line, append-only.
- `<key>.meta.json` - `{ summary, created_at, updated_at, line_count, skip_offset, scope, aliases }`.
Why two files: messages are append-only and crash-safe; metadata is overwritten under a per-shard mutex, but it is small enough that a torn write is recoverable from the JSONL.
"Designed around append-first durability and stale-over-loss recovery."
7.4 Allocator rules
The allocator turns inbound metadata into scope values:
- `space` - `<space_type>:<space_id>`
- `chat` - `<chat_type>:<chat_id>`
- `topic` - `topic:<topic_id>`
- `sender` - canonicalized through identity-link mappings (so that a user's Telegram ID and Slack ID map to the same logical sender)
Special case: Telegram forum topics append `/<topic_id>` to chat values when `topic` is not an explicit dimension, preventing topic cross-talk by default.
7.5 Concurrency
A 64-shard mutex array (hash key -> shard) serializes per-session writes without keeping an unbounded mutex map. This is a small but important pattern: lock striping is essentially free and fixes 99% of session-store contention bugs.
7.6 Migration
On startup the system attempts to migrate legacy JSON sessions into JSONL. If migration fails, it falls back to the legacy `SessionManager` rather than crash-looping the agent.
Actionable patterns
- Make session keys content-addressed (`sha256` over a canonical scope signature) so renaming dimensions doesn't break history.
- Sidecar metadata is far simpler than embedding a header line in the JSONL.
- Lock striping > one big mutex > one mutex per session. 64 shards is a good default.
8. Core Concept #5 - Rule-Based Model Routing
`pkg/routing` is a two-stage pipeline:
1. Agent dispatch - `Router` picks which agent definition handles the message (rules over channel, sender, content, command prefix, etc.).
2. Model routing - once an agent is chosen, the `RuleClassifier` decides whether to use the agent's primary (heavy) model or a globally configured cheap light model.
8.1 Configuration

```json
{
  "routing": {
    "enabled": true,
    "light_model": "gemini-2.0-flash",
    "threshold": 0.35
  }
}
```
8.2 Features extracted per turn
The classifier is intentionally language-agnostic (no keyword lists), using five structural features:
| Feature | What it measures |
|---|---|
| `TokenEstimate` | Approximate token count (CJK-aware rune counting) |
| `CodeBlockCount` | Number of fenced code blocks in the latest message |
| `RecentToolCalls` | Tool invocations in the last 6 history entries |
| `ConversationDepth` | Total history length |
| `HasAttachments` | Media references or recognized file extensions |
8.3 Weighted scoring (clamped to [0,1])
| Signal | Weight |
|---|---|
| Has attachments | 1.00 |
| Code block present | 0.40 |
| Tokens > 200 | 0.35 |
| Recent tool calls > 3 | 0.25 |
| Tokens > 50 | 0.15 |
| Recent tool calls 1–3 | 0.10 |
| Conversation depth > 10 | 0.10 |
With threshold 0.35, trivial chat stays cheap; code, attachments, or active tool use trigger heavy. Long plain prompts cross at the 200-token boundary.
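The scoring table translates almost mechanically into code. One detail the table leaves open is whether the two token tiers stack; this sketch treats them as mutually exclusive, so treat the structure (not the exact constants) as the takeaway:

```go
package main

import "fmt"

// Features mirrors the table above.
type Features struct {
	TokenEstimate     int
	CodeBlockCount    int
	RecentToolCalls   int
	ConversationDepth int
	HasAttachments    bool
}

// Score sums the weighted signals and clamps to [0,1].
func Score(f Features) float64 {
	s := 0.0
	if f.HasAttachments {
		s += 1.00
	}
	if f.CodeBlockCount > 0 {
		s += 0.40
	}
	switch { // token tiers: highest matching tier only (an assumption)
	case f.TokenEstimate > 200:
		s += 0.35
	case f.TokenEstimate > 50:
		s += 0.15
	}
	switch { // tool-call tiers, same idea
	case f.RecentToolCalls > 3:
		s += 0.25
	case f.RecentToolCalls >= 1:
		s += 0.10
	}
	if f.ConversationDepth > 10 {
		s += 0.10
	}
	if s > 1 {
		s = 1
	}
	return s
}

func main() {
	chat := Features{TokenEstimate: 20}
	code := Features{TokenEstimate: 300, CodeBlockCount: 1}
	fmt.Println(Score(chat) < 0.35, Score(code) >= 0.35) // light vs. heavy
}
```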
8.4 Where it plugs in
`pkg/agent/turn_coord.go` swaps the candidate provider list to `agent.LightCandidates` when score < threshold; otherwise it uses the agent's primary candidate set unchanged. The agent doesn't know; it just receives a different ordered list of providers.
Pattern to copy
- Routing rules are data, not code. Keep them in JSON. Hot-reload is then `os.Stat` + `json.Unmarshal`.
- Each agent has both `Candidates` and `LightCandidates`: primary and cheap fallback chains. Routing only picks the chain; the fallback logic inside the chain is generic (next section).
9. Core Concept #6 - The Hook System
Five synchronous hook points + arbitrary read-only observers. Defined in `pkg/agent/hooks.go`, `hook_mount.go`, `hook_process.go`.
9.1 The five synchronous points
| Stage | Allowed actions |
|---|---|
| `before_llm` | continue · modify (rewrite request) · abort_turn · hard_abort |
| `after_llm` | continue · modify (rewrite response) |
| `before_tool` | continue · modify (rewrite args) · respond (skip exec, supply result) · deny_tool |
| `after_tool` | continue · modify (rewrite tool result) |
| `approve_tool` | allow / deny only |
Everything else is observer-only events on the bus.
9.2 In-process vs out-of-process
In-process: Go function registered at startup. Zero serialization cost. Used for built-ins like rate-limit injectors, audit loggers, schema validators.
Out-of-process: any program speaking JSON-RPC over stdio, spawned and supervised by `HookManager`. Use it for Python ML reranking, secret scrubbers, external policy engines, even mocking tools during tests.
9.3 JSON-RPC framing

```json
// Request from host -> hook
{ "jsonrpc": "2.0", "id": 7, "method": "hook.before_tool", "params": { ... } }
// Hook -> host
{ "jsonrpc": "2.0", "id": 7, "result": { "action": "respond", "result": "cached" } }
// Notification (one-way; observer events)
{ "jsonrpc": "2.0", "method": "hook.event", "params": {"Kind": "tool_exec_start"} }
```
Lifecycle: the host calls `hook.hello` first to negotiate protocol version + capabilities.
9.4 Configuration shape

```json
{
  "hooks": {
    "enabled": true,
    "observer_timeout_ms": 200,
    "interceptor_timeout_ms": 5000,
    "approval_timeout_ms": 30000,
    "builtins": {
      "audit_log": { "enabled": true, "priority": 10, "config": {} }
    },
    "processes": {
      "policy_check": {
        "enabled": true,
        "priority": 100,
        "transport": "stdio",
        "command": ["python3", "/srv/policy.py"],
        "env": { "POLICY_FILE": "/etc/policy.yml" },
        "observe": ["tool_exec_start"],
        "intercept": ["before_tool", "approve_tool"]
      }
    }
  }
}
```
9.5 Hook ordering
In-process first, then by priority ascending, then by name. Deterministic and easy to reason about.
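That ordering rule is a single stable sort (the `Hook` struct shape here is illustrative):

```go
package main

import (
	"fmt"
	"sort"
)

// Hook carries just the fields ordering cares about.
type Hook struct {
	Name      string
	Priority  int
	InProcess bool
}

// SortHooks: in-process before external, then priority ascending, then name.
func SortHooks(hooks []Hook) {
	sort.SliceStable(hooks, func(i, j int) bool {
		a, b := hooks[i], hooks[j]
		if a.InProcess != b.InProcess {
			return a.InProcess // in-process first
		}
		if a.Priority != b.Priority {
			return a.Priority < b.Priority
		}
		return a.Name < b.Name
	})
}

func main() {
	hooks := []Hook{
		{"policy_check", 100, false},
		{"audit_log", 10, true},
		{"scrubber", 10, false},
		{"validator", 10, true},
	}
	SortHooks(hooks)
	for _, h := range hooks {
		fmt.Println(h.Name)
	}
}
```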
What hooks are NOT for
- Sending messages to channels themselves (use the bus).
- Suspending a turn pending human approval (state machine externally).
- Full message interception across all platforms (channel-level concern).
Patterns to copy
- Make the hook protocol versioned (`hook.hello`). It saves a major refactor 18 months later.
- Observers run with a strict timeout (e.g. 200 ms). Slow observers degrade quietly into "skipped" instead of stalling turns.
- The `respond` action lets a hook fake tool output. Cache, mock, override, all without touching the registry.
10. Core Concept #7 - Channel Abstraction (18+ chat platforms)
`pkg/channels` is the textbook example of capability-based polymorphism in Go.
10.1 The contract
Every platform sub-package embeds `BaseChannel` (`base.go`) and implements the minimum interface. Each platform self-registers a factory in `init()`:
```go
func init() {
    channels.Register("telegram", New)
}
```
`registry.go` is the single source of truth; the manager never imports specific platforms.
10.2 Capability interfaces (optional)
```go
type MediaSender interface { SendMedia(...) error }
type TypingCapable interface { ShowTyping(...) error }
type ReactionCapable interface { React(...) error }
type PlaceholderCapable interface { SendPlaceholder(...) (id string, err error) }
type MessageEditor interface { Edit(...) error }
type WebhookHandler interface { HandleWebhook(http.ResponseWriter, *http.Request) }
type HealthChecker interface { Check(ctx context.Context) error }
```
The manager probes channels with `if c, ok := ch.(MediaSender); ok { ... }`. Adding `VoiceCapable` to one platform doesn't change anyone else.
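A self-contained sketch of the probe-and-degrade pattern; the interfaces here are simplified stand-ins for the real ones:

```go
package main

import "fmt"

// Channel is the minimum every platform implements.
type Channel interface {
	Send(text string) string
}

// MediaSender is an optional capability.
type MediaSender interface {
	SendMedia(caption, url string) string
}

// A text-only platform: implements Channel only.
type textOnly struct{}

func (textOnly) Send(t string) string { return "text:" + t }

// A richer platform: implements Channel and MediaSender.
type richChannel struct{ textOnly }

func (richChannel) SendMedia(c, u string) string { return "media:" + c + ":" + u }

// Deliver uses media when the channel supports it, else degrades to text.
func Deliver(ch Channel, caption, url string) string {
	if m, ok := ch.(MediaSender); ok {
		return m.SendMedia(caption, url)
	}
	return ch.Send(caption + " " + url)
}

func main() {
	fmt.Println(Deliver(textOnly{}, "cat", "http://x/cat.png"))
	fmt.Println(Deliver(richChannel{}, "cat", "http://x/cat.png"))
}
```

The manager only ever holds `Channel` values; capabilities are discovered at the call site, so no platform is forced to stub out features it lacks.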
10.3 First-class fields, not metadata bags
`InboundMessage` (in `pkg/bus`) hoists routing data to typed fields:
```go
type InboundMessage struct {
    Peer       Peer       // platform + chat + topic
    MessageID  string
    Sender     SenderInfo // canonical identity ("telegram:42")
    Body       string
    Media      []MediaRef
    ReceivedAt time.Time
}
```
This is the contract that `pkg/session.Allocator` and `pkg/routing.Router` rely on. Put it in your design from day one; retrofitting is painful.
10.4 Centralized orchestration in the manager
The manager (not the platform) owns:
- Worker queue with rate limit per channel.
- Outbound message splitting (`split.go`): long replies are broken at sentence/word boundaries below the platform's per-message limit.
- Retries with backoff on transient errors classified by `errors.go` / `errutil.go`.
- Typing/reaction indicators as transparent decorations of long turns.
Platforms only know how to send a single chunk. Everything fancy happens above them.
10.5 Identity normalization
`pkg/identity` defines the canonical `"platform:id"` format and identity-link tables that collapse multi-platform users into one logical sender. This is what enables cross-channel memory and consistent routing.
Patterns to copy
- Self-registration via blank-import side effects: the main binary just does `_ "yourapp/channels/telegram"` and the channel becomes available. No registry plumbing.
- Capability interfaces beat optional methods on a god-interface. You will thank yourself when the 12th platform needs something weird.
- Sentinel errors in `errors.go` so the manager can decide retry vs. drop without parsing strings.
11. Core Concept #8 - Provider Abstraction (30+ LLMs)
`pkg/providers` is built around a factory + facade pattern.
11.1 Layout
```plaintext
pkg/providers/
  factory.go               // registers and instantiates providers by name
  factory_provider.go
  cli_facade.go            // unified facade for "CLI"-shaped providers
  httpapi_facade.go        // unified facade for HTTP-shaped providers
  oauth_facade.go          // unified facade for OAuth flows
  cooldown.go              // per-provider cool-down on auth/quota errors
  ratelimiter.go           // token-bucket per provider
  fallback.go              // chain-of-responsibility fallback to next candidate
  error_classifier.go      // network/auth/rate/server/unknown
  types.go                 // Message, ContentBlock, ToolCall, Usage, ...
  anthropic/               // Anthropic Messages API
  anthropic_messages/      // alt path, e.g. server-side tools
  openai_compat/           // OpenAI + every API-compatible vendor
  openai_responses_common/
  azure/                   // Azure OpenAI specifics
  bedrock/                 // AWS Bedrock
  httpapi/                 // generic HTTP fallback
  oauth/                   // device flows
  cli/                     // local CLI providers (Ollama-style)
  common/                  // shared message-utility helpers
    messageutil/
    protocoltypes/
```
11.2 The provider interface (conceptual)
A provider exposes:
- `Send(ctx, request) (response, error)` (streaming via a channel)
- `Capabilities()` (tools? vision? thinking? context window? streaming?)
- `Name()`, `Model()`
The agent loop never imports a specific provider; it picks one from a candidate list returned by the routing layer.
11.3 Reliability stack (the part most projects miss)
When a provider call fails, the wrapper consults:
1. `error_classifier` - auth error? rate limit? network blip? 5xx?
2. `cooldown` - on auth/quota errors, mark this provider unavailable for N minutes.
3. `ratelimiter` - a token bucket keeps us under contractual TPM/RPM.
4. `fallback` - try the next candidate in the chain (heavy -> light, or primary -> secondary key).
The agent never sees this; it sees one logical "send" that either returns a response or gives up after the chain is exhausted.
Patterns to copy
- Provider config is `protocol/model` strings, e.g. `"openai/gpt-5.4"`, `"anthropic/claude-opus-4-7"`. Swap by editing config; no recompile.
- Keep API keys in a separate `.security.yml`, out of `config.json`. Different file permissions, easier to scrub in bug reports.
- The classifier's job is to decide retry-or-not. Don't bake retry into each provider; it'll diverge.
12. Core Concept #9 - Tools, Skills, and MCP
Three layers of "things the agent can do beyond LLM calls":
12.1 Tools - built-in, in-process
`pkg/tools/`:
- `fs/` - read, write, list, glob.
- `shell.go` (+ Unix/Windows variants) - process exec.
- `hardware/` - device interactions (USB, GPIO, camera; appropriate for SBCs).
- `integration/` - outbound HTTP, web search (DuckDuckGo, Brave, Tavily, Baidu).
- `shared/` - shared helpers used by multiple categories.
- `registry.go` - registers tools; exposes `Get(name)`, `List()`, schema.
- `toolloop.go` - orchestrates tool execution within a single turn (parallel-safe, with approval hook integration).
- `search_tool.go` - first-class tool selector for "find a tool that does X."
- `spawn.go` / `spawn_status.go` - long-running child process management.
12.2 Skills - installable plugins
`pkg/skills/`:
- Two registry backends: `clawhub_registry.go` (custom hub) and `github_registry.go` (any repo with the right manifest).
- `installer.go` - fetch, verify, materialize on disk.
- `loader.go` - load at runtime.
- `provider_factory.go` - skills can ship with provider configurations.
- `search_cache.go` - registry search results are cached.
- `config_bridge.go` - skill config is merged into runtime config without leaking into the parent file.
A skill is essentially a packaged bundle of (tools | hooks | provider configs | prompts | docs) that can be installed by name and removed cleanly.
12.3 MCP - Model Context Protocol
`pkg/mcp/`:
- `manager.go` - owns connections to MCP servers; exposes their tools/resources/prompts to the agent.
- `isolated_command_transport.go` - spawns each MCP server in an isolated process and talks JSON-RPC over stdio. Prevents one buggy server from crashing the agent.
- `manager_test.go` - coverage.
`agent_mcp.go` (in `pkg/agent`) wires MCP-discovered tools into the per-turn tool list. From the model's perspective, an MCP tool and a built-in tool are indistinguishable.
Patterns to copy
- Built-in tools stay tiny and audited. Anything ambitious (browser automation, payments) lives behind MCP or skills.
- MCP transport isolation is non-negotiable. Treat MCP servers as untrusted child processes.
- Tools have schema, descriptions, and approval flags as data, not Go conditionals. Re-using the tool registry for skills and MCP just becomes a matter of listing them.
13. Resource-Efficiency Techniques (the <10 MB secret)
Hitting < 10 MB on a 0.6 GHz RISC-V board is engineering, not magic. The techniques used:
13.1 Choice of Go
- Static linking: no shared-library footprint.
- No JIT/interpreter. No Python startup cost.
- `-ldflags="-s -w"` strips the symbol table and DWARF info from the binary (~30% size reduction).
- `-trimpath` removes file system paths.
- UPX (optional) for additional compression on flash-poor boards.
13.2 Minimal goroutine surface
A typical concurrent system spawns thousands of goroutines. PicoClaw keeps the count tight: one per active channel listener, one per active turn, one per running sub-turn (capped at 5×N), one per spawned hook process, one per MCP transport. Goroutines are cheap, but each carries a stack; keep them counted.
13.3 Bounded queues everywhere
- Steering queue: 10
- SubTurn result buffer: 16
- Concurrent SubTurns per parent: 5
- Channel manager worker queue: per-platform configured
Bounded queues turn "memory bug" into "rejected request": you can monitor and tune.
13.4 Streaming, not buffering
LLM responses are streamed token by token. Tool outputs from spawned processes are streamed line by line. Big responses never sit fully in memory.
13.5 JSONL append-only persistence
Constant-memory writes; reads are line iterators. No O(n) JSON object reload on every turn.
13.6 Lazy initialization
Channels, hooks, and skill registries initialize only when enabled in config. Disabled subsystems contribute zero allocations.
13.7 membench as a regression gate
`cmd/membench` is shipped in the repo: a synthetic workload that measures peak RSS. If a PR busts the budget, CI catches it.
13.8 Architecture-aware patches
For MIPS LE on Ingenic X2600 / NaN2008 kernels, the Makefile patches the ELF `e_flags` at offset 36 after building. Without this, the kernel rejects the binary. Lesson: cross-compilation is not done when the linker exits.
14. Cross-Compilation & Single-Binary Deployment
14.1 The build matrix (`make build-all`)
| OS | GOARCH | Notes |
|---|---|---|
| linux | amd64 | |
| linux | arm (`GOARM=7`) | Pi Zero 2 W (32-bit) |
| linux | arm64 | Pi Zero 2 W (64-bit), most modern SBCs |
| linux | riscv64 | LicheeRV-Nano, MaixCAM |
| linux | mipsle | post-build ELF flag patch for NaN2008 kernels |
| linux | loong64 | LoongArch |
| darwin | arm64 | Apple Silicon |
| windows | amd64 | |
| netbsd | amd64 / arm64 | |
Specialized targets:
- `build-pi-zero` - 32-bit + 64-bit Pi Zero 2 W bundle.
- `build-android-bundle` - universal APK with JNI libs (the agent runs as a native service inside the APK).
- `build-whatsapp-native` - adds the native WhatsApp bridge.
- `build-launcher` / `build-launcher-tui` - web/TUI control panels.
14.2 Version stamping
```shell
go build -ldflags "-s -w \
  -X main.version=$(VERSION) \
  -X main.commit=$(COMMIT) \
  -X main.date=$(DATE)"
```
`picoclaw --version` then prints the stamped values, which is vital for triage.
14.3 Single-binary delivery
The launcher (web or TUI) is a tiny supervisor that:
- Detects the platform and picks the right binary.
- Drops it into `~/.picoclaw/`.
- Spawns it and proxies a local browser to `http://localhost:18800` for configuration.
End user double-clicks the launcher; agent runs. No package manager, no Docker, no Python.
15. Reference Configuration Schema
Annotated subset of `config.example.json`:
```jsonc
{
// Default agent settings used when an agent doesn't override.
"defaults": {
"workspace": "~/.picoclaw/workspace",
"model_name": "openai/gpt-5.4",
"max_iterations": 25,
"max_input_tokens": 128000,
"max_output_tokens": 4096
},
// Provider candidates. API keys live in .security.yml, NOT here.
"models": [
{ "name": "openai/gpt-5.4", "endpoint": "https://api.openai.com/v1" },
{ "name": "anthropic/claude-opus-4-7", "endpoint": "https://api.anthropic.com" },
{ "name": "google/gemini-2.0-flash" },
{ "name": "ollama/qwen3", "endpoint": "http://localhost:11434" }
],
// Cheap-first routing.
"routing": {
"enabled": true,
"light_model": "google/gemini-2.0-flash",
"threshold": 0.35
},
// Per-channel config; most disabled by default.
"channels": {
"telegram": { "enabled": false, "token": "" },
"discord": { "enabled": false, "token": "" },
"slack": { "enabled": false, "bot_token": "", "app_token": "" },
"matrix": { "enabled": false },
"wechat": { "enabled": false }
},
// Tool surface.
"tools": {
"web_search": { "enabled": true, "providers": ["duckduckgo", "brave", "tavily"] },
"shell": { "enabled": true, "approval_required": true },
"fs": { "enabled": true, "root": "~/.picoclaw/workspace" },
"cron": { "enabled": true }
},
// External MCP servers, each isolated in its own process.
"mcp": {
"servers": {
"filesystem": { "command": ["mcp-server-fs"], "enabled": true }
}
},
// Skills marketplace.
"skills": {
"registries": {
"clawhub": { "enabled": true, "url": "https://hub.picoclaw.io" },
"github": { "enabled": true }
},
"installed": []
},
// Hooks: in-process built-ins + external processes.
"hooks": {
"enabled": true,
"observer_timeout_ms": 200,
"interceptor_timeout_ms": 5000,
"approval_timeout_ms": 30000,
"builtins": {
"audit_log": { "enabled": true, "priority": 10 }
},
"processes": {}
},
// Heartbeat for liveness reporting and autoscale signals.
"heartbeat": { "interval_seconds": 30 },
// Web UI gateway.
"gateway": { "host": "127.0.0.1", "port": 18800 }
}
`
Companion file:
```yaml
# .security.yml -- separate file, separate permissions
openai:
  api_key: sk-...
anthropic:
  api_key: sk-ant-...
telegram:
  token: 1234:ABC...
```
16. πΊοΈ Step-by-Step: Build Your Own PicoClaw-Style Agent
A pragmatic 12-step roadmap. Each step yields a runnable artifact.
Step 1 β 𦴠Skeleton repo
`plaintext
yourapp/
cmd/yourapp/main.go # entry
pkg/
agent/
bus/
channels/
config/
providers/
routing/
session/
tools/
Makefile
config/config.example.json
.security.example.yml
`
main.go reads config, constructs a Manager, blocks on os.Signal. Nothing else yet.
Step 2 β π Typed message bus
Define InboundMessage and OutboundMessage with first-class Peer, Sender, MessageID. Build pkg/bus/bus.go as a fan-out dispatcher with bounded per-subscriber queues.
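A minimal sketch of that bus, assuming field names beyond `Peer`, `Sender`, and `MessageID` (the drop-on-full policy shown here is one possible bounded-queue strategy, not necessarily PicoClaw's):

```go
package main

import "fmt"

type Peer struct{ Channel, Chat string }

type InboundMessage struct {
	Peer      Peer
	Sender    string
	MessageID string
	Text      string
}

// Bus fans out every published message to all subscribers. Each subscriber
// owns a bounded queue; a full queue drops the message instead of blocking
// the publisher, which keeps a slow consumer from stalling the whole system.
type Bus struct{ subs []chan InboundMessage }

func (b *Bus) Subscribe(buf int) <-chan InboundMessage {
	ch := make(chan InboundMessage, buf)
	b.subs = append(b.subs, ch)
	return ch
}

func (b *Bus) Publish(m InboundMessage) {
	for _, ch := range b.subs {
		select {
		case ch <- m:
		default: // queue full: drop rather than block
		}
	}
}

func main() {
	var b Bus
	sub := b.Subscribe(4)
	b.Publish(InboundMessage{Peer: Peer{"stdio", "local"}, Sender: "user", Text: "hello"})
	fmt.Println((<-sub).Text)
}
```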
Step 3 β πΊ One channel: stdin/stdout
Implement a stdio channel that reads lines from stdin, emits InboundMessage, prints OutboundMessage. This is your dev harness β no Telegram tokens needed.
Step 4 β π€ One provider: OpenAI-compatible
Build the openai_compat provider. Make it streaming. Define a Provider interface with Send(ctx, req) (<-chan Chunk, error).
Step 5 β π Minimal agent loop
pkg/agent/pipeline_*.go. Setup β LLM β execute (no tools yet) β finalize. Hardcode a system prompt. End-to-end you should now type "hello" and get a streamed reply.
Step 6 β πΎ Sessions on JSONL
Build pkg/session with canonical keys, JSONL backend, .meta.json sidecar, 64-shard mutex. Now conversation persists across runs.
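The canonical-key derivation and the 64-shard lock can be sketched as follows. The `sk_v1_` prefix follows the convention mentioned later in this post; the NUL separator is one way to honor the "never concatenate strings" pitfall from §17:

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sync"
)

type SessionScope struct{ Channel, Chat, Topic string }

// sessionKey hashes the structured scope with an unambiguous separator so
// ("ab","c") and ("a","bc") can never collide.
func sessionKey(s SessionScope) string {
	sum := sha256.Sum256([]byte(s.Channel + "\x00" + s.Chat + "\x00" + s.Topic))
	return fmt.Sprintf("sk_v1_%x", sum)
}

// 64-shard mutex array: contention is per-shard, never store-wide.
var shards [64]sync.Mutex

func lockFor(key string) *sync.Mutex {
	h := sha256.Sum256([]byte(key))
	return &shards[h[0]%64]
}

func main() {
	k := sessionKey(SessionScope{Channel: "telegram", Chat: "42"})
	m := lockFor(k)
	m.Lock()
	defer m.Unlock()
	fmt.Println(k[:8], len(k))
}
```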
Step 7 β π οΈ Tools registry
Implement pkg/tools/registry.go with Get, List, Schema(). Add two tools: fs.read and web.fetch. Wire pipeline_execute to call them on parsed tool calls.
Step 8 β πΉοΈ Steering
Add per-session FIFO queue + four polling points. Test by sending a follow-up while the agent is running tools β it must skip remaining tools with the explicit "Skipped" tool result.
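The tool-execution checkpoint can be sketched like this, with a buffered channel standing in for the per-session FIFO queue (the result wording is illustrative):

```go
package main

import "fmt"

// executeTools polls the steering queue before each tool call. If a steering
// message has arrived, the remaining tools are not run; each gets an
// explicit "Skipped" result so the transcript stays consistent.
func executeTools(tools []string, steering chan string) []string {
	results := make([]string, 0, len(tools))
	for i, name := range tools {
		select {
		case msg := <-steering:
			for _, skipped := range tools[i:] {
				results = append(results, skipped+": Skipped (steering: "+msg+")")
			}
			return results
		default:
		}
		results = append(results, name+": ok")
	}
	return results
}

func main() {
	steering := make(chan string, 1)
	steering <- "actually, stop"
	fmt.Println(executeTools([]string{"fs.read", "web.fetch"}, steering))
}
```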
Step 9 β πͺ Hooks
Define five hook points + observer events. Build in-process registration first; add JSON-RPC stdio process hooks once the in-process path is solid.
Step 10 β π§ Routing
Add pkg/routing classifier with the five features and weighted scoring. Add light_model to config. Verify cheap chat goes to the light model.
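The weighted-score classifier reduces to a dot product over the feature vector. The feature names, weights, and models below are illustrative placeholders, not PicoClaw's actual values:

```go
package main

import "fmt"

// features holds five illustrative structural signals, each in [0, 1].
type features struct {
	length       float64 // normalized message length
	hasCode      float64
	asksForTools float64
	multiStep    float64
	followUp     float64
}

// weights are assumed values for the sketch.
var weights = features{0.15, 0.30, 0.25, 0.20, 0.10}

func score(f features) float64 {
	return f.length*weights.length + f.hasCode*weights.hasCode +
		f.asksForTools*weights.asksForTools + f.multiStep*weights.multiStep +
		f.followUp*weights.followUp
}

// pickModel routes below-threshold turns to the configured light_model.
func pickModel(f features, threshold float64) string {
	if score(f) < threshold {
		return "google/gemini-2.0-flash" // light_model from config
	}
	return "openai/gpt-5.4"
}

func main() {
	cheapChat := features{length: 0.1}
	fmt.Println(pickModel(cheapChat, 0.35))
}
```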
Step 11 β π‘ Second channel + capability interfaces
Add Telegram. Define MediaSender, TypingCapable, WebhookHandler capability interfaces. Move retries / splitting / rate-limit into manager.go. The Telegram channel itself should be ~200 lines.
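The capability pattern is plain Go type assertion: the manager probes each optional interface and degrades gracefully when a channel lacks it. A sketch with assumed interface shapes:

```go
package main

import "fmt"

// Channel is the mandatory contract; capabilities are optional extras.
type Channel interface{ Send(text string) error }

type TypingCapable interface{ SetTyping(on bool) error }

type basicChannel struct{}

func (basicChannel) Send(string) error { return nil }

// telegramChannel embeds the base and additionally supports typing.
type telegramChannel struct{ basicChannel }

func (telegramChannel) SetTyping(bool) error { return nil }

// showTyping is a silent no-op for channels without a typing indicator:
// the manager type-asserts instead of forcing every channel to implement it.
func showTyping(c Channel) bool {
	if t, ok := c.(TypingCapable); ok {
		t.SetTyping(true)
		return true
	}
	return false
}

func main() {
	fmt.Println(showTyping(telegramChannel{}), showTyping(basicChannel{}))
}
```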
Step 12 β π¦ Cross-compile & ship
```makefile
build-all:
	GOOS=linux GOARCH=amd64 go build -trimpath -ldflags="-s -w" -o dist/yourapp-linux-amd64 ./cmd/yourapp
	GOOS=linux GOARCH=arm GOARM=7 go build -trimpath -ldflags="-s -w" -o dist/yourapp-linux-armv7 ./cmd/yourapp
	GOOS=linux GOARCH=arm64 go build -trimpath -ldflags="-s -w" -o dist/yourapp-linux-arm64 ./cmd/yourapp
	GOOS=linux GOARCH=riscv64 go build -trimpath -ldflags="-s -w" -o dist/yourapp-linux-riscv64 ./cmd/yourapp
	GOOS=linux GOARCH=mipsle GOMIPS=softfloat go build -trimpath -ldflags="-s -w" -o dist/yourapp-linux-mipsle ./cmd/yourapp
	GOOS=darwin GOARCH=arm64 go build -trimpath -ldflags="-s -w" -o dist/yourapp-darwin-arm64 ./cmd/yourapp
```

Note that `-trimpath` is a `go build` flag, not a linker flag, so it must sit outside `-ldflags`.
Run du -h dist/* β single-digit MB binaries. Confirm with a membench run that peak RSS stays under your target (e.g. 10 MB).
Then add: SubTurns (Step 13), MCP (14), skills marketplace (15), web launcher (16), more channels (17βN).
17. β οΈ Common Pitfalls & Lessons Learned
These are the traps either explicit in PicoClaw's docs or implied by its design choices.
| Pitfall | Mitigation |
|---|---|
| Goroutine leaks via unbounded fan-out | Bounded queues + errgroup per scope (turn, session, channel). |
| Cross-channel memory crosstalk | Canonical session key from sha256(scope) β never concatenate strings. |
| Forum/topic chats merging into one conversation | Append /<topic_id> to chat values when topic isn't an explicit dimension. |
| Tool side effects after a user correction | Skip remaining tools on steering arrival; emit explicit skip results. |
| Orphan SubTurn results crashing parent | 16-slot result buffer + Critical: true for must-finish work. |
| context.Background() vs parent ctx confusion | Document explicitly in your SubTurn API; default to independent timeouts. |
| API keys in plaintext config | Two files: config.json + .security.yml with stricter perms. |
| Memory regressions slipping in | Ship membench and gate it in CI. |
| MIPS LE binaries refused by kernel | Patch ELF e_flags at offset 36 after build. |
| Hooks blocking turns | Per-class timeouts: observer 200ms, interceptor 5s, approval 30s. |
| Rebuilding when adding a provider | Provider config is protocol/model strings; factory dispatches at runtime. |
| Schema drift between sessions | Lazy migration in JSONL backend; never edit applied "migrations" β append new ones. |
| Routing rules buried in code | Routing is data β JSON rules + features. Hot-reload friendly. |
| 30 channels each duplicating retry logic | Centralize retry/split/rate-limit in manager.go; channels send a single chunk. |
| MCP server bug killing the agent | Spawn each MCP server in an isolated process via isolated_command_transport. |
| One mutex around the session store | 64-shard mutex array on hash(key). |
18. π Recommended Reading Path Through the PicoClaw Source
If you read these files in this order, the architecture clicks fast:
- `cmd/picoclaw/main.go` β the boot sequence.
- `pkg/bus/types.go` β the typed message contract that flows through the whole system.
- `pkg/agent/definition.go` β what an agent is as data.
- `pkg/agent/pipeline.go` β `pipeline_setup.go` β `pipeline_llm.go` β `pipeline_execute.go` β `pipeline_finalize.go` β the loop.
- `pkg/agent/turn_coord.go` β the brains tying routing, providers, and steering together.
- `pkg/agent/steering.go` β the most copy-worthy single concept in the project.
- `pkg/agent/subturn.go` β sub-agent semantics.
- `pkg/session/manager.go` + `jsonl_backend.go` + `allocator.go` β durable state.
- `pkg/routing/router.go` + `classifier.go` + `features.go` β cheap-first routing.
- `pkg/agent/hooks.go` + `hook_mount.go` + `hook_process.go` β extensibility.
- `pkg/channels/manager.go` + `base.go` + `interfaces.go` β channel abstraction.
- `pkg/providers/factory.go` + `cooldown.go` + `fallback.go` + `error_classifier.go` β provider reliability stack.
- `pkg/tools/registry.go` + `toolloop.go` β tool execution.
- `pkg/mcp/manager.go` + `isolated_command_transport.go` β MCP integration.
- `pkg/skills/registry.go` + `installer.go` β plugin marketplace.
- `Makefile` β cross-compilation matrix, ELF patching, version stamping.
- `docs/architecture/*.md` β official narrative for steering, subturn, sessions, routing, hooks.
π― TL;DR β The Recipe in One Page
- Use Go. Static binaries, small RSS, uniform across architectures.
- Typed message bus with first-class `Peer`, `Sender`, `MessageID`.
- Pipelined agent loop: setup β LLM β tools β finalize, with a turn state struct.
- Steering: per-session FIFO queue polled at 4 checkpoints; skipped tools get explicit results.
- SubTurns with depth β€ 3, concurrency β€ 5, independent timeouts, `Critical` flag for must-finish work.
- Sessions: structured `SessionScope` β canonical `sk_v1_<sha256>` key, JSONL + `.meta.json`, 64-shard locking.
- Routing: classifier with 5 structural features, weighted score, `light_model` below threshold.
- Hooks: 5 sync points + observer events, in-process or JSON-RPC over stdio, per-class timeouts.
- Channels: each in its own sub-package, embed `BaseChannel`, declare optional capabilities by interface; the manager owns retries/splitting/rate-limiting.
- Providers: factory + facades + cooldown + ratelimiter + fallback + error_classifier, configured by `protocol/model` strings, secrets in `.security.yml`.
- Tools / MCP / Skills: in-process tools for built-ins; MCP for untrusted external tools (isolated transport); skills as installable bundles from a registry.
- Bounded queues, streaming, lazy init, `-ldflags="-s -w"`, `-trimpath`, `membench` regression gate.
- Cross-compile to amd64/arm/arm64/riscv64/mipsle + Darwin + Windows + NetBSD; patch the MIPS ELF e_flags; ship a launcher that auto-picks the binary.
Build steps 1β12 from Β§16 in order, validate with the patterns in Β§17, and you have a PicoClaw-class agent.
If you found this helpful, let me know by leaving a π or a comment, and if you think this post could help someone, feel free to share it! Thank you very much! π