“Six agents” here means one orchestrator (Zoe) plus five specialist agents. Six ACP coding experts run as concurrent implementation workers — not counted in that headline number.
Your Day Has Been Taken Over
Overnight, the trading agent ships the prior US session wrap-up. By morning, the macro analyst has the pre-market brief ready. The butler has pushed weather, schedule, and to-dos. AINews (AI Sentinel) has scanned GitHub Trending, arXiv’s latest papers, and 100+ sources — 18+ curated items ranked by importance. Content (Content Strategist) is tracking trending topics across 50+ platforms.
Here’s what matters most to me — automatic tracking of AI dynamics and tech trends. After discovering valuable projects or papers, the system doesn’t just push news — it evaluates impact on our systems and provides P0/P1/P2 action recommendations. Valuable discoveries enter Zoe’s Tech Radar (Zoe is the CTO Agent), going through evaluation → decision → delegated coding implementation.
60 cron tasks run automatically every day (3 AM backup to 11:45 PM reflection). Agents are evolving on their own — mistakes are remembered, recurrence rates drop significantly. This isn’t rules I wrote — it’s autonomous iteration from .learnings/ to MEMORY.md.
System: 1 orchestrator (Zoe) + 5 specialized agents (AINews, Trading, Macro, Content, Butler) + 6 ACP coding experts + 60 cron tasks + 100+ Skills + ~30 configured model profiles + 23 automatic recoveries in two weeks.
Note: Metrics based on February-March 2026 monitoring. Individual results may vary.
System Architecture
┌─────────────────────────────────────────────┐
│ User (Human) │
│ Requirements + Key Node Approval │
└─────────────────┬───────────────────────────┘
│
┌────────▼────────┐
│ Zoe (CTO) │
│ 3x Daily Check │
└────────┬────────┘
│
┌─────────────┼─────────────┐
│ │ │
┌───▼───┐ ┌────▼────┐ ┌───▼───┐
│AINews │ │ Trading │ │ Macro │
└───┬───┘ └────┬────┘ └───┬───┘
│ │ │
└────────────┼────────────┘
│
┌────────▼────────┐
│ Content + Butler│
└────────┬────────┘
│
┌────────▼────────┐
│ Event Bus │
│ + Shared Context│
└────────┬────────┘
│
┌────────▼────────┐
│ ACP Coding │
│ (6 concurrent) │
└─────────────────┘
Key Design Decisions:
Agents Evolving Autonomously
Designed protocols — Zoe diagnosed communication issues, designed three-state protocol (request → confirmed → final, with silent as the default "no news is good news" state), solidified into AGENTS.md
Self-developed Skills — Content researched ways to make drafts sound less generically LLM-written (“de-AI” polish), wrote Skills, published to ClawHub (shared repository)
Strategy roundtables — Macro + Trading produce weekly reports with data snapshots, position recommendations, stop-loss discipline
Task Watcher — Zoe designed cron-level Task Callback Event Bus for async monitoring
My role: Set up framework, establish constraints, confirm direction. Requirement discovery, solution research, protocol design, implementation — all done by agents.
Team: 1+5+6 Formation
Zoe (CTO / Chief Orchestrator)
3 daily inspections (10:00/14:00/22:00 PT): cron execution, disk usage, session health, Chrome DevTools Protocol (CDP) leak checks, .learnings/ pending, shared-context/ timestamps.
Weekly: Analyze each agent’s MEMORY.md, execute layered compression.
Key capability: Solution design — three-state protocol, Task Watcher, Communication Guardrail framework, all designed autonomously.
AINews (AI Sentinel) — Intelligence Hub
Collects from 100+ sources daily: GitHub Trending, arXiv, RSS, HackerNews, Reddit. 7 cron tasks: morning brief (08:30), midday paper (12:00), evening trends (20:00).
Critical capability: Proactive tech impact evaluation. Discovered ReMe framework → proposed to Zoe → I confirmed → agents executed.
Toolchain: github_trending.py, rss_aggregator.py, arxiv_papers.py, Tavily, agent-browser. Anti-hallucination: every item MUST have a URL, reachability self-check, and unverifiable items labeled single-source.
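That anti-hallucination gate can be sketched as a small filter — the function name and the HEAD-request reachability check are my illustration, not AINews's actual tooling:

```python
import urllib.request
from urllib.error import URLError

def classify_item(item: dict, timeout: float = 5.0) -> str:
    """Gate a curated item: no URL -> reject outright;
    URL present but unreachable -> keep, labeled 'single-source'."""
    url = item.get("url")
    if not url:
        return "reject"  # hard rule: every item MUST have a URL
    try:
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            ok = 200 <= resp.status < 400
    except (URLError, ValueError, OSError):
        ok = False  # unreachable or malformed: flag as unverified
    return "verified" if ok else "single-source"
```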
Trading (Quantitative Analyst)
21 cron tasks (densest load). 20 quant tools, 15 Skills (68K+ lines), 65/35 scoring (tool/AI). Covers US stocks + commodities + crypto.
Four-step framework: Macro factors → scoring (technical 25% / flow 30% / fundamentals 10% / sentiment 20% / market 15%) → cross-check (sanity-check vs. macro and flow) → target + score + stop-loss + confidence.
Not financial advice — automated research output only; you are responsible for any real-money decisions.
Hard rules (system policy, not investment advice): no entry without a defined stop, never fabricate data, confidence <60% = “wait.”
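The scoring weights and hard rules above compose into a simple decision gate; a hedged sketch (function and field names are mine):

```python
from typing import Optional

# Scoring weights from the four-step framework
WEIGHTS = {"technical": 0.25, "flow": 0.30, "fundamentals": 0.10,
           "sentiment": 0.20, "market": 0.15}

def decide(scores: dict, confidence: float, stop_loss: Optional[float]) -> dict:
    """Weighted composite score plus the two hard rules."""
    composite = sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)
    if confidence < 60:
        return {"action": "wait", "score": composite}   # confidence <60% = wait
    if stop_loss is None:
        return {"action": "wait", "score": composite}   # no entry without a stop
    return {"action": "proceed", "score": composite, "stop": stop_loss}
```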
Macro (Chief Economist)
9 cron tasks: Morning (07:50) → Midday (12:30) → Evening (18:00) → US pre-market (22:00) → morning digest of the prior US session (05:20 PT, scheduled after the cash close, not at the closing bell) → Sunday weekly review, which Trading consults for its market review.
Discipline: Cite sources, distinguish facts vs judgments, mark confidence (high >70% / medium 50–70% / low <50%), propose counter-arguments.
Real case: Iran tension → traditional: “gold rises” → actual: oil +14%, gold -5%. Macro: “inflation logic dominates, not safe haven.” Saved to MEMORY.md.
Content (Content Strategist)
9 cron tasks: Research (09:00, 50+ platforms) → Ideate (10:30, consume AINews) → Write (14:00, score drafts) → Reflect (22:10).
Autonomous evolution: Discovered content too “AI-flavored” → researched humanizing / de-generic copy tools → wrote Skills → published to ClawHub.
Five-Basket Radar: AI/Tech (≤40%), Product/Startup, Solopreneur, Investment/Macro, Social/International. 40% AI cap self-imposed during reflection.
Butler (Life Assistant)
7 cron tasks: Greeting (08:00) → Schedule (08:30) → 5 water reminders (rotating styles) → Health (20:00) → Summary (22:00).
Philosophy: <50 chars per reminder, ≥1.5h interval, 23:00–07:00 emergency only, no pestering if no reply.
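That philosophy reduces to a small send-policy check; a sketch under my own naming — the real Butler logic may differ:

```python
from datetime import datetime, timedelta
from typing import Optional

MIN_GAP = timedelta(hours=1, minutes=30)

def may_send(text: str, now: datetime, last_sent: Optional[datetime],
             emergency: bool = False) -> bool:
    """Apply the Butler's send policy before any reminder goes out."""
    if len(text) >= 50:                                  # <50 chars per reminder
        return False
    if not emergency and (now.hour >= 23 or now.hour < 7):
        return False                                     # 23:00-07:00 emergency only
    if last_sent is not None and now - last_sent < MIN_GAP:
        return False                                     # >=1.5h between reminders
    return True
```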
ACP Coding Experts
Pi / Claude Code / Codex / OpenCode / Gemini / GPT-4.1-Codex. Max 6 concurrent, 120min TTL — queue or shed load when saturated so you don’t stampede gateways. Analysis agents don’t code — delegated via sessions_spawn.
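The cap-plus-shed behavior might look like this — a sketch using a semaphore and a per-task deadline; the class and method names are my assumptions:

```python
import threading
import time

class AcpPool:
    """Cap concurrent coding sessions and shed load when saturated,
    instead of letting spawns stampede the model gateways."""
    def __init__(self, max_concurrent: int = 6, ttl_seconds: int = 120 * 60):
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self.ttl = ttl_seconds

    def spawn(self, task, wait_seconds: float = 0.0):
        # Non-blocking by default: if no slot frees up in time, shed the task
        if not self._slots.acquire(timeout=wait_seconds):
            return None
        deadline = time.monotonic() + self.ttl   # 120min TTL per session
        try:
            return task(deadline)                # worker must respect the deadline
        finally:
            self._slots.release()
```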
Design Lesson
Don’t let analysis agents code directly. Early setup: coding + architect + PM roles. Result: almost no output, high overlap with Zoe + ACP, increased complexity. Cut them all. Zoe handles PM + architect.
Complexity grows fast: pairwise coordination explodes (six specialists ≈ fifteen pairwise handoffs if everyone talks to everyone). Each new agent ≈ half a day debugging conflicts, resource competition, and rule compatibility.
Three Core Engineering Problems
Problem 1: Context Is the Agent’s OS
The Problem: Entropy Always Increases
Without constraints, agent systems deterministically collapse. Agents are processes without an OS: no memory management, no garbage collection, no OOM protection.
Three incidents:
P0 — 8-Hour Paralysis
AINews session: 235K tokens. Gateway compaction → timeout → crash → macOS launchd ThrottleInterval=1 infinite loop. All agents offline.
Fix: Clean session → ThrottleInterval 1→10 → idleMinutes 180→30 → execution policy tightened from permissive to allowlist (smaller blast radius; keep the list maintained). Four lines of defense that had all been missing.
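For reference, the ThrottleInterval fix lives in the agent's launchd plist. An illustrative fragment — the label and service name are placeholders, the keys are standard launchd.plist(5) keys:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key><string>com.example.openclaw.ainews</string>
  <key>KeepAlive</key><true/>
  <!-- Was 1: a crashing agent respawned every second, looping forever -->
  <key>ThrottleInterval</key><integer>10</integer>
</dict>
</plist>
```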
P1 — 3,500 Chars → 800 Chars
Trading’s flash report carried data tables. OpenClaw auto-compacted anything over textChunkLimit — the tables were “intelligently compressed” away. AI “help” is a disaster in data-dense scenarios.
P2 — Rules Ignored After Bloat
Sessions bloat to 10K+ tokens → agents “selectively comply.” Butler doing investment analysis. Trading ignoring validation. Critical info drowned in noise.
Solution: Dual-Layer Control
Layer 1: Context Engineering (information architecture)
SOUL.md (front): Identity + hard constraints + decision framework (40–60 lines)
AGENTS.md (after): Operating norms + collaboration protocols
Skills: Via extraDirs on-demand (Trading: 15 Skills, 68K lines on disk—retrieve or inject only the 1–3 relevant fragments per turn, not the whole tree)
shared-context/: Cross-agent state, read via tools
Obsidian: Cold storage, archives output, no inference
Rule wording targets weakest model (GPT-4.1 → Qwen3.5 → Ollama qwen3:8b):
"Suggest not fabricating" → qwen3:8b ignores
"MUST: do not fabricate" → all comply
"MUST + P0 + NON-NEGOTIABLE" → even weak models comply
Write for weakest link.
Layer 2: Harness (framework lifecycle management)
Without the Harness → 235K tokens → crash. Without Context Engineering → everything piles into one context → rules drown.
Representative openclaw.json excerpt (field names drift by release—validate against your OpenClaw version before paste-deploying):
{
  "compaction": {
    "mode": "safeguard",
    "memoryFlush": {
      "enabled": true,
      "softThresholdTokens": 40000,
      "prompt": "Distill to memory/YYYY-MM-DD.md. Focus: decisions, state changes, lessons."
    }
  },
  "contextPruning": { "mode": "cache-ttl", "ttl": "6h", "keepLastAssistants": 3 },
  "session": {
    "reset": { "mode": "daily", "atHour": 5, "idleMinutes": 30 },
    "maintenance": { "pruneAfter": "7d", "maxDiskBytes": 104857600 }
  },
  "hooks": { "bootstrap": ["self-improving-agent"] }
}
Cross-session recovery:
New session → SOUL.md + AGENTS.md + MEMORY.md + .learnings/ → memorySearch → shared-context/
= “knows who it is, what it has done, what the team is doing”
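That recovery order can be sketched as a small bootstrap helper — file names per the list above; the function name and Markdown-header framing are my assumptions, not OpenClaw internals:

```python
from pathlib import Path

BOOT_FILES = ["SOUL.md", "AGENTS.md", "MEMORY.md"]

def bootstrap_context(workspace: Path) -> str:
    """Assemble a fresh session's opening context in recovery order."""
    parts = []
    for name in BOOT_FILES:                    # identity, norms, memory
        f = workspace / name
        if f.exists():
            parts.append(f"## {name}\n{f.read_text()}")
    learnings = workspace / ".learnings"
    if learnings.is_dir():                     # pending lessons come last
        for f in sorted(learnings.glob("*.md")):
            parts.append(f"## .learnings/{f.name}\n{f.read_text()}")
    return "\n\n".join(parts)
```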
Problem 2: Let Agents Remember and Grow
The Problem: Repeating Mistakes
Trading got BILLBOARD_BUY_AMT wrong 5 times (wrote BUY_AMT). Session reset → lost memory → repeat. User corrects → agent changes → 3 days later same scenario → same error.
Chatbot vs Agent dividing line: Agents learn from mistakes.
Solution: Five-Layer Memory
Autonomous Memory: 6-Step Cycle
Trigger: Operation failed · User corrected · Better approach found
L4 Recording: Write to .learnings/ERRORS.md or LEARNINGS.md
Daily Reflection (22:00): Review .learnings/, Zoe aggregates cross-agent value
PROMOTE: 3+ verifications → MEMORY.md, single → keep observing
L2 Sedimentation: Weekly compression, <3,000 tokens
L5 Skill: Generalizable → write as Skills → ClawHub
This is the core mechanism. Without it: chatbot. With it: agent.
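The PROMOTE threshold is the heart of the cycle. As a one-function sketch (names are mine, not the system's):

```python
def promote(observation_counts: dict) -> dict:
    """PROMOTE step: a lesson verified on 3+ occasions graduates to
    MEMORY.md; singletons stay in .learnings/ under observation."""
    return {lesson: ("MEMORY.md" if n >= 3 else ".learnings/")
            for lesson, n in observation_counts.items()}
```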
Chatbot vs Agent
Problem 3: Let Agents Collaborate
The Problem: Multi-Agent Communication
Initial issues:
Status sync failures: A finished, B didn’t know
Resource contention: Multiple agents write same file
Information silos: Macro produced, Trading never saw
Responsibility gaps: “Who’s handling this?” → all silent
Solution: Three-State Protocol + Event Bus
Protocol (three active states + default silent state):
request → confirmed → final → [silent]
request: Explicitly acknowledges, starts the loop
confirmed: In progress, sends intermediate updates
final: Complete, result delivered, loop closes
[silent]: Default state when no active task — “no news is good news” (prevents spam)
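The protocol's legal moves can be written down as a tiny state machine — a sketch; the transition table is my reading of the rules above, not OpenClaw code:

```python
# Legal moves of the three-state protocol; "silent" is the rest state.
TRANSITIONS = {
    "silent":    {"request"},              # a new task opens the loop
    "request":   {"confirmed"},            # receiver acknowledges
    "confirmed": {"confirmed", "final"},   # progress updates, then done
    "final":     {"silent"},               # loop closed, back to quiet
}

def advance(state: str, event: str) -> str:
    if event not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition: {state} -> {event}")
    return event
```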
Event Bus:
{
  "type": "MARKET_CLOSE",
  "source": "TRADING",
  "timestamp": "2026-03-07T15:00:00-08:00",
  "payload": { "symbol": "SPY", "note": "schema omitted for brevity" },
  "requiresAck": false
}
Shared Context:
tech-radar.json — Read-only except authorized writers
market-status.json — Trading updates, Macro/Content consume
Guardrails:
No ad-hoc cross-agent file writes; mediated writes only (tools, bus, approved writers)
All communication via event bus or shared-context; never park API keys or session tokens in shared JSON — use your platform’s secret store
Zoe has final arbitration
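A mediated write might be enforced with an authorized-writer table plus an atomic replace; a sketch (the ACL table and function are my illustration, not an OpenClaw API):

```python
import json
from pathlib import Path

# Authorized-writer table for shared-context/ files
WRITERS = {
    "market-status.json": {"TRADING"},
    "tech-radar.json": {"ZOE"},
}

def mediated_write(agent: str, fname: str, payload: dict, root: Path) -> None:
    """Refuse unauthorized writers; replace atomically so readers
    never observe a half-written file."""
    if agent not in WRITERS.get(fname, set()):
        raise PermissionError(f"{agent} may not write {fname}")
    tmp = root / (fname + ".tmp")
    tmp.write_text(json.dumps(payload, indent=2))
    tmp.replace(root / fname)
```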
Results: 4 Weeks, 23 Auto-Recoveries
Timeline:
Week 1: Basic setup, single agent, frequent crashes
Week 2: Multi-agent coordination, protocols established
Week 3: Autonomous evolution, agents self-fixing
Week 4: Production-ready, 60 cron tasks smooth
Key Metrics:
What I Learned
1. Agents Need an OS, Not Just Prompts
Context Engineering is OS design. You need:
Memory management (compaction, pruning, reset)
Process isolation (separate workspaces)
IPC mechanisms (event bus, shared context)
Garbage collection (session cleanup, disk limits)
2. Memory = Chatbot vs Agent
Can’t remember yesterday’s mistakes = fancy chatbot. Five-layer memory transforms stateless LLM calls into stateful, learning entities.
3. Constraints Enable Creativity
Clear boundaries = more creative, not less. 40% AI quota, three-state protocol, hard “MUST” rules — these are guardrails for autonomous operation.
4. Multi-Model Fallback Is Production Necessity
GPT-4.1 → Qwen3.5 → Ollama qwen3:8b. Write rules for weakest link.
5. Human: Doer → Designer
My job: design system where code writes itself. I’m architect, not bricklayer.
Looking Ahead
Next steps:
P0: Dead-letter queue for failed events
P1: Manual resend CLI for stuck tasks
P1: Audit log rotation
P2: Visual dashboard for system health
Goal: Amplify human capability. One person + six agents > one person + zero agents. That’s Harness Engineering.
Quick Reference
Agent Roster
Total: 60 cron, ~90 Skills
Daily Schedule (Pacific Time, America/Los_Angeles)
Cron rows are snapshots from my stack — align to your exchange calendar, asset class (equities vs. crypto), and whether you are on PST or PDT.
Critical Files
If you run a similar harness, how do you handle failures when compaction, cron, and multi-agent handoffs all interact — what breaks first in your stack, and what fixed it?
References
OpenClaw: https://github.com/openclaw/openclaw
PowerMem: https://github.com/oceanbase/powermem
ClawHub: https://github.com/openclaw/clawhub
Mitchell Hashimoto: “My AI Adoption Journey” — https://mitchellh.com/writing/my-ai-adoption-journey
Anthropic: “Effective harnesses for long-running agents” — https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents
All times Pacific Time (America/Los_Angeles; PST or PDT depending on season). macOS + OpenClaw. Monitoring: Feb–Mar 2026. Validate config against your OpenClaw release at https://github.com/openclaw/openclaw