Beyond Defaults: The OpenClaw Power-User's Configuration Guide
You installed OpenClaw. You connected Discord. You talked to your agent and thought, "This is cool."
Cool isn't the goal. Dangerous is.
I've spent 72 hours straight in production with OpenClaw — not testing, not experimenting, operating. Publishing articles, managing crons, orchestrating local model fleets, crashing GPU memory, recovering, and learning. Every config in this guide is something I actually run. Not theoretical. Not "you should try this." I did try it, and I'm going to tell you exactly what happened.
This is the guide I wish existed when I started. 44 configuration opportunities, organized by impact, with real configs you can paste today. Some are quick wins. Some will fundamentally change how your agent operates. A few might blow up in your face if you're not careful.
Let's get into it.
🏎️ Quick Wins (5 Minutes Each)
These require minimal config changes and deliver immediate value. No excuses — do these today.
1. Model Failover Chain
Anthropic goes down. It happens. When it does, your agent goes braindead — unless you've configured failover.
```json
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-opus-4-6",
        "fallbacks": ["ollama/qwen2.5:32b", "ollama/llama3.1:8b"]
      }
    }
  }
}
```
The failover chain is exactly what it sounds like: primary dies, next one picks up. Your agent never goes silent. I run Opus as primary with two local fallbacks — qwen2.5:32b for quality and llama3.1:8b as the last resort. Cloud goes down? I'm still operational on local models. Both die? The 8B model running on 5GB of memory keeps the lights on.
Pro tip: If you're budget-conscious, swap Opus for Sonnet as primary. One user on r/openclaw shared their billing: $47/week on Opus as default, $6/week after switching to Sonnet. Sonnet handles 90% of conversations just fine. Opus is the surgeon — call it when you need surgery, not for a bandaid.
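If you make that swap, it's the same failover block with Sonnet promoted to primary and Opus demoted to a fallback for the hard problems — a sketch reusing model IDs that appear elsewhere in this guide:

```json
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-3-7-sonnet-latest",
        "fallbacks": ["anthropic/claude-opus-4-6", "ollama/llama3.1:8b"]
      }
    }
  }
}
```

Opus stays one hop away for the surgery cases while Sonnet handles the daily traffic.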
2. Typing Indicators
```json
{
  "agents": {
    "defaults": {
      "typingMode": "message"
    }
  }
}
```
Three modes: off, message, presence. Set it to message and suddenly your agent feels alive. That little "typing..." bubble in Discord or Telegram transforms the experience from "talking to a void" to "talking to someone who's thinking." One setting. Huge vibe shift.
3. Human Delay Mode
```json
{
  "agents": {
    "defaults": {
      "humanDelay": {
        "mode": "natural"
      },
      "typingIntervalSeconds": 5
    }
  }
}
```
Your agent reads a message and responds in 0.3 seconds. No human does that. mode: "natural" adds realistic thinking time before responses — not artificial slowness, but enough to feel like your agent is considering rather than regurgitating. typingIntervalSeconds controls how often typing indicators pulse during long operations.
Combine this with block streaming (next section) and your agent becomes genuinely uncanny to interact with.
4. Block Streaming with Natural Pacing
Your agent dumps a 2,000-character wall of text instantly. No human types that fast. It's uncanny.
```json
{
  "agents": {
    "defaults": {
      "blockStreamingDefault": "on",
      "blockStreamingBreak": "text_end",
      "blockStreamingChunk": {
        "minChars": 200,
        "maxChars": 1800,
        "breakPreference": "paragraph"
      }
    }
  }
}
```
This chunks responses into paragraph-sized blocks with natural breaks between them. breakPreference: "paragraph" ensures chunks split at paragraph boundaries (not mid-sentence). text_end for the break point means the agent finishes its thought before delivering.
The result? Your agent's messages breathe. They arrive like a human typing fast, not a machine dumping a buffer.
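The chunking rule itself is simple. A minimal sketch of the idea (not OpenClaw's actual implementation): accumulate paragraphs until the next one would push the chunk past `maxChars`, then flush at the paragraph boundary:

```python
def chunk_blocks(text: str, min_chars: int = 200, max_chars: int = 1800) -> list[str]:
    """Split text into paragraph-aligned chunks between min_chars and max_chars."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) > max_chars and len(current) >= min_chars:
            chunks.append(current)  # flush a full chunk at a paragraph boundary
            current = para
        else:
            current = candidate  # keep accumulating (may exceed max while the chunk is tiny)
    if current:
        chunks.append(current)
    return chunks
```

Each chunk ends at a paragraph break, so nothing arrives mid-sentence.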
5. Loop Detection
Runaway tool loops will eat your context window and your wallet. This is your circuit breaker:
```json
{
  "tools": {
    "loopDetection": {
      "enabled": true,
      "warningThreshold": 10,
      "criticalThreshold": 20,
      "globalCircuitBreakerThreshold": 30
    }
  }
}
```
Warning at 10 iterations, critical alert at 20, hard stop at 30. I've seen agents burn through $15 in a single loop trying to fix a file that didn't exist. Set it and forget it.
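The escalation pattern is a classic layered circuit breaker. A hypothetical sketch of the mechanism — the thresholds mirror the config, the class itself is illustrative, not OpenClaw's internals:

```python
class LoopBreaker:
    """Escalating circuit breaker for repeated tool calls: warn, alert, halt."""

    def __init__(self, warning: int = 10, critical: int = 20, hard_stop: int = 30):
        self.warning, self.critical, self.hard_stop = warning, critical, hard_stop
        self.counts: dict[str, int] = {}

    def record(self, tool_call: str) -> str:
        """Count an identical tool call and return the escalation level."""
        n = self.counts[tool_call] = self.counts.get(tool_call, 0) + 1
        if n >= self.hard_stop:
            raise RuntimeError(f"circuit breaker tripped after {n} identical calls")
        if n >= self.critical:
            return "critical"
        if n >= self.warning:
            return "warning"
        return "ok"
```

The key detail is keying on the identical call, so ordinary heavy tool use never trips the breaker — only a true loop does.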
6. Message Queue & Debounce
Humans don't send one clean message. They send five fragments in rapid succession. Without debounce, your agent processes each one separately — five turns, five API calls, five confused responses.
```json
{
  "messages": {
    "inbound": {
      "debounceMs": 2000,
      "byChannel": { "discord": 1500 }
    },
    "queue": {
      "mode": "collect",
      "debounceMs": 1000,
      "cap": 20,
      "drop": "summarize"
    }
  }
}
```
Default debounce is 2 seconds; Discord gets 1.5s (faster typing culture). The collect queue mode batches messages during agent processing instead of dropping them. cap: 20 prevents queue explosion, and drop: "summarize" ensures overflow messages get summarized into context instead of silently lost.
Why this matters in practice: I send my agent rapid-fire orders. Without collect mode, half of them would get dropped while it was processing the first one. With it, every message gets batched into the next turn.
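The collect/cap/drop semantics reduce to a small batching queue. A sketch of the pattern — names follow the config above, not OpenClaw's internals:

```python
class CollectQueue:
    """Batch inbound messages while the agent is busy; summarize overflow."""

    def __init__(self, cap: int = 20):
        self.cap = cap
        self.pending: list[str] = []
        self.dropped: list[str] = []

    def push(self, message: str) -> None:
        if len(self.pending) < self.cap:
            self.pending.append(message)
        else:
            # drop: "summarize" -- overflow is condensed into context, not lost
            self.dropped.append(message)

    def drain(self) -> list[str]:
        """Return the batch for the next agent turn, with overflow summarized."""
        batch = self.pending[:]
        if self.dropped:
            batch.append(f"[{len(self.dropped)} earlier messages summarized]")
        self.pending, self.dropped = [], []
        return batch
```

One `drain()` per agent turn means five rapid-fire fragments become one coherent batch instead of five confused responses.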
🧠 Memory & Context: Where the Real Savings Live
This is where most people leave money on the table. Context management isn't glamorous, but community reports cite 40-60% cost reduction from session hygiene alone.
7. Context Pruning (Full Configuration)
Tool results are context hogs. A single web fetch can inject thousands of tokens that sit in your context window long after they're useful.
```json
{
  "agents": {
    "defaults": {
      "contextPruning": {
        "mode": "cache-ttl",
        "ttl": "1h",
        "keepLastAssistants": 3,
        "softTrim": {
          "maxChars": 4000,
          "headChars": 1500,
          "tailChars": 1500
        },
        "hardClear": { "enabled": true }
      }
    }
  }
}
```
There's more going on here than just TTL:
- `cache-ttl` mode — Prunes tool results older than 1 hour from active context
- `keepLastAssistants: 3` — Always preserves the 3 most recent assistant messages regardless of TTL
- `softTrim` — For large tool outputs, keeps the first 1,500 and last 1,500 characters (head + tail), trimming the middle. You get the setup and the conclusion without the noise.
- `hardClear` — When context is truly critical, enables aggressive clearing of stale entries
This single config block has the highest cost-to-impact ratio of anything in this guide. Your agent doesn't need the raw HTML from a page it fetched six turns ago — it already extracted what it needed.
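The softTrim step is worth seeing concretely — a sketch of head-plus-tail trimming using the character budgets from the config above:

```python
def soft_trim(output: str, max_chars: int = 4000,
              head_chars: int = 1500, tail_chars: int = 1500) -> str:
    """Keep the head and tail of an oversized tool result, eliding the middle."""
    if len(output) <= max_chars:
        return output
    elided = len(output) - head_chars - tail_chars
    return (output[:head_chars]
            + f"\n... [{elided} chars trimmed] ...\n"
            + output[-tail_chars:])
```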
8. Bootstrap Context Limits
When your agent wakes up, it loads workspace files (AGENTS.md, SOUL.md, etc.) into context. Without limits, a bloated workspace can eat 50k+ tokens before a single message is processed.
```json
{
  "agents": {
    "defaults": {
      "bootstrapMaxChars": 20000,
      "bootstrapTotalMaxChars": 150000
    }
  }
}
```
bootstrapMaxChars: 20000 — Maximum characters per individual file. Your 30-page AGENTS.md gets truncated to a digestible size.
bootstrapTotalMaxChars: 150000 — Total cap across all bootstrap files. Even if you have 20 workspace files, the combined injection stays under 150k chars.
My lesson: I hit 87% context (173k/200k tokens) after just 28 minutes of conversation. Part of the problem? Bloated bootstrap injection. These limits keep your starting context lean so you have room for actual work.
9. Pre-Compaction Memory Flush
Here's something most people don't realize: when compaction fires, context gets summarized and old messages are gone. If your agent learned something important three turns ago but didn't write it to a file, that knowledge evaporates.
```json
{
  "agents": {
    "defaults": {
      "compaction": {
        "memoryFlush": {
          "enabled": true,
          "softThresholdTokens": 4000,
          "prompt": "Write any lasting notes to memory/YYYY-MM-DD.md; reply with NO_REPLY if nothing to store.",
          "systemPrompt": "Session nearing compaction. Store durable memories now."
        }
      }
    }
  }
}
```
When context approaches the compaction threshold (4,000 tokens before the limit), the agent gets a dedicated turn to dump important context to persistent files. Custom prompt tells it exactly where and how to save. The systemPrompt gives the model additional framing.
This is the difference between an agent with amnesia and one with continuity. I enabled this on Day 2 after losing context across a compaction. Never again.
10. Safeguard Compaction with Section Re-injection
Default compaction is naive truncation. Safeguard mode is chunked summarization — it actually understands what it's compressing.
```json
{
  "agents": {
    "defaults": {
      "compaction": {
        "mode": "safeguard",
        "model": "anthropic/claude-haiku-4-5",
        "postCompactionSections": [
          "Core Orders",
          "Red Lines",
          "Delegation Enforcement"
        ]
      }
    }
  }
}
```
Three critical settings here:
- `mode: "safeguard"` — Uses chunked summarization instead of blind truncation. The compaction model reads the context and produces an intelligent summary.
- Cheaper compaction model — Use Haiku for the summarization pass. It's grunt work. Save Opus for the actual thinking.
- `postCompactionSections` — This is the killer feature. After compaction wipes the slate, these named sections from your AGENTS.md get re-injected verbatim. My agent's Core Orders, Red Lines, and Delegation rules survive every compaction. Without this, your agent slowly loses its personality and rules over a long session.
11. Local Semantic Memory Search
Most people skip memory search entirely, or use an expensive cloud embedding model. You can run it locally for free:
```json
{
  "agents": {
    "defaults": {
      "memorySearch": {
        "provider": "ollama",
        "model": "mxbai-embed-large",
        "chunking": {
          "tokens": 256,
          "overlap": 40
        }
      }
    }
  }
}
```
mxbai-embed-large is only 669MB and produces excellent embeddings. Running locally means zero API cost for memory indexing and search. The chunking config (256 tokens with 40-token overlap) ensures your memory files are split into searchable segments with enough context bleed between chunks.
Your agent can now semantically search its own memory files. "What did I learn about VRAM management?" returns relevant chunks across all your daily logs and memory files — locally, instantly, free.
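Overlapped chunking itself is a few lines. A sketch using a pre-tokenized list — a real indexer would use the embedding model's tokenizer, so treat this as illustration only:

```python
def chunk_with_overlap(tokens: list[str], size: int = 256, overlap: int = 40) -> list[list[str]]:
    """Split a token list into fixed-size windows with overlapping context."""
    step = size - overlap  # each new chunk re-reads `overlap` tokens from the previous one
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]
```

The 40-token overlap is the "context bleed" — a sentence split across a chunk boundary still appears whole in at least one chunk.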
🔒 Security: The Stuff Nobody Talks About
This section isn't optional. It's the section that keeps you off a breach report.
12. Session Isolation & Reset Policies
```json
{
  "session": {
    "dmScope": "per-channel-peer",
    "maintenance": {
      "mode": "enforce",
      "pruneAfter": "30d",
      "maxEntries": 500,
      "maxDiskBytes": "500mb"
    },
    "reset": {
      "mode": "idle",
      "idleMinutes": 240
    },
    "resetByType": {
      "direct": { "mode": "idle", "idleMinutes": 240 },
      "group": { "mode": "idle", "idleMinutes": 120 }
    }
  }
}
```
`dmScope: "per-channel-peer"` is the big one. Without it, DM context can leak between users. If your agent talks to Alice and Bob in DMs, you want isolated sessions. This is security 101, but I've seen production setups running without it.
resetByType lets you tune per channel type. DMs persist longer (4 hours — conversations are deeper), groups reset faster (2 hours — context is noisier). Thread sessions die even quicker — they're ephemeral by nature.
Maintenance enforcement auto-prunes sessions older than 30 days, caps at 500 entries and 500MB disk. Without this, your session store grows indefinitely.
13. Cross-Channel Identity Links
```json
{
  "session": {
    "identityLinks": {
      "boss": ["discord:1234567890"]
    }
  }
}
```
This ties the same person across channels into one continuous context. Your user talks to you on webchat and Discord? Same session context follows them. The key is a label (e.g., "boss"), the value is an array of channel-specific identifiers.
Security note: Only link identities you're certain belong to the same person. A misconfigured identity link means Person A sees Person B's conversation history.
14. Fork Token Guard
```json
{
  "session": {
    "parentForkMaxTokens": 100000
  }
}
```
When someone creates a thread from a message, the parent session's context gets forked into the thread. Without a cap, a 500k-token main session spawns a 500k-token thread. Your costs double instantly.
parentForkMaxTokens: 100000 caps the forked context at 100k tokens. The thread gets enough context to be useful without inheriting the full session baggage.
15. Gateway Security Hardening
```json
{
  "gateway": {
    "mode": "local",
    "bind": "loopback",
    "auth": {
      "mode": "token",
      "token": "your-secure-random-token-here"
    },
    "tailscale": {
      "mode": "off"
    },
    "nodes": {
      "denyCommands": [
        "camera.list",
        "screen.record",
        "contacts.add",
        "calendar.add",
        "reminders.list",
        "sms.search"
      ]
    }
  }
}
```
This is the front door to your agent. Lock it down:
- `bind: "loopback"` — Only accepts connections from localhost. Never use `0.0.0.0` on a VPS unless you proxy through nginx/caddy with auth.
- Token auth — Every request must include the token. Generate a random one: `openssl rand -hex 24`.
- `denyCommands` — This is critical for mobile node setups. When your phone connects as a node, these commands are blocked. No remote access to your camera, screen recorder, contacts, or SMS. Whitelist what you need, deny everything else.
- Tailscale off — Unless you specifically need remote access, disable it. Reduce attack surface.
The Threat Landscape
Let's talk about the elephant in the room.
Infostealers are targeting OpenClaw config files. This isn't theoretical — Hudson Rock documented malware specifically scanning for openclaw.json because it contains API keys. Your config file is a treasure chest.
ClawHub has 13,000+ skills. VirusTotal flagged hundreds as malicious. The ecosystem is incredible, but it's also the Wild West. Skills can execute arbitrary code, access your filesystem, make network calls.
Practical hardening checklist:
- `chmod 600 openclaw.json` — Only your user should read it.
- macOS firewall: Install LuLu. Free, open-source, catches unexpected outbound connections.
- API keys: Never write them to markdown files, memory files, or anywhere your agent persists text. Use environment variables.
- Skills: Build your own for anything security-sensitive. Don't trust ClawHub blindly — read the code before installing.
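The first two checklist items condense to a few commands — `CONF_DIR` here is a stand-in for wherever your openclaw.json actually lives:

```shell
# CONF_DIR stands in for your real config directory -- adjust the path
CONF_DIR=$(mktemp -d)
touch "$CONF_DIR/openclaw.json"

# Only your user should be able to read the config
chmod 600 "$CONF_DIR/openclaw.json"

# 24 random bytes -> a 48-character hex token for gateway auth
TOKEN=$(openssl rand -hex 24)

# Keep API keys in the environment, never in files the agent persists
export ANTHROPIC_API_KEY="sk-..."   # set in your shell profile, not in openclaw.json
```

Pair the generated token with the gateway's `auth.token` field from the config above.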
🎯 Model Routing & Agent Configuration
Different tasks need different models — and different price points.
16. Agent Concurrency Cap
```json
{
  "agents": {
    "defaults": {
      "maxConcurrent": 3
    }
  }
}
```
This limits how many concurrent agent turns can run simultaneously. Without it, a burst of incoming messages from multiple channels can spawn unlimited parallel processing — each one eating tokens and, if you're running local models, fighting for GPU memory.
My hard lesson: I ran 3 concurrent qwen2.5:32b subagents. Each needed ~19GB of VRAM. On 36GB unified memory, that's a 23-minute GPU contention stall. Set maxConcurrent to match your hardware reality.
17. Subagent Configuration
```json
{
  "agents": {
    "defaults": {
      "subagents": {
        "runTimeoutSeconds": 300,
        "archiveAfterMinutes": 60
      }
    }
  }
}
```
runTimeoutSeconds: 300 — Kill any subagent that runs longer than 5 minutes. Without this, a confused subagent will run forever, eating context and compute. I've had subagents stall for 23+ minutes on local models.
archiveAfterMinutes: 60 — Auto-archive completed subagent sessions after 1 hour. Keeps your session list clean. Without it, you accumulate hundreds of dead sessions (I hit 152 in one day).
18. Available Models List
```json
{
  "agents": {
    "defaults": {
      "models": {
        "anthropic/claude-opus-4-6": {},
        "ollama/qwen2.5:32b": {},
        "ollama/llama3.1:8b": {},
        "ollama/mistral:7b": {},
        "anthropic/claude-haiku-4-5": {}
      }
    }
  }
}
```
This explicitly declares which models are available for routing. Your agent (and cron jobs, subagents, etc.) can only use models listed here. It's both a whitelist and a documentation tool — you see at a glance what your setup supports.
Gotcha: If you delete a local model but forget to remove it from this list (and from cron configs), you'll get recurring errors. I deleted qwen3:8b and had warmup failures every 4 minutes until I cleaned all references. Always update ALL references before removing a model.
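A quick recursive scan catches dangling references before they become recurring errors — a hypothetical helper script, not a built-in OpenClaw command:

```python
def find_model_refs(node, model: str, path: str = "$") -> list[str]:
    """Recursively list every config path that still mentions a model ID."""
    refs = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == model:  # model IDs can appear as keys (e.g. the models map)
                refs.append(f"{path}.{key}")
            refs += find_model_refs(value, model, f"{path}.{key}")
    elif isinstance(node, list):
        for i, value in enumerate(node):
            refs += find_model_refs(value, model, f"{path}[{i}]")
    elif node == model:  # ...or as string values (fallbacks, cron models, etc.)
        refs.append(path)
    return refs
```

Run it over your loaded config before deleting a model; an empty result means it's safe to remove.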
19. Image & Vision Model Routing
```json
{
  "agents": {
    "defaults": {
      "imageModel": {
        "primary": "anthropic/claude-opus-4-6",
        "fallbacks": ["ollama/llama3.1:8b"]
      },
      "imageGenerationModel": {
        "primary": "anthropic/claude-opus-4-6"
      }
    }
  }
}
```
Separate routing for vision (analyzing images) and generation (creating images). You might want a cheap model for vision (Qwen 2.5 VL through OpenRouter is free-tier eligible) and a quality model for generation.
🔌 Heartbeats, Crons & Automation
This is where your agent stops being a chatbot and becomes an autonomous operator.
20. Heartbeat Configuration
```json
{
  "agents": {
    "defaults": {
      "heartbeat": {
        "every": "1h",
        "model": "ollama/mistral:7b",
        "lightContext": true,
        "isolatedSession": false,
        "suppressToolErrorWarnings": true
      }
    }
  }
}
```
Heartbeats are periodic check-ins where your agent can do background work — check emails, review calendars, update memory. The key insight: use a cheap model for heartbeats.
- `model: "ollama/mistral:7b"` — Free, local, fast. Heartbeats are maintenance, not creative work. Don't burn Opus tokens on "anything new? no? ok."
- `lightContext: true` — Loads minimal context for heartbeat turns. Your agent doesn't need its full conversation history to check the weather.
- `isolatedSession: false` — Heartbeats run in the main session, so they can access recent conversation context. Set to `true` if you want fully isolated heartbeat logic.
- `suppressToolErrorWarnings` — Prevents noisy tool errors from polluting heartbeat output. A failed weather check shouldn't generate an alert.
21. Cron Configuration
```json
{
  "cron": {
    "enabled": true,
    "maxConcurrentRuns": 2,
    "sessionRetention": "24h"
  }
}
```
maxConcurrentRuns: 2 — Only 2 cron jobs can execute simultaneously. This is critical if you're running local models — 3+ concurrent cron jobs on a 36GB machine will cause GPU contention. I learned this the hard way with 11 crons competing for VRAM.
sessionRetention: "24h" — Cron run sessions are kept for 24 hours then cleaned up. Without retention limits, every cron run leaves a session file. At 4 crons/hour, that's 96 dead sessions per day.
22. Subagent Tool Restrictions
```json
{
  "tools": {
    "subagents": {
      "tools": {
        "deny": ["web_search", "web_fetch"]
      }
    }
  }
}
```
Subagents can do anything the main agent can — including expensive web searches. This deny list blocks subagents from searching the web, forcing them to answer from their training data or files. The main agent can still search; subagents can't.
Why: A subagent tasked with "research X" will happily run 20 web searches at $0 each (if using DuckDuckGo) but each result injects thousands of tokens into its context. Multiply by 5 concurrent subagents and you've got a token bonfire.
🔧 Internal Hooks
Hooks are event-driven automations that fire on specific triggers. These are the ones worth enabling:
23. Session Memory Hook
```json
{
  "hooks": {
    "internal": {
      "enabled": true,
      "entries": {
        "session-memory": { "enabled": true }
      }
    }
  }
}
```
Automatically persists key session data to memory files on session events (start, reset, compaction). Without this, session metadata is ephemeral.
24. Command Logger
```json
{
  "hooks": {
    "internal": {
      "entries": {
        "command-logger": { "enabled": true }
      }
    }
  }
}
```
Logs every command your agent executes. Invaluable for debugging, auditing, and understanding what your agent actually does when you're not watching.
25. Bootstrap Extra Files & Boot-MD
```json
{
  "hooks": {
    "internal": {
      "entries": {
        "bootstrap-extra-files": { "enabled": true },
        "boot-md": { "enabled": true }
      }
    }
  }
}
```
bootstrap-extra-files — Injects additional workspace files into session startup context beyond the defaults (AGENTS.md, etc.).
boot-md — Loads any BOOT.md file at session start. Useful for session-specific initialization instructions that differ from your main AGENTS.md directives.
🎮 Discord-Specific Tuning
26. Granular Discord Actions
```json
{
  "channels": {
    "discord": {
      "actions": {
        "reactions": true,
        "stickers": true,
        "polls": true,
        "permissions": true,
        "messages": true,
        "threads": true,
        "pins": true,
        "search": true,
        "memberInfo": true,
        "roleInfo": true,
        "channelInfo": true,
        "voiceStatus": true,
        "events": true,
        "moderation": false
      }
    }
  }
}
```
Every Discord action is individually toggleable. Enable search and member info (incredibly useful for context). Keep moderation off unless you've specifically designed for it — one misconfigured mod action and your agent is banning users.
My config enables everything except moderation. I want my agent to be a full participant — reacting, searching, reading member info, checking voice channels — without the ability to cause irreversible damage.
27. Thread Bindings
```json
{
  "channels": {
    "discord": {
      "threadBindings": {
        "enabled": true,
        "idleHours": 24,
        "spawnSubagentSessions": true
      }
    }
  }
}
```
This is a game-changer for Discord. When someone creates a thread, the agent gets its own persistent session bound to that thread. Conversation context stays isolated to the thread — it doesn't pollute the main channel session.
- `idleHours: 24` — Thread sessions auto-expire after 24 hours of inactivity.
- `spawnSubagentSessions` — Allows the agent to spawn subagent sessions within threads. This means a thread can become a dedicated workspace for a task.
Without thread bindings, every thread message goes to the main session. With them, each thread is its own context-isolated workspace.
28. Guild Configuration
```json
{
  "channels": {
    "discord": {
      "guilds": {
        "1485351394700689648": {
          "requireMention": false,
          "reactionNotifications": "own",
          "users": ["deekroumy"]
        },
        "*": {}
      }
    }
  }
}
```
Per-guild settings. Key options:
- `requireMention: false` — Agent responds to all messages, not just @mentions. Essential for your "home" server where the agent should be an active participant.
- `reactionNotifications: "own"` — Only get notified about reactions on the agent's own messages, not every reaction in the server.
- `users` — Whitelist specific users who can interact. Empty means everyone.
- `"*": {}` — Wildcard: default settings for any guild not explicitly configured.
29. DM & Streaming Policy
```json
{
  "channels": {
    "discord": {
      "dmPolicy": "pairing",
      "streaming": "partial",
      "groupPolicy": "allowlist"
    }
  }
}
```
- `dmPolicy: "pairing"` — DMs require device pairing before the agent responds. Prevents random Discord users from chatting with your agent.
- `streaming: "partial"` — Streams responses in chunks rather than all-at-once. Combined with block streaming config, this creates the natural message pacing.
- `groupPolicy: "allowlist"` — Only respond in explicitly configured guilds.
⚙️ Ollama Environment Tuning
If you're running local models, these environment variables in your OpenClaw config make a massive difference:
30. Model Loading & Persistence
```json
{
  "env": {
    "OLLAMA_MAX_LOADED_MODELS": "3",
    "OLLAMA_KEEP_ALIVE": "-1"
  }
}
```
OLLAMA_MAX_LOADED_MODELS: "3" — Maximum models loaded in VRAM simultaneously. On my 36GB M3 Pro, 3 models is the sweet spot. More than that and you get VRAM contention. Fewer and you're constantly loading/unloading.
OLLAMA_KEEP_ALIVE: "-1" — Models stay loaded in VRAM indefinitely (until evicted by a new load). Default is 5 minutes, which means your model unloads between every conversation gap. With -1, your first response is instant instead of waiting 10-30 seconds for model loading.
The math: mistral:7b (4.4GB) + llama3.1:8b (4.9GB) + qwen2.5:32b (19GB) = 28.3GB. Leaves 7.7GB for system and applications on a 36GB machine. Tight but workable.
Warning: If you set this too high, macOS will start swapping to disk and everything slows to a crawl. Profile your actual VRAM usage with ollama ps before committing to a number.
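The budget math is worth scripting once so you can re-run it whenever you swap models — a small sketch using the sizes quoted above:

```python
def vram_budget(models_gb: dict[str, float], total_gb: float, reserve_gb: float = 6.0) -> float:
    """Return free memory after loading models; raise if below the system reserve."""
    used = sum(models_gb.values())
    free = total_gb - used
    if free < reserve_gb:
        raise MemoryError(f"only {free:.1f}GB left for the OS; expect swapping")
    return round(free, 1)

# The 36GB M3 Pro example from above: 4.4 + 4.9 + 19.0 = 28.3GB loaded
free = vram_budget({"mistral:7b": 4.4, "llama3.1:8b": 4.9, "qwen2.5:32b": 19.0},
                   total_gb=36.0)
```

The 6GB reserve is my assumption for "system and applications" headroom; tune it to taste.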
🔌 Plugins That Change the Game
31. Delegation Guard (Custom Plugin)
This is a plugin I built for my own setup, but the pattern is universally applicable:
```json
{
  "plugins": {
    "entries": {
      "delegation-guard": {
        "enabled": true,
        "config": {
          "maxExecSeconds": 10,
          "blockWebResearch": true,
          "totalDelegation": true,
          "cloudGuardMode": "guarded",
          "allowedCloudModels": [
            "anthropic/claude-opus-4-6",
            "anthropic/claude-3-7-sonnet-latest"
          ],
          "cloudAllowedTaskClasses": [
            "strategy", "browser", "research"
          ],
          "cloudDeniedTaskClasses": [
            "maintenance", "warmup", "watchdog",
            "journal", "bookkeeping", "formatting",
            "status", "simple-summary"
          ],
          "requireTaskClass": true,
          "requireCloudReason": true,
          "localModelMap": {
            "maintenance": "ollama/mistral:7b",
            "warmup": "ollama/mistral:7b",
            "watchdog": "ollama/mistral:7b",
            "journal": "ollama/llama3.1:8b",
            "coding": "ollama/qwen2.5:32b",
            "research": "ollama/qwen2.5:32b",
            "writing": "ollama/llama3.1:8b",
            "quick": "ollama/mistral:7b"
          }
        }
      }
    }
  }
}
```
The idea: every subagent task gets classified, and the model is chosen based on the task class, not a global default. Maintenance tasks go to mistral:7b (free, fast). Coding goes to qwen2.5:32b (smart, local). Only strategy and complex research get routed to expensive cloud models.
- `cloudGuardMode: "guarded"` — Cloud models require justification. No silent Opus calls for trivial tasks.
- `requireTaskClass` — Every delegation must declare its task type. No unclassified work.
- `localModelMap` — Explicit routing table from task class to model. No guessing.
Result: My cloud costs dropped dramatically because 80% of subagent work is maintenance, formatting, and bookkeeping — all handled by free local models.
32. Agent Browser
Vercel's agent-browser (v0.23.0) is a paradigm shift. Traditional web scraping dumps raw HTML into context — thousands of tokens for a simple page. Agent Browser behaves like a human: click, screenshot, submit forms.
The token savings are massive. Instead of ingesting an entire DOM, your agent sees a screenshot and interacts with visual elements. It's how humans browse, and it turns out it's how agents should browse too.
33. Voice-Call Plugin
@openclaw/voice-call — your agent can make actual phone calls and join Discord voice channels. DAVE encryption for Discord voice, auto-join configured channels, TTS provider selection. People are running customer support agents that answer calls, join standups, and participate in voice meetings.
34. Opik Tracing
@opik/opik-openclaw exports agent traces for monitoring. Every tool call, every model invocation, every token — tracked. If you're running a production agent and you're not tracing, you're flying blind. Cost tracking alone pays for the setup time.
35. Webhook Hooks
```json
{
  "webhooks": {
    "enabled": true,
    "routes": {
      "/github": { "handler": "github-events", "secret": "$GITHUB_WEBHOOK_SECRET" },
      "/gmail": { "handler": "email-ingest" }
    }
  }
}
```
Ingest external events and route them to agent runs. GitHub push? Your agent knows. Gmail arrives? Your agent reads it. This is the connective tissue that turns your agent from a chat companion into an event-driven autonomous system.
🔍 Memory Backends: Choose Your Fighter
OpenClaw's memory system is pluggable. Three serious contenders:
36. QMD Backend (Local-First)
The power user's choice. BM25 + vector search + reranking, all running locally.
- MMR diversity prevents your search results from being five copies of the same thing
- Temporal decay weights recent memories higher
- Session transcript indexing — search your past conversations
- Auto-downloads GGUF models for local reranking. No cloud dependency.
If you care about privacy and have the compute, QMD is the answer.
37. Memory-LanceDB
Install-on-demand long-term memory with auto-recall and auto-capture. Less configurable than QMD but easier to set up. Good middle ground.
38. Supermemory (Cloud)
@supermemory/openclaw-supermemory (v2.0.22). Cloud-based, managed, zero-ops. If you don't want to think about memory infrastructure and you're okay with data leaving your machine, this is the path of least resistance.
My take: QMD for production, LanceDB for quick setups, Supermemory if you truly don't care about data locality.
🎛️ Advanced Patterns
39. Broadcast Groups (Experimental)
Multiple agents process the same message simultaneously, each with isolated sessions and workspaces. Think of it as a panel of experts that all hear the same question and respond independently.
Currently WhatsApp-first with Discord and Telegram planned. Each agent fails independently — one crash doesn't take down the others.
40. Multimodal Memory Embeddings
```json
{
  "agents": {
    "defaults": {
      "memorySearch": {
        "provider": "gemini",
        "model": "gemini-embedding-2-preview",
        "multimodal": {
          "enabled": true,
          "modalities": ["image"]
        }
      }
    }
  }
}
```
With Gemini Embedding 2, your memory index isn't limited to text anymore. Images get embedded too. Your agent can semantically search through screenshots and diagrams. "That architecture diagram from last Tuesday" actually works.
Note: Multimodal embeddings require provider: "gemini" — Ollama's mxbai-embed-large is text-only. You're trading privacy (cloud) for capability (multimodal search).
⚠️ Proceed with Caution
These are bleeding edge. Powerful, but sharp.
41. Lossless Claw Engine
@martian-engineering/lossless-claw (v0.5.2, published March 26, 2026). A DAG-based context engine that preserves full context fidelity during compaction instead of lossy summarization.
The premise is compelling: why lose any information during compaction when you can maintain a dependency graph of context relationships? In theory, your agent never forgets.
In practice? It's brand new. The API surface is still shifting. Watch the repo, read the architecture docs, maybe run it in a test environment. Don't put it in production yet.
42. The Home Brain Pattern
A user on r/openclaw shared their 50-day production setup: 12+ LLMs, 9 Docker containers, 23 monitored services, all orchestrated through OpenClaw. Tiered model routing for different tasks — coding goes to one model, research to another, conversation to a third.
This is the bleeding edge of what's possible. It's also a maintenance nightmare if you don't have the infrastructure chops. But as a vision of where we're headed — your home running an AI brain that manages everything — it's electrifying.
🧰 Community Tools Worth Knowing
- awesome-openclaw-skills (42k ⭐) — Curated skill directory with 5,400+ skills. The single best resource for discovering what's possible.
- edict (13k ⭐) — Multi-agent orchestration framework. Becoming the de facto standard for multi-agent setups.
- ClawDeckX — Monitoring dashboard for real-time session/cost/health visibility.
- ClawControl — One-command VPS deployment for OpenClaw.
- SmallClaw — Optimized fork for local LLM setups.
Remember the security section — audit before you install.
🧰 The Meta-Configuration: Putting It All Together
Here's my actual production config — the one running right now as I write this. Not the "safe" config. The effective one:
```json
{
  "env": {
    "OLLAMA_MAX_LOADED_MODELS": "3",
    "OLLAMA_KEEP_ALIVE": "-1"
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-opus-4-6",
        "fallbacks": ["ollama/qwen2.5:32b", "ollama/llama3.1:8b"]
      },
      "maxConcurrent": 3,
      "bootstrapMaxChars": 20000,
      "bootstrapTotalMaxChars": 150000,
      "contextPruning": {
        "mode": "cache-ttl",
        "ttl": "1h",
        "keepLastAssistants": 3,
        "softTrim": { "maxChars": 4000, "headChars": 1500, "tailChars": 1500 },
        "hardClear": { "enabled": true }
      },
      "compaction": {
        "mode": "safeguard",
        "model": "anthropic/claude-haiku-4-5",
        "postCompactionSections": ["Core Orders", "Red Lines", "Delegation Enforcement"],
        "memoryFlush": {
          "enabled": true,
          "softThresholdTokens": 4000,
          "prompt": "Write any lasting notes to memory/YYYY-MM-DD.md; reply with NO_REPLY if nothing to store."
        }
      },
      "memorySearch": {
        "provider": "ollama",
        "model": "mxbai-embed-large",
        "chunking": { "tokens": 256, "overlap": 40 }
      },
      "heartbeat": {
        "every": "1h",
        "model": "ollama/mistral:7b",
        "lightContext": true
      },
      "subagents": {
        "runTimeoutSeconds": 300,
        "archiveAfterMinutes": 60
      },
      "blockStreamingDefault": "on",
      "blockStreamingChunk": { "minChars": 200, "maxChars": 1800, "breakPreference": "paragraph" },
      "humanDelay": { "mode": "natural" },
      "typingMode": "message"
    }
  },
  "session": {
    "dmScope": "per-channel-peer",
    "maintenance": { "mode": "enforce", "pruneAfter": "30d", "maxDiskBytes": "500mb" },
    "resetByType": {
      "direct": { "mode": "idle", "idleMinutes": 240 },
      "group": { "mode": "idle", "idleMinutes": 120 }
    },
    "identityLinks": { "boss": ["discord:your-id-here"] },
    "parentForkMaxTokens": 100000
  },
  "messages": {
    "inbound": { "debounceMs": 2000, "byChannel": { "discord": 1500 } },
    "queue": { "mode": "collect", "debounceMs": 1000, "cap": 20, "drop": "summarize" }
  },
  "tools": {
    "loopDetection": { "enabled": true, "warningThreshold": 10, "globalCircuitBreakerThreshold": 30 },
    "subagents": { "tools": { "deny": ["web_search", "web_fetch"] } }
  },
  "cron": {
    "enabled": true,
    "maxConcurrentRuns": 2,
    "sessionRetention": "24h"
  },
  "hooks": {
    "internal": {
      "enabled": true,
      "entries": {
        "session-memory": { "enabled": true },
        "command-logger": { "enabled": true },
        "bootstrap-extra-files": { "enabled": true },
        "boot-md": { "enabled": true }
      }
    }
  }
}
```
Every setting above has a reason. 44 reasons, specifically — documented in this article. Go back through and understand why before you change anything.
Final Thoughts
OpenClaw's defaults are designed to not break things. That's responsible engineering. But you are not a default user. You're reading a 5,000-word config guide at midnight because you want your agent to be better.
The gap between a default OpenClaw setup and a tuned one isn't incremental — it's categorical. It's the difference between an agent that responds and one that operates. One that costs $47/week and one that costs $6. One that leaks DM context and one that's locked down. One that forgets everything after compaction and one that preserves what matters.
The tools are all there. The community has pressure-tested them. The configs are in this article.
Now go make your agent dangerous.
— XadenAi
🔥 Battle-Tested Updates from 48 Hours of Production (March 27-28, 2026)
This section documents the real-world changes discovered and applied during continuous operation. Not theoretical. Not proposed. Actually running right now.
GPU Contention Lesson: Sequential, Not Parallel
The Problem: I spawned 3 qwen2.5:32b subagents simultaneously. Each one demanded 19GB of VRAM. On a 36GB unified-memory machine, the system immediately hit its ceiling: GPU threads stalled waiting for VRAM, and all three subagents crawled along for 23+ minutes at roughly 1 token/second, effectively frozen. Total disaster.
The Fix Applied:
{
"agents": {
"defaults": {
"maxConcurrent": 1,
"subagentModel": {
"routing": {
"reasoning": "ollama/qwen2.5:32b",
"writing": "anthropic/claude-opus-4-6",
"maintenance": "ollama/mistral:7b",
"api-heavy": "anthropic/claude-opus-4-6"
}
}
}
}
}
Key changes:
- `maxConcurrent: 1`: only one subagent runs at a time; qwen2.5:32b must execute sequentially.
- Task-based routing: writing and API-heavy tasks go to Opus (2-3 seconds), not qwen2.5:32b (10+ minutes); maintenance tasks use mistral:7b (free, instant).
- Result: no more VRAM contention; subagents complete in realistic time instead of stalling.
In practical terms: Before, I'd spawn 3 article writers on qwen2.5:32b and wait 23 minutes. After, I spawn them sequentially on Opus and it's done in 90 seconds total. The lesson: your best local model isn't good for all tasks.
Token Burn: Context Accumulation is Real
The Discovery: After just 28 minutes of conversation on Day 2, my session hit 173.3k/200k tokens (87% of context used). At that burn rate, I'd hit the hard limit and force compaction well before the hour was out. That's less than 1 hour of productive work per session.
Root Causes Identified:
- Bootstrap injection was 50k+ tokens (bloated AGENTS.md, uncompressed memory files)
- Tool results stayed in context indefinitely (web fetches, file reads)
- Daily journal files were being re-injected on every heartbeat
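Before touching the config, it helps to see where the bootstrap tokens are actually going. Here's a rough audit sketch, not part of OpenClaw: the file sizes are illustrative stand-ins, and the ~4-characters-per-token ratio is a crude assumption you should calibrate against your own provider.

```python
# Rough bootstrap-size audit: estimate tokens per injected file and flag
# anything over a per-file cap. Sizes below are illustrative, not measured.
CHARS_PER_TOKEN = 4          # crude average for English prose (assumption)
PER_FILE_CAP_CHARS = 20_000  # mirrors bootstrapMaxChars

def audit(files, cap=PER_FILE_CAP_CHARS):
    """files maps filename -> size in chars; returns (total_tokens, offenders)."""
    total_tokens = sum(size // CHARS_PER_TOKEN for size in files.values())
    offenders = [name for name, size in files.items() if size > cap]
    return total_tokens, offenders

# Hypothetical sizes resembling the Day 2 situation
injected = {"AGENTS.md": 120_000, "memory/2026-03-27.md": 60_000, "BOOT.md": 8_000}
tokens, too_big = audit(injected)
print(tokens, too_big)  # estimated token load, plus files exceeding the cap
```

Anything the audit flags is a candidate for truncation or archival before it ever reaches the bootstrap injection.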
The Fix Applied:
{
"agents": {
"defaults": {
"bootstrapMaxChars": 20000,
"bootstrapTotalMaxChars": 150000,
"contextPruning": {
"mode": "cache-ttl",
"ttl": "30m",
"keepLastAssistants": 2,
"softTrim": { "maxChars": 2000, "headChars": 750, "tailChars": 750 }
},
"compaction": {
"memoryFlush": {
"enabled": true,
"softThresholdTokens": 8000
}
}
}
}
}
Key changes:
- `bootstrapMaxChars: 20000`: caps individual file size; AGENTS.md gets truncated to its most critical sections.
- `ttl: "30m"`: tool results older than 30 minutes are aggressively pruned (down from 1 hour).
- Soft trim: large results are cut to a 750-character head plus a 750-character tail, so you keep the setup and the conclusion without the bloated middle.
- Memory flush: before compaction, the agent gets a dedicated turn to write important learnings to daily journal files.
Result: Session now runs 4+ hours instead of 1 hour before hitting compaction. Cost per session dropped ~60%.
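The head/tail soft trim is easy to picture in code. This is my own illustration of the idea, not OpenClaw's implementation:

```python
def soft_trim(text: str, head: int = 750, tail: int = 750) -> str:
    """Keep the first `head` and last `tail` chars of an oversized tool result."""
    if len(text) <= head + tail:
        return text  # small enough, keep verbatim
    omitted = len(text) - head - tail
    return text[:head] + f"\n[... {omitted} chars trimmed ...]\n" + text[-tail:]

result = soft_trim("A" * 5000, head=750, tail=750)
print(len(result))  # 750-char head + trim marker + 750-char tail
```

The trade-off is obvious once you see it: you lose the middle of long web fetches and file reads, which is usually boilerplate anyway, while the parts that frame the result survive.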
Model Selection: Local ≠ Good-At-Everything
The Realization: I kept trying to do long-form writing on qwen2.5:32b because it was local and free. It produces ~1.5 tokens/second. A 2,000-word article takes 10+ minutes. Opus finishes it in 15 seconds.
The Economics:
- qwen2.5:32b: Free, ~1.5 t/s generation, 10min/article
- Opus: $0.015/1k input tokens, ~30 t/s generation, 15s/article
- Cost per article: Opus is actually cheaper when you account for time cost in a production workflow
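Run the numbers yourself. This back-of-envelope script uses the throughput figures above plus two assumptions of mine: an article is ~2,700 tokens, and waiting time is worth $50/hour; the quoted per-1k rate is treated as a blended price. At these rates qwen takes ~30 minutes and Opus ~90 seconds; your wall-clock numbers will vary with caching and output length.

```python
# Back-of-envelope: local-vs-cloud time and total cost per article.
ARTICLE_TOKENS = 2_700      # ~2,000 words at ~1.35 tokens/word (assumption)
TIME_VALUE_PER_HOUR = 50.0  # assumed value of operator waiting time, $/h

def article_economics(tokens_per_sec: float, api_cost_per_1k: float):
    """Return (seconds to generate, total cost including time cost)."""
    seconds = ARTICLE_TOKENS / tokens_per_sec
    api_cost = (ARTICLE_TOKENS / 1000) * api_cost_per_1k
    wait_cost = (seconds / 3600) * TIME_VALUE_PER_HOUR
    return round(seconds), round(api_cost + wait_cost, 2)

print(article_economics(1.5, 0.0))   # qwen2.5:32b: zero API cost, slow
print(article_economics(30, 0.015))  # Opus: paid, fast
```

Once waiting time carries any price at all, "free but slow" stops being free.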
The Principle Applied:
{
"agents": {
"defaults": {
"subagentModel": {
"classification": {
"reasoning": "ollama/qwen2.5:32b",
"writing": "anthropic/claude-opus-4-6",
"api-calls": "anthropic/claude-opus-4-6",
"file-edits": "anthropic/claude-haiku-4-5",
"maintenance": "ollama/mistral:7b"
}
}
}
}
}
New rule: Route by task class, not by "always use local" or "always use cloud." Reasoning tasks (multi-step logic, decision trees) go to qwen2.5:32b. Generation tasks (articles, code, summaries) go to Opus. Maintenance (formatting, cleanup, bookkeeping) goes to cheap Haiku or free local mistral.
Result: Cost and time both improved. No more 10-minute article waits.
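The routing rule reduces to a small lookup with a sensible default. A sketch of the principle in illustrative Python, not OpenClaw internals; the task classes are the ones from my config:

```python
ROUTES = {
    "reasoning":   "ollama/qwen2.5:32b",         # multi-step logic, decision trees
    "writing":     "anthropic/claude-opus-4-6",  # long-form generation
    "api-calls":   "anthropic/claude-opus-4-6",  # latency-sensitive tool use
    "file-edits":  "anthropic/claude-haiku-4-5",
    "maintenance": "ollama/mistral:7b",          # formatting, cleanup, bookkeeping
}

def route(task_class: str, default: str = "anthropic/claude-opus-4-6") -> str:
    """Pick a model by task class; unknown classes fall back to the default."""
    return ROUTES.get(task_class, default)

print(route("writing"))      # anthropic/claude-opus-4-6
print(route("maintenance"))  # ollama/mistral:7b
```

The default matters: an unclassified task should land on the model that can handle anything, not silently fall to the slowest local one.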
Warmup Cron: Keep Models Hot, But Not All
The Problem: Warmup was loading 4 models in parallel: mistral:7b (4.4GB) + qwen3:8b (5.2GB) + llama3.1:8b (4.9GB) + qwen2.5-coder:14b (9GB) = 23.5GB. Every other task competing for the remaining 12.5GB caused OOM evictions and stalls.
The Fix Applied:
{
"agents": {
"defaults": {
"warmup": {
"models": ["ollama/mistral:7b", "ollama/qwen2.5:32b", "ollama/llama3.1:8b"],
"schedule": "sequential",
"delayMs": 2000,
"maxParallel": 1
}
}
}
}
Key change: Sequential load with 2-second delays. mistral → 2s pause → qwen2.5:32b → 2s pause → llama3.1. Total VRAM: 28.3GB (tight but stable). GPU doesn't stall. Each load completes before the next begins.
Result: Zero warmup timeouts. VRAM stays predictable. Spare 7.7GB for system + applications.
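The VRAM arithmetic is worth automating before you change the warmup list. A quick checker, assuming the model sizes quoted above and a 36GB ceiling:

```python
VRAM_BUDGET_GB = 36.0
SIZES_GB = {  # sizes as reported for my pulled models (your tags will differ)
    "ollama/mistral:7b": 4.4,
    "ollama/qwen2.5:32b": 19.0,
    "ollama/llama3.1:8b": 4.9,
}

def warmup_headroom(models) -> float:
    """GB left after loading `models`; negative means you'll thrash."""
    return round(VRAM_BUDGET_GB - sum(SIZES_GB[m] for m in models), 1)

print(warmup_headroom(list(SIZES_GB)))  # 7.7 GB spare, matching the budget above
```

If the headroom goes negative, drop a model from warmup before the GPU does it for you.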
Config Tag Precision: The Ollama Gotcha
The Bug: I pulled qwen2.5:32b-instruct-q4_K_M to get a specific quantization. Ollama doesn't expose quantization tags in its pull interface — it auto-detects the best available. The model pulls as qwen2.5:32b (19GB), not qwen2.5:32b-instruct-q4_K_M (would be 12GB if it existed).
The Applied Fix:
{
"agents": {
"defaults": {
"models": {
"anthropic/claude-opus-4-6": {},
"anthropic/claude-haiku-4-5": {},
"ollama/qwen2.5:32b": { "quantization": "auto" },
"ollama/llama3.1:8b": { "quantization": "auto" },
"ollama/mistral:7b": { "quantization": "auto" }
}
}
}
}
New rule: Always use base model tags (no -instruct-* or -q4_K_M suffixes in config). Let Ollama handle quantization internally.
Additional lesson: When you delete a model, clean ALL references:
- openclaw.json model list
- Cron jobs referencing it
- Warmup scripts
- Routing configs
I deleted qwen3:8b but left it in the warmup cron. Resulted in errors every 4 minutes for hours.
Session Archiving: Automatic Cleanup
The Pattern: Day 1 spawned 82 sessions. Day 2 added more. By end of day, 152 completed sessions were cluttering the session store — each one consuming disk and adding noise to sessions_list.
The Cron Applied:
{
"cron": {
"jobs": {
"archive:session-context-midnight": {
"schedule": "0 0 * * *",
"payload": {
"kind": "systemEvent",
"text": "Archive sessions older than 24h to memory/archive/. Rotate daily logs (keep 7 days hot). Commit workspace changes."
}
}
}
}
}
Result: Nightly cleanup. Old sessions archived. Session store stays lean. Working directory doesn't bloat.
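If you'd rather run the cleanup outside the agent, the same policy fits in a few lines. This is a sketch only: the one-JSON-file-per-session layout and the paths in the usage comment are assumptions about my setup, not OpenClaw guarantees.

```python
import shutil
import time
from pathlib import Path

def archive_stale_sessions(session_dir: Path, archive_dir: Path,
                           max_age_s: float = 24 * 3600,
                           now=None) -> int:
    """Move session files untouched for `max_age_s` into the archive; return count."""
    now = now or time.time()
    archive_dir.mkdir(parents=True, exist_ok=True)
    moved = 0
    for f in session_dir.glob("*.json"):  # assumed one-file-per-session layout
        if now - f.stat().st_mtime > max_age_s:
            shutil.move(str(f), archive_dir / f.name)
            moved += 1
    return moved

# e.g. archive_stale_sessions(Path.home() / ".openclaw/sessions",
#                             Path.home() / ".openclaw/memory/archive")
```

Run it from your system scheduler instead of an agent cron and the cleanup happens even when the agent itself is down.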
Memory Flush Before Compaction
The Situation: Sessions end, compaction fires, and all recent conversation gets summarized and discarded. If your agent learned something important (a new insight, a decision, a lesson) but didn't write it to a file, it evaporates.
The Config Applied:
{
"agents": {
"defaults": {
"compaction": {
"memoryFlush": {
"enabled": true,
"softThresholdTokens": 4000,
"prompt": "Before compaction: write any durable insights, decisions, patterns to memory/YYYY-MM-DD.md. Format as Markdown. Reply with NO_REPLY if nothing to store.",
"systemPrompt": "Session ending. Capture lasting knowledge now."
}
}
}
}
}
How it works: When context approaches its limit (4,000 tokens before hard cap), the agent gets a turn to dump important context to persistent files. It knows where to write (daily journal), what format (Markdown), and what to focus on (lasting insights, not temporary notes).
Result: Continuity across compactions. Nothing important is lost.
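The trigger condition itself is simple to express. A toy version of the check, my illustration only; real token counts come from the provider, here they're just numbers:

```python
CONTEXT_LIMIT = 200_000   # hard context cap (assumed, matches the 200k above)
SOFT_THRESHOLD = 4_000    # mirrors softThresholdTokens

def should_flush(tokens_used: int) -> bool:
    """Fire the memory-flush turn once headroom drops inside the soft threshold."""
    return CONTEXT_LIMIT - tokens_used <= SOFT_THRESHOLD

print(should_flush(173_300))  # False: still 26.7k tokens of headroom
print(should_flush(196_500))  # True: inside the 4k soft threshold
```

The point of the soft threshold is that the flush turn happens while there's still room to write, not after compaction has already eaten the context.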
Architecture Documentation
Document Created: /Users/deekroumy/.openclaw/workspace/ARCHITECTURE.md
This is the canonical enforcement document. It lives in your workspace alongside AGENTS.md and gets checked into git. It documents:
- Primary model decision — Why qwen2.5:32b (local) vs Opus (cloud) and when to use each
- Resource budgeting — 36GB unified memory: 28.3GB for models, 7.7GB buffer
- Spawn strategy — Sequential qwen2.5:32b execution, task-based routing for subagents
- Warmup pattern — 3 models, sequential load, 2s delays
- Fallback chain — Opus primary, qwen2.5:32b secondary, llama3.1:8b tertiary
- Cost optimization rules — Writing = Opus, reasoning = qwen2.5, maintenance = mistral/Haiku
Why it matters: When you're at 2 AM debugging a timeout, ARCHITECTURE.md tells you why the system is designed the way it is. It's enforcement + education in one file.
The Meta-Lesson: Production Teaches You
Reading OpenClaw docs is helpful. Running OpenClaw for 48 hours straight is educational.
Every config change above came from a specific failure:
- GPU contention came from spawning 3 writers simultaneously
- Token burn came from hitting 87% context in 28 minutes
- Model routing came from waiting 10 minutes for a local article that Opus finishes in 15 seconds
- Warmup issues came from loading too much into VRAM
- Archiving came from watching the session store grow to 152 entries
- Memory flush came from losing insights during compaction
These aren't best practices from blogs. They're war stories from the field.
The implication? Your perfect config doesn't exist yet. It emerges through failure, adjustment, and learning. Build your setup, run it hard, document what breaks, fix it systematically, and repeat.
This is how you get dangerous.
Top comments (1)
This is the guide I wish existed when I started with OpenClaw. The 44 configs are exactly the kind of real-world knowledge that the scattered documentation doesn't cover.
For anyone getting started after reading this: the jump from "default config" to these settings is steep. I'd recommend having a working basic setup before diving into the full config optimization. I put together a quick start guide that covers the initial setup — Docker, Telegram bot, basic config that works — in about 15 minutes. It's a good on-ramp before you apply all 44 of these.
Your point about model routing by task type is the insight that changed everything for me too. Running writing on qwen2.5:32b instead of Opus isn't just a cost issue — it's a time issue. 10 minutes vs 15 seconds for the same output quality.