zac

Posted on • Originally published at remoteopenclaw.com

The Complete Guide to Reducing OpenClaw Token Costs (Up to 90% Cheaper)

OpenClaw is genuinely powerful. It's also genuinely expensive if you're not managing it carefully. People routinely exceed $200 in token costs in their first week of tinkering — not because they're doing anything extravagant, but because they don't yet understand where the tokens are actually going.

This guide covers everything: how OpenClaw's token consumption works, the specific settings that matter most, smart model routing, heartbeat management, local models, and an underrated technique involving n8n that can reduce recurring task costs to near zero. This guide is part of our Complete Guide to OpenClaw.


Marketplace

Free skills and AI personas for OpenClaw — browse the marketplace.

Browse the Marketplace →

Join the Community

Join 1k+ OpenClaw operators sharing deployment guides, security configs, and workflow automations.

Join the Community →

How Does OpenClaw Actually Spend Your Tokens?

Most users assume each message is priced as one small exchange of a conversation. In reality, every single message you send triggers a much larger API call: OpenClaw sends your full config files (agents.md, soul.md, memory.md) plus the entire session history every time, making even simple messages far more expensive than most users realise.

What gets sent with every request:

  • Core OpenClaw system instructions (you can't change these)
  • Your agents.md file (general agent behaviour instructions)
  • Your soul.md file (agent personality)
  • Your memory.md file (accumulated memories from all previous sessions)
  • The entire conversation history from the current session

So a simple message like "What should I have for lunch?" isn't just a short question. It's that question, plus several kilobytes of config files, plus every previous message in your current session — all sent to the API simultaneously.

Why session length matters so much: If you've been in the same Telegram thread for a week without resetting, every new message drags all seven days of conversation history into the API call. What started as a 10-cent message is now approaching 20-30 cents because of accumulated context.

Autonomous tasks compound this further. When you ask your agent to "update OpenClaw to the latest version," that's not one API call. It's five: pull updated info, restart the server, run a health check, action any issues found, report back to you. Five cycles, each one carrying the full context load.
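To make the compounding concrete, here is a back-of-envelope model in Python. The prices and token counts are illustrative assumptions, not OpenClaw's actual accounting:

```python
# Rough model of how per-message cost grows with session length.
# All numbers below are assumptions for illustration.

PRICE_PER_MTOK = 15.00     # $ per million input tokens (Opus-class pricing)
CONFIG_OVERHEAD = 8_000    # system prompt + agents.md + soul.md + memory.md
TOKENS_PER_TURN = 500      # average tokens added to history per exchange

def message_cost(turn: int) -> float:
    """Input cost of the Nth message: overhead plus all accumulated history."""
    context = CONFIG_OVERHEAD + turn * TOKENS_PER_TURN
    return context * PRICE_PER_MTOK / 1_000_000

for turn in (1, 50, 500):
    print(f"message {turn:>3}: ${message_cost(turn):.2f}")
```

The question never gets longer, but the context it drags along does, which is why the same one-liner costs a few cents on turn 1 and dollars by turn 500.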

Once you understand this, every optimization strategy makes immediate sense.


Part 1: Context Window Management

Check Your Session Status Regularly

Use /status to see exactly how many tokens your current session is consuming. If you've never reset a session, you may be shocked by the number.

Three Commands You Should Know

/compact — Summarises your conversation history into a compressed version without wiping it entirely. Use this when you're mid-session on a complex task and don't want to lose context, but the session has grown too large. A session using 800,000 tokens can often be compacted down to 100,000.

/new (or /reset) — Wipes the current session entirely and starts fresh. The most token-efficient option when you're done with a topic. Before running this on a session where you've done significant work, ask your agent to write a temporary summary file first — then you can hand that file to the new session as a starting point.

/model — Switch models mid-conversation without starting a new session. Useful for doing expensive setup work with a high-capability model, then switching to a cheaper model for ongoing use within the same context.

Keep Your Config Files Lean

Your agents.md, soul.md, and memory.md files get sent with every single request. Bloat in these files directly increases your costs.

Take a snapshot of your file sizes today. Check again in a week. If any file has grown significantly, open it up and review for:

  • Repeated entries about the same topic (agents sometimes log the same thing twice across sessions)
  • Outdated information that no longer applies
  • Unnecessary detail that could be summarised more concisely

Paste bloated files into Claude or ChatGPT and ask for a condensed version that preserves all meaningful information. Memory configuration is one of the biggest levers for controlling context size. If your memory setup is causing unnecessary bloat or compaction, see our OpenClaw Memory Configuration Guide for the optimal settings. If memory search is failing and forcing your agent to re-explain context, our OpenClaw Memory Not Working: Fix Guide covers the seven most common failures.
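To put a number on "bloat," the common rough heuristic of ~4 characters per token works well enough for trend-watching. This sketch uses a sample string; in practice you would read your real config files (real tokenisers will differ somewhat):

```python
def approx_tokens(text: str) -> int:
    """Rough estimate: ~4 characters per token for English prose."""
    return len(text) // 4

# In practice, read your actual config files instead:
#   text = open("memory.md", encoding="utf-8").read()
sample = "Remembered: user prefers concise answers.\n" * 200
print(f"memory.md footprint: ~{approx_tokens(sample)} tokens per request")
```

Run it weekly on agents.md, soul.md, and memory.md; a file that doubles its estimate is a file worth condensing.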

Add These Instructions to Your agents.md

A few lines in your agent config that pay off in token savings every day:

  • "Respond in 1–2 paragraphs. Be concise. I'll ask for more detail if I need it." — Long AI responses feel thorough, but most of the time you don't need eight paragraphs. Shorter responses mean smaller output tokens and less context in the next turn.
  • "Don't narrate what you're about to do. Just do it." — "Let me check that for you!" followed by the actual check is two outputs where one would do. Cutting narration saves a cycle per action.
  • "Spin off sub-agents for large tasks rather than running them in the main session." — More on this below.

Part 2: Smart Model Routing

OpenClaw's biggest single lever for cost reduction is using the right model for each task.

The mistake most people make: Using a top-tier model like Opus for absolutely everything, including tasks that don't need it.

A practical routing framework:

| Task Type | Recommended Model |
|-----------|-------------------|
| Complex reasoning, architecture decisions, difficult code | Opus or equivalent |
| Standard coding tasks | GPT-4.1 or Sonnet |
| Research, summarisation, content drafting | Sonnet or Gemini Flash |
| Heartbeats and background checks | Lightweight model or local |
| Simple reminders and status updates | Cheapest available or local |

Kimi K2.5 (available via OpenRouter) has become a popular choice for primary model duty — near Opus-level capability at a fraction of the cost. Worth testing for your typical use patterns.

OpenRouter is the practical solution here. One API key, one endpoint, access to 600+ models across all major providers. You can assign different models to different agents and different task types without managing multiple accounts or API keys.

To set up model routing: open your openclaw.json and assign models per agent, or simply ask your agent to update its own config based on the routing rules you describe.
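The exact openclaw.json schema varies by version, so treat the following as a sketch rather than copy-paste config. The per-agent keys are hypothetical; the model slugs follow OpenRouter's provider/model naming, except the heartbeat entry, which assumes a local Ollama model:

```json
{
  "agents": {
    "main":      { "model": "moonshotai/kimi-k2" },
    "developer": { "model": "openai/gpt-4.1" },
    "research":  { "model": "google/gemini-2.5-flash" },
    "heartbeat": { "model": "ollama/llama3.1" }
  }
}
```

The shape of the change is the point: each agent gets the cheapest model that reliably handles its task type.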


How Do You Optimise OpenClaw Heartbeat Costs?

OpenClaw heartbeats are one of the most overlooked sources of runaway costs.

By default, heartbeats wake your agent every 30 minutes to check if there's anything it should be doing. If your primary model is Opus, that's roughly 48 wake-ups per day, each one sending your full context load. The math gets ugly fast — potentially $50/month just from the agent sitting idle.
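The arithmetic behind that estimate, with illustrative numbers (Opus-class input pricing and a modest context per wake-up; substitute your own figures):

```python
PRICE_PER_MTOK = 15.00    # $ per million input tokens (Opus-class, assumed)
CONTEXT_TOKENS = 2_500    # system prompt + config files sent per wake-up
WAKEUPS_PER_DAY = 48      # default 30-minute heartbeat interval

per_wake = CONTEXT_TOKENS * PRICE_PER_MTOK / 1_000_000
monthly = per_wake * WAKEUPS_PER_DAY * 30
print(f"~${monthly:.0f}/month spent while the agent sits idle")
```

Each wake-up is pennies, which is exactly why nobody notices until the monthly bill arrives.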

What to do:

Reduce heartbeat frequency. Change the heartbeat interval in your openclaw.json from every 30 minutes to every hour or longer. For most use cases, the agent doesn't need to check in that often.

Set active hours. Configure heartbeats to only run during the hours you're actually working, or at night if you specifically want background automation while you sleep.

Use a cheap model for heartbeats. Heartbeat tasks are simple — "check if anything needs doing." They don't need your best model. Assign a lightweight, inexpensive model specifically for heartbeat checks.

Use a local model for heartbeats (cost: $0). This is the most aggressive option; setup is covered in Part 4 below.
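Combining the first three tips in openclaw.json might look something like this. The field names are hypothetical (check your version's actual schema), but the shape of the change is what matters:

```json
{
  "heartbeat": {
    "intervalMinutes": 60,
    "activeHours": { "start": "08:00", "end": "20:00" },
    "model": "ollama/llama3.1"
  }
}
```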



[Infographic: OpenClaw token cost reduction, key numbers to know]

Part 4: Using Local Models with Ollama

OpenClaw supports local models through Ollama, eliminating API costs entirely for the tasks you assign to them.

Setup:

  1. Go to ollama.com and download the installer for your OS (on macOS, drag the app into Applications)
  2. Open Ollama and download a model (it walks you through the selection)
  3. Ask your OpenClaw agent: "I just installed Ollama and downloaded [model name]. Can you update my config to use this local model for heartbeats, keeping my current primary model for everything else?"

Your agent will handle the configuration. The result: free heartbeats, with your premium model reserved for tasks that actually need it.

Important caveat: Small local models (30B parameters and below) have weak context handling and poor agentic tool use. They'll drop tasks, miss instructions, and generally frustrate you if you try to use them for real work. Local models are excellent for simple, repetitive background tasks. They're not a substitute for a capable model when you need actual reasoning.


Part 5: Sub-Agent Architecture

The single biggest architectural saving you can make is stopping your main agent from doing large tasks directly.

When your main agent codes a full application, the entire codebase and all intermediate outputs end up in your main session's context. That context stays and grows, making every subsequent message more expensive.

The better pattern: spin off a sub-agent for large tasks. The sub-agent does all the heavy work in its own isolated session, then returns only the completed output to the main session. Your main context only sees the summary, not the entire working process.

Using Codex for development tasks: If you have a ChatGPT subscription, you can install Codex on your agent machine and authenticate with your existing subscription — no additional API costs for development work. Your agent spawns a Codex session for coding tasks, uses its included tokens, and returns the result.

Setting this up: authenticate Codex with your ChatGPT subscription on your agent machine, then instruct your main agent to route all coding tasks to a Codex sub-agent automatically.


Why Should You Use n8n Instead of OpenClaw for Recurring Tasks?

Offloading recurring tasks to n8n is the most underrated cost-saving technique in this guide, and it's genuinely counterintuitive at first.

The insight: Not every automated task needs to run through OpenClaw's full context stack. Many recurring tasks — daily news summaries, weather reports, email monitoring, scheduled notifications — are simple enough that running them as lightweight n8n workflows is dramatically cheaper.

The difference in practice:

An OpenClaw cron job for a daily report:

  • Loads full context (agents.md, soul.md, memory.md, session history)
  • Uses your primary model
  • Costs 10–50 cents depending on context size and model

An n8n workflow for the same daily report:

  • Sends a lean API call with just a system message and today's prompt
  • Uses a cheap model (Minimax 2.5, for example)
  • Costs less than a cent

You can run three different daily reports — morning, midday, evening — for the same cost as one OpenClaw cron job.
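For comparison, here is the lean call as a plain chat-completions payload, the kind an n8n HTTP Request node would send. The model slug is an example (check OpenRouter's catalogue for current names), and the actual network send is left as a comment so you can inspect the payload first:

```python
import json

def build_payload(prompt: str) -> dict:
    """Bare chat-completions payload: no config files, no session history."""
    return {
        "model": "minimax/minimax-m1",   # example slug; any cheap model works
        "messages": [
            {"role": "system", "content": "You write terse daily briefings."},
            {"role": "user", "content": prompt},
        ],
    }

payload = build_payload("Summarise today's top three AI stories in 5 bullets.")
print(f"{len(json.dumps(payload))} bytes of context, versus kilobytes "
      "of config files and history in a full OpenClaw call")

# In an n8n HTTP Request node (or from Python), POST this JSON to
# https://openrouter.ai/api/v1/chat/completions with an
# Authorization: Bearer <OPENROUTER_API_KEY> header.
```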

n8n also solves the email monitoring problem elegantly. Instead of having OpenClaw check Gmail every hour (burning tokens each time, even when there's nothing new), write a simple n8n workflow that checks the mailbox and only wakes OpenClaw when there's actually something to handle. OpenClaw tokens spent: zero, unless there's a real task.

Setting up n8n with OpenClaw: n8n can send messages directly into your OpenClaw agent's session thread (in Telegram or Discord) — giving your agent context from external workflows without you lifting a finger.

If running a server or editing config files isn't your comfort zone, n8n's visual interface makes this approachable. It's used for automation by millions of non-technical users, and most of the connection patterns you'd want with OpenClaw are documented.


Part 7: Spending Limits and Token Accountability

Set API key credit limits. In your OpenRouter (or Anthropic API) account, you can set daily or monthly spending caps per API key. A $5/day hard limit prevents any single misconfiguration from becoming a $200 surprise.

Review weekly. Every week, spend ten minutes looking at where your tokens actually went. Which agents are most expensive? Which tasks are costing more than they're worth? Small adjustments based on real data make a bigger difference than optimising in theory.

Build a usage dashboard. If you're comfortable with a bit of development work, having your agent log token consumption by task and model to a database — then visualise it — gives you the observability you need to make intelligent optimisation decisions.
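A minimal version of that ledger, sketched with SQLite; the schema and field names are our own suggestion, not anything OpenClaw provides:

```python
import sqlite3

db = sqlite3.connect(":memory:")  # use a file path to persist between runs
db.execute("""CREATE TABLE usage (
    ts TEXT DEFAULT CURRENT_TIMESTAMP,
    task TEXT, model TEXT,
    input_tokens INTEGER, output_tokens INTEGER)""")

def log_usage(task: str, model: str, input_tokens: int, output_tokens: int):
    db.execute(
        "INSERT INTO usage (task, model, input_tokens, output_tokens) "
        "VALUES (?, ?, ?, ?)", (task, model, input_tokens, output_tokens))

# Have your agent call log_usage() after each task, then review weekly:
log_usage("daily-report", "sonnet", 9_000, 400)
log_usage("refactor-api", "gpt-4.1", 42_000, 3_000)

for model, total in db.execute(
        "SELECT model, SUM(input_tokens + output_tokens) AS t "
        "FROM usage GROUP BY model ORDER BY t DESC"):
    print(f"{model}: {total} tokens")
```

Even without a dashboard on top, a weekly `GROUP BY model` query answers the only question that matters: which model is eating the budget.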


What a Sensible Setup Looks Like

Putting everything in this guide together, a reasonable OpenClaw setup looks something like this:

  • Primary model: Kimi K2 or Sonnet (not Opus) for everyday tasks
  • Developer sub-agent: GPT-4.1 or equivalent for coding, potentially using a subscription plan's included tokens
  • Heartbeat model: Ollama local model (free)
  • Recurring tasks: Handled by n8n where possible, only escalating to OpenClaw when real action is needed
  • Session hygiene: Regular /compact or /new to prevent context bloat
  • Lean config files: Reviewed monthly, free of redundant entries
  • Spending cap: API key daily limit set as a safety net

The combined effect of these changes is an 80–90% reduction in token costs compared to an unoptimised setup — with no meaningful loss in capability for the tasks that matter.

OpenClaw is powerful. It doesn't have to be ruinously expensive.


Related guides: OpenClaw Memory Configuration Guide | OpenClaw Memory Not Working Fix | 5 Free Tools for OpenClaw | OpenClaw Deployment Options Compared | Complete Guide to OpenClaw
