OpenClaw is one of the fastest-growing open-source projects in recent history. 230,000 GitHub stars, 116,000 Discord members, 2 million visitors per week. All of that in two months. People are running personal AI agents on their Mac Minis and cloud servers. It works, and it is genuinely useful.
Like any major shift in how we use technology, it comes with constraints. After speaking with over a hundred OpenClaw users, cost is the topic that comes up in almost every conversation. Someone sets up their agent, starts using it daily, and two weeks later discovers they have spent $254 on API tokens. Another spent $800 in a month. These are not power users pushing the limits. These are normal setups with normal usage.
Where the money goes
Your agent sends every request to your primary model. A heartbeat check, a calendar lookup, a simple web search. If your primary model is Opus 4.6, all of it goes through the most expensive endpoint available.
Your costs stack up from four main sources:
System context - SOUL.md loads into the prompt on every call. Other bootstrap files like AGENTS.md contribute depending on what the agent needs. Even with memory pulled in through search rather than loaded raw, the base system context still adds up. On a typical setup, you are looking at thousands of tokens billed on every single request.
Conversation history - Your history grows with every exchange. After a few hours of active use, a session can carry a large number of tokens, and the entire history tags along with every new request.
Heartbeat checks - The heartbeat runs in the background every 30 minutes by default. Each check is a full API call with all of the above included.
Model choice - Without routing, every request is sent to a single primary model, whether the task is simple or complex. That prevents cost optimization.
One user woke up to an unexpected $141 bill overnight because the heartbeat was hitting the wrong model.
Put all of this together on an unoptimized Opus setup and you can easily spend more per day than most people expect to pay in a month.
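To see how quickly this compounds, here is a back-of-the-envelope sketch. All the numbers (token counts, price per million tokens, request volume) are illustrative assumptions, not measurements from any real deployment:

```python
# Back-of-the-envelope daily input-token cost for an unoptimized agent.
# Every figure below is an illustrative assumption.

SYSTEM_CONTEXT_TOKENS = 8_000     # SOUL.md, AGENTS.md, other bootstrap files
HISTORY_TOKENS = 30_000           # accumulated conversation history
PRICE_PER_MTOK_INPUT = 15.00      # $/million input tokens (premium model)

def daily_input_cost(requests_per_day: int) -> float:
    """Input-token cost per day if every request carries the full context."""
    tokens_per_request = SYSTEM_CONTEXT_TOKENS + HISTORY_TOKENS
    total_tokens = tokens_per_request * requests_per_day
    return total_tokens / 1_000_000 * PRICE_PER_MTOK_INPUT

# 48 heartbeats (every 30 minutes) plus ~100 interactive requests:
print(round(daily_input_cost(48 + 100), 2))  # 84.36
```

Even with modest assumptions, full context on every call through a premium model lands in the tens of dollars per day.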
Use one agent with skills instead of many agents
This is the highest-impact change you can make and almost nobody talks about it.
A lot of users build multi-agent setups. One agent for writing, one for research, one for coding, one to coordinate. Each agent runs as a separate instance with its own memory, its own context, and its own configuration files. Every handoff between agents burns tokens. Each agent adds its own fixed context overhead, so costs scale with every new instance you spin up.
OpenClaw has a built-in alternative. A skill is a markdown file that gives your agent a new capability without creating a new instance. Same brain, same memory, same context. One user went from spending hundreds per week on a multi-agent setup to $90 per month with a single agent and a dozen skills. The quality went up because context stopped getting lost between handoffs.
Keep one main agent. Give it a skill for each type of work. Only spin up a sub-agent for background tasks that take several minutes and need to run in parallel.
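As a rough sketch, a skill is just a markdown file describing a capability. The exact frontmatter fields vary by OpenClaw version, and the name and steps here are purely illustrative:

```markdown
---
name: research
description: Deep-dive research on a topic, with sources
---

When the user asks for research:
1. Search the web for recent, primary sources.
2. Summarize findings with links.
3. Save a short digest to memory for follow-ups.
```

Because this loads into the existing agent's context only when relevant, it adds no per-instance overhead the way a second agent would.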
Route each task to the right model
The majority of what your agent does is simple. Status checks, message formatting, basic lookups. These do not need a frontier model. Only a small fraction of requests actually benefits from premium reasoning.
Without routing, all of it hits your most expensive endpoint by default. One deployment tracked their costs before and after implementing routing and went from $150 per month to $35. Another went from $347 to $68. Smart routing tools can reduce costs by 70 percent on average.
OpenClaw does not ship with a built-in routing engine, so you need an external tool to make this work. Manifest handles this out of the box. It classifies each request and routes it to the right model automatically, so your heartbeats and simple lookups go to Haiku while complex reasoning still hits Opus. That alone cuts your bill dramatically without any manual config per task.
If you prefer a DIY approach, you can set up multiple model configs or write a routing skill yourself, but it takes more effort to get right.
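A DIY router can start as simple as a keyword-and-length heuristic. This is a minimal sketch, not a production classifier; the model names, keywords, and length threshold are all assumptions you would tune for your own workload:

```python
# Minimal DIY router: classify a request, pick a model tier.
# Model names, keywords, and thresholds are illustrative assumptions.

CHEAP_MODEL = "claude-haiku"
PREMIUM_MODEL = "claude-opus"

SIMPLE_KEYWORDS = ("status", "heartbeat", "format", "lookup", "remind")

def pick_model(prompt: str) -> str:
    """Route short, routine requests to the cheap tier; default to premium."""
    text = prompt.lower()
    if len(text) < 200 and any(k in text for k in SIMPLE_KEYWORDS):
        return CHEAP_MODEL
    return PREMIUM_MODEL

print(pick_model("heartbeat: anything new?"))                          # cheap tier
print(pick_model("Design a migration plan for our database schema."))  # premium tier
```

A heuristic like this misclassifies edge cases, which is why dedicated routing tools use a small model to do the classification instead.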
Cache what does not change
Your SOUL.md, MEMORY.md, and system instructions are the same from one call to the next. Without caching, the provider processes all of those tokens from scratch on every single request. You pay full price every time for content that has not changed.
Prompt caching is a capability on the provider side. Anthropic offers an explicit prompt caching mechanism with a documented TTL where cached reads cost significantly less than fresh processing. Other providers handle caching differently or automatically, so the details depend on which model you are using. The point is the same: static tokens that hit warm cache cost less than tokens processed from scratch.
This is where the heartbeat becomes relevant. If your heartbeat fires often enough to keep the provider's cache warm between calls, every check reuses the cached system context instead of reprocessing it from zero. Cache TTLs vary by provider and configuration. Anthropic's standard TTL is around 5 minutes, with longer windows available depending on the setup. Community members have found that aligning the heartbeat interval just under whichever TTL you are working with keeps the cache alive. Combine that with routing your heartbeat to a cheap model and each background check costs a fraction of what it would on a cold Opus call.
The key principle is simple. Make sure your static content (system instructions, bootstrap files) sits at the beginning of your prompt and variable content comes at the end. That structure maximizes what the provider can cache. One user documented a drop from $720 to $72 per month primarily through this approach.
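In request terms, that principle looks like the sketch below. The cache_control marker follows Anthropic's documented prompt-caching API; other providers cache differently, and the field names here apply only to Anthropic-style requests:

```python
# Put static content first so the provider can cache a stable prefix.
# The cache_control marker is Anthropic-specific; check your provider's docs.

def build_request(static_system: str, history: list[dict], user_msg: str) -> dict:
    """Assemble a request with a cacheable static prefix and variable tail."""
    return {
        "system": [
            {
                "type": "text",
                "text": static_system,                   # SOUL.md, bootstrap files
                "cache_control": {"type": "ephemeral"},  # mark prefix as cacheable
            }
        ],
        "messages": history + [{"role": "user", "content": user_msg}],
    }

req = build_request("You are my assistant...", [], "What's on my calendar?")
print(list(req.keys()))
```

If anything variable (a timestamp, the latest message) leaks into the static prefix, the cache misses and you pay full price again.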
Shrink your context window
Every message you send includes your full conversation history. After a few hours that history alone can cost more than the actual answer. Three things you can do about it.
Start new conversations often. This is the easiest win. Instead of running one conversation for an entire day, start a fresh one every couple of hours. Your agent keeps its long-term memory across conversations but drops the accumulated back-and-forth. Context resets to your bootstrap files only.
Clean up your SOUL.md. Everything in that file loads on every single call. If you have task-specific instructions sitting next to your personality rules, you are paying for all of it every time. Move the specialized parts into skills. They only load when the agent actually needs them.
Optimize how memory loads into context. OpenClaw uses memory_search to pull relevant memories into your prompt, not the raw file. But the more memories accumulate over weeks of use, the more context those searches can return. Configuring the QMD backend and tuning what gets retrieved keeps that footprint tight. Some community members have built structured memory layers on top of this and cut their base context to a fraction of what it used to be.
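The history-trimming idea can be sketched in a few lines. The turn limit is an illustrative assumption; in practice you would pick it based on how much back-reference your conversations need:

```python
# Simple context trimmer: keep only the most recent exchanges.
# MAX_TURNS is an illustrative assumption, not a recommended value.

MAX_TURNS = 6  # user+assistant pairs to retain

def trim_history(messages: list[dict]) -> list[dict]:
    """Drop older turns, keeping the last MAX_TURNS pairs."""
    keep = MAX_TURNS * 2  # each turn is a user message plus an assistant reply
    return messages[-keep:]

history = [{"role": "user", "content": f"msg {i}"} for i in range(40)]
print(len(trim_history(history)))  # 12
```

Starting a fresh conversation is the blunt version of the same operation: it drops everything except bootstrap context and long-term memory.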
Run a local model for the simple stuff
Running a model on your own hardware eliminates API costs for the tasks that do not need a cloud model.
You pay for hardware once. After that, every inference is free. For heartbeats, classification, and routine lookups, local models are more than capable.
The popular choice right now is Qwen 3 32B. On an RTX 4090 it runs at 40+ tokens per second. A Mac Mini running 24/7 handles the lightweight workload while cloud models only get called for complex reasoning.
Ollama makes the integration simple. Install, pull the model, point your OpenClaw config at the local endpoint for specific task types. It works through an OpenAI-compatible HTTP endpoint.
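A request to that local endpoint looks like any OpenAI-style chat completion. This sketch only builds the payload; the URL assumes a default local Ollama install, and the model tag assumes you have already pulled it:

```python
# Ollama exposes an OpenAI-compatible endpoint at /v1/chat/completions.
# URL and model tag below assume a default local install with the model pulled.

import json

OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

def local_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat payload for the local endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

payload = local_request("qwen3:32b", "Summarize today's heartbeat checks.")
print(json.dumps(payload)[:60])
```

You can POST this with any HTTP client, or point the relevant task types in your OpenClaw config at OLLAMA_URL so the agent sends them there itself.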
Track your costs daily
Every user who cut their bill says the same thing. The fix was not a specific technique. It was seeing where the money went.
Checking your bill once a month hides everything. You miss the day a cron job misfired. You miss the skill that routes to Opus when it should hit Haiku.
Use an observability tool that shows you per-prompt, per-model cost breakdowns. When you can see exactly which request went to which model and what it cost, problems become obvious. The fixes usually take minutes once you see the data.
Some routing tools offer real-time tracking with daily budgets and alerts so you catch problems before they compound. Your provider dashboard already tracks spending, but the granularity varies.
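Even without a full observability stack, the core of daily tracking is a per-request cost log plus a budget check. Prices and the budget here are illustrative assumptions; substitute your provider's real rates:

```python
# Minimal per-request cost log with a daily budget alert.
# Prices and budget are illustrative assumptions.

from collections import defaultdict

PRICE_PER_MTOK = {"opus": 15.0, "haiku": 0.80}  # $/M input tokens (assumed)
DAILY_BUDGET = 5.00

spend = defaultdict(float)  # model -> dollars spent today

def record(model: str, input_tokens: int) -> None:
    """Log one request's cost and warn when the daily total exceeds budget."""
    spend[model] += input_tokens / 1_000_000 * PRICE_PER_MTOK[model]
    total = sum(spend.values())
    if total > DAILY_BUDGET:
        print(f"ALERT: daily spend ${total:.2f} over budget")

record("haiku", 40_000)
record("opus", 300_000)
print(round(sum(spend.values()), 4))
```

Breaking `spend` down by model is what surfaces the misrouted skill or misfiring cron job that a monthly bill hides.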
Where to start
Start with visibility. Set up an observability tool so you can see which prompts cost what and which models they hit. You cannot optimize what you cannot measure.
If you are running multiple agents, switch to one agent with skills. That is the highest return for the least effort.
Route your heartbeat to a cheap model. This alone makes a noticeable difference on a 24/7 agent.
Enable prompt caching. It takes minutes to set up.
Keep your context lean. Clean up your SOUL.md, start new conversations regularly, and tune what memory search pulls into context.
Add a local model if you have the hardware. It handles heartbeats and simple tasks at zero marginal cost.
Based on what we've observed across multiple OpenClaw deployments, applying these changes can reduce monthly costs roughly fivefold.
If you're running OpenClaw agents and want to keep costs under control, we built Manifest for that. It's free, open source, and gives you real-time cost tracking with smart model routing. Feedback is welcome; we're building this with the community.


