lofder.issac

I Killed My OpenClaw — Built the Memory, the Gateway, the Patches. Then the Token Bill Arrived.

What I Actually Built

Between March and April 2026, I shipped 3 projects around the OpenClaw ecosystem. Not forks. Original work.

1. Engram — Scope-Aware Memory for Multi-Agent AI

Repo: lofder/Engram

The problem with OpenClaw's memory was simple: it's file-driven. You write a SOUL.md, you manually curate skills as markdown files, and the AI loads everything into context every single time. More memories = more tokens = more money. And nothing gets cleaned up automatically.

I built Engram as the fix. It's a full memory architecture powered by Mem0 + Qdrant + MCP:

  • Scoped memory: global, group:project-x, dm, agent:coder. Different memory pools for different contexts. Your coding preferences don't pollute your email agent's memory.
  • 7 memory types: preference, fact, procedure, lesson, decision, task_log, knowledge. Not just "remember this" — structured categories that the retrieval system can filter on.
  • Self-cleaning — duplicates merge automatically. Old task logs get summarized into compact knowledge. Stale entries fade. The AI manages its own memory instead of you maintaining files.
  • Forgetting logic — this is the part nobody else does. Remembering is easy. Knowing what to forget is the hard problem. Engram tracks memory age, access frequency, and relevance decay. A task log from 3 weeks ago that was never recalled again? It gets compressed into a one-line summary, then eventually dropped. A preference you stated on day 1 that gets recalled every session? It stays forever. The AI doesn't just accumulate — it curates. Without this, every memory system eventually drowns in its own context, and you're back to paying for 12K tokens of stale history on every request.
  • Trust scoring: high / medium / low. User-stated preferences are high trust. AI inferences are low trust. When memories conflict, trust breaks the tie.
  • 5 MCP tools: mem0_add, mem0_recall, mem0_search, mem0_delete, mem0_compact. Any MCP-compatible client can call them.
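To make the shape concrete, here's a toy sketch of how scoped, typed, trust-scored recall fits together. The field names, trust ordering, and scope-visibility rules are my illustration, not Engram's actual schema:

```python
from dataclasses import dataclass, field
import time

# Illustrative type vocabulary; mirrors the 7 categories described above.
MEMORY_TYPES = {"preference", "fact", "procedure", "lesson",
                "decision", "task_log", "knowledge"}

@dataclass
class Memory:
    text: str
    scope: str            # e.g. "global", "group:project-x", "dm", "agent:coder"
    mem_type: str         # one of MEMORY_TYPES
    trust: str = "low"    # "high" (user-stated) / "medium" / "low" (AI inference)
    created_at: float = field(default_factory=time.time)

def recall(store, scope, mem_type=None):
    """Return memories visible in `scope`, optionally filtered by type.

    Global memories are visible everywhere; scoped memories only in
    their own scope, so coding preferences never leak into, say,
    an email agent's context.
    """
    hits = [m for m in store if m.scope in ("global", scope)]
    if mem_type:
        hits = [m for m in hits if m.mem_type == mem_type]
    # High-trust memories sort first, so they win ties on conflict.
    order = {"high": 0, "medium": 1, "low": 2}
    return sorted(hits, key=lambda m: order[m.trust])

store = [
    Memory("Prefers TypeScript over Python", "global", "preference", trust="high"),
    Memory("Deploys to staging first", "agent:coder", "procedure", trust="high"),
    Memory("Probably dislikes verbose logging", "agent:coder", "preference"),
]
print([m.text for m in recall(store, "agent:coder", "preference")])
# → ['Prefers TypeScript over Python', 'Probably dislikes verbose logging']
```

The real thing backs this with Qdrant vectors and Mem0 retrieval; the point of the sketch is that scope and trust are first-class filters, not lines in a prompt file.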

The design philosophy:

OpenClaw:  You write rules.md → AI reads it → you update rules.md → repeat forever
Engram:    AI remembers on its own → compresses over time → you never maintain a file

When Hermes Agent launched with "self-improving procedural memory" as its headline feature, I had a moment. Because Engram already did this — and more. Hermes stores skills as markdown files and uses LLM summarization for compression. It remembers, but it doesn't forget. There's no decay, no lifecycle, no "this memory is 3 weeks old and was never useful — drop it." Engram has typed memory categories, vector-based semantic retrieval, trust scoring, scope isolation, forgetting logic, and automatic lifecycle management.

But Engram ran on OpenClaw. And OpenClaw ran on tokens. And tokens ran on money I didn't have.

2. Durable Gateway Runtime — Multi-Channel Architecture

Repo: lofder/durable-gateway-runtime

OpenClaw's gateway is its core — the long-running Node.js process that connects WhatsApp, Telegram, Slack, etc. to the AI. But the architecture docs were scattered, and the execution model had gaps when you tried to scale beyond a single instance.

I wrote a full architecture document for a multi-channel gateway and execution model:

  • Ingress normalization — how to standardize messages from different platforms into a unified format
  • Execution skeleton — the task queue, context assembly, and tool execution pipeline
  • State durability — how to persist conversation state across restarts without losing context
  • Channel routing — how to route different groups/users to isolated agent instances

This was meant to be the "how to actually run OpenClaw in production" guide. Not just npm start on your laptop — real multi-tenant, crash-recoverable deployment.
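For a flavor of the ingress-normalization piece: each platform's webhook payload collapses into one internal message shape before it ever reaches the task queue. The unified format below is my assumption for illustration; the platform-side fields only loosely mirror the real Telegram and Slack webhook payloads:

```python
# Illustrative sketch of ingress normalization: collapse each platform's
# webhook payload into one internal shape. The unified format is an
# assumption, not the repo's actual schema.
def normalize(platform: str, payload: dict) -> dict:
    if platform == "telegram":
        msg = payload["message"]
        return {
            "channel": f"telegram:{msg['chat']['id']}",
            "sender": str(msg["from"]["id"]),
            "text": msg.get("text", ""),
            "ts": msg["date"],
        }
    if platform == "slack":
        ev = payload["event"]
        return {
            "channel": f"slack:{ev['channel']}",
            "sender": ev["user"],
            "text": ev.get("text", ""),
            "ts": float(ev["ts"]),
        }
    raise ValueError(f"unknown platform: {platform}")

tg = {"message": {"chat": {"id": 42}, "from": {"id": 7},
                  "text": "hi", "date": 1700000000}}
print(normalize("telegram", tg)["channel"])   # → telegram:42
```

Once everything is one shape, channel routing reduces to a lookup on the `channel` key, which is what makes isolated per-group agent instances tractable.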

I never finished the implementation. The architecture docs are public. The code is experimental. The reason I stopped? Same as everything else: tokens.

3. Gateway Stability Patch — Production Hotfix Toolkit

Repo: lofder/openclaw-gateway-stability-patch

This one came from pain. I was running OpenClaw with multiple channels, and the gateway kept crashing. WebSocket handshake races. Connect-challenge timeout drift. Retryable pre-connect closes that weren't actually being retried.

So I built a proper overlay toolkit:

  • Rule-based patches — configurable handshake timeout, connect-challenge timeout, bounded retry for loopback failures
  • apply/check/rollback CLI — not "edit the file and hope." A proper workflow with backups, manifests, and integrity checks
  • Version-strict — refuses to patch if the runtime version doesn't match. No silent breakage
  • Idempotent — run apply twice, get the same result. No duplicate patches stacking up

Pure Python, zero dependencies, MIT licensed. It's the kind of boring infrastructure work that nobody stars on GitHub but everybody needs in production.
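The apply workflow is easier to trust once you see how little it takes to make patching idempotent and reversible. Here's a toy reduction of the idea; the manifest layout and file naming are illustrative, not the toolkit's actual format:

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path

def apply_patch(target: Path, old: str, new: str, manifest: Path) -> str:
    """Apply a string patch once: back up, record the result hash, no-op on rerun."""
    state = json.loads(manifest.read_text()) if manifest.exists() else {}
    digest = hashlib.sha256(target.read_bytes()).hexdigest()
    if state.get(str(target)) == digest:
        return "already-applied"     # idempotent: a second run changes nothing
    text = target.read_text()
    if old not in text:
        return "refused"             # strict: pattern missing, don't guess
    shutil.copy(target, target.with_suffix(".bak"))   # rollback point
    target.write_text(text.replace(old, new, 1))
    state[str(target)] = hashlib.sha256(target.read_bytes()).hexdigest()
    manifest.write_text(json.dumps(state))
    return "applied"

# Demo in a throwaway directory: first run patches, second run is a no-op.
with tempfile.TemporaryDirectory() as d:
    target = Path(d) / "gateway.js"
    target.write_text("handshakeTimeout = 5000;")
    manifest = Path(d) / "manifest.json"
    print(apply_patch(target, "5000", "15000", manifest))   # → applied
    print(apply_patch(target, "5000", "15000", manifest))   # → already-applied
```

The real toolkit adds version checks against the runtime and integrity verification on rollback, but the hash-in-a-manifest trick is the core of why "run apply twice" is safe.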


The Token Bill That Killed It All

Let me tell you how it feels to watch money evaporate.

You build something you're proud of. Engram is humming. The gateway is stable (thanks to your own patches). Three channels are connected. You go to bed thinking "this is finally working."

You wake up. Check the API dashboard.

$14.37 overnight. While you slept.

Your agent was alive. Heartbeating. Checking for tasks every 5 minutes — 288 API calls through the night. Each one loading the full conversation history + system prompt + all loaded skills + Engram memories into context. Even when there was literally nothing to do, each "nothing to do" cost tokens. Your AI was awake at 3am, spending your money to confirm that nobody had messaged it.

That was the moment I started doing math I didn't want to do.

Why OpenClaw Eats Tokens Like It's Starving

OpenClaw's architecture is fundamentally, structurally, by design token-hungry. It's not a bug. It's how it works.

Context loading — the silent killer. Every single request ships the FULL conversation history + system prompt + loaded skills + memory. Not a summary. Not the relevant parts. Everything. A 20-message conversation with 3 loaded skills hits 8K-12K tokens per request — just for context, before the AI thinks a single thought. And context tokens count on every request. So message #21 pays for all 20 previous messages again. And again. And again.

Heartbeat — paying to breathe. OpenClaw checks for scheduled tasks periodically. Each heartbeat is a full API call with full context loading. Even "nothing to do" costs tokens. At the default 5-minute interval:

288 heartbeats/day × 2K tokens (minimum context) = 576,000 tokens/day
                                                  = just to exist
                                                  = ~$1.73/day on Sonnet
                                                  = $52/month for NOTHING

That's $52/month before you even talk to it. Just for the privilege of having it sit there, awake.

Tool chains — compound interest, but bad. A simple task like "check my email and summarize" involves: read email (tool call + response tokens) → parse content (inference) → summarize (inference) → store to Engram (tool call) → compose response (inference). That's 4-5 inference rounds. Each round loads the growing context. One email check = ~15K tokens. Do that 3 times a day and you're burning 45K tokens on email alone.
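All of this arithmetic reproduces in a few lines (Sonnet input priced at $3/M tokens, per the table below):

```python
# Sanity-check the heartbeat and tool-chain numbers above.
HEARTBEATS_PER_DAY = 24 * 60 // 5           # one every 5 minutes = 288
CONTEXT_TOKENS = 2_000                       # minimum context per heartbeat
SONNET_INPUT_PER_TOKEN = 3 / 1_000_000       # $3 per million input tokens

daily_tokens = HEARTBEATS_PER_DAY * CONTEXT_TOKENS
daily_cost = daily_tokens * SONNET_INPUT_PER_TOKEN

print(daily_tokens)                  # → 576000 tokens/day, just to exist
print(round(daily_cost, 2))          # → 1.73 dollars/day
print(round(daily_cost * 30))        # → 52 dollars/month for nothing

# Tool-chain math: ~15K tokens per email check, 3 checks a day.
print(3 * 15_000)                    # → 45000 tokens/day on email alone
```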

Model pricing — the real knife.

Claude Opus:    $15/M input, $75/M output
Claude Sonnet:  $3/M input, $15/M output  
Claude Haiku:   $0.25/M input, $1.25/M output

OpenClaw defaults to the best model available. If you have Opus access, it uses Opus. For everything. Including heartbeats. Including "nothing to do." I watched $0.47 disappear on a single heartbeat that concluded "no pending tasks." Forty-seven cents to think about nothing.

My Actual Spend

Week 1:  Just exploring, light usage              $12
Week 2:  Added Engram + 3 channels, getting real   $47
Week 3:  Gateway stability testing, lots of restarts $38
Week 4:  Desperate optimization, model fallback     $29
──────────────────────────────────────────────────────
Total:   4 weeks                                   $126

$126. For a personal AI assistant. That crashed. Regularly. And needed me to SSH in and restart it.

Let me put that in perspective:

  • $126 = 8 months of Netflix
  • $126 = my phone bill for 3 months
  • $126 = ChatGPT Plus for 6 months (which just works, no crashing)

And this was the optimized version. I had already:

  • Switched heartbeat to 30-minute intervals
  • Set up model fallback (Haiku for simple, Sonnet for complex)
  • Pruned context aggressively
  • Disabled 2 of 5 skills to reduce context size

The unoptimized version? People report $200-1000+/month. There's a famous post on Zhihu: "25 sentences cost nearly $20." That's not rage-bait. That's Tuesday with OpenClaw on Opus.

The Moment

I was sitting at my desk. It was a Wednesday afternoon. I opened the Anthropic billing dashboard and saw the week-to-date: $31.40. For 4 days. My bank account had $847 in it.

I did the math. At this rate, OpenClaw would eat 15% of my remaining savings in a month. For a side project. That I was building for fun.

I opened the terminal. I typed:

/stop

Then:

docker stop openclaw && docker rm openclaw

Then I closed the tab and went for a walk.

That walk lasted about an hour. I came back and started thinking about what I could build that would cost $0 to run.


Hermes Agent: The Thing I Almost Built

Two weeks after I killed my OpenClaw, Hermes Agent dropped. And the tech community went wild.

"Self-improving AI agent!" "Procedural memory!" "Model-agnostic!" "The OpenClaw killer!"

I read the architecture docs. I looked at the feature list. And I felt... recognition.

Here's what Hermes Agent launched with, mapped to what I already had:

| Hermes Agent (launched) | My stack (built earlier) |
| --- | --- |
| Procedural memory — auto-generates skills from experience | Engram — 7 memory types, trust scoring, auto-compression, scope isolation |
| Session history with FTS5 search | Engram — Qdrant vector search + Mem0 semantic retrieval |
| Model-agnostic runtime | Model fallback, already in my OpenClaw config |
| CLI + TUI + messaging platforms | Durable gateway runtime — multi-channel architecture with ingress normalization |
| Pre-execution security scanner | Gateway stability patch — version-strict apply/check/rollback |
| Cron-based scheduled tasks | OpenClaw heartbeat (the thing that ate my tokens) |

I'm not saying Hermes copied anything. They didn't. But the problems they're solving? I was already there. The difference: Hermes is backed by Nous Research with (presumably) a budget for running their own models. I was a solo developer paying retail API prices.

Where Hermes genuinely does better:

  • Python-native (easier to hack on for ML people)
  • The do-learn-improve loop is cleaner than my separate Engram + OpenClaw integration
  • Zero telemetry by default is a strong stance
  • Better documentation and community

Where my design was ahead:

  • Forgetting logic — this is the big one. Hermes remembers. It doesn't forget. Every memory system that only accumulates eventually collapses under its own weight — context bloat, token waste, contradictory old entries polluting new decisions. Engram tracks age, access frequency, and relevance decay. It knows when to compress, when to merge, and when to let go. Knowing what to forget is harder than knowing what to remember, and Hermes doesn't even try.
  • Engram's scoped memory (global / group / dm / agent) is more granular than Hermes' flat note system
  • Trust scoring on memories (high/medium/low) — Hermes doesn't distinguish between user-stated facts and AI inferences
  • The gateway stability patch addresses real production issues that Hermes hasn't faced yet (because it's still young)
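Since forgetting logic keeps coming up, here's the core idea in miniature: score each memory by age and recall frequency with exponential decay, then bucket it into keep / compress / drop. The half-life and thresholds are illustrative constants I picked for the sketch, not Engram's actual values:

```python
import math

def retention_score(age_days: float, recall_count: int,
                    half_life_days: float = 14.0) -> float:
    """Relevance decays exponentially with age; each recall reinforces it."""
    decay = math.exp(-math.log(2) * age_days / half_life_days)
    reinforcement = math.log1p(recall_count)
    return decay * (1 + reinforcement)

def lifecycle(age_days: float, recall_count: int) -> str:
    """Bucket a memory: keep as-is, compress to a one-line summary, or drop."""
    s = retention_score(age_days, recall_count)
    if s >= 0.5:
        return "keep"
    if s >= 0.1:
        return "compress"
    return "drop"

# A day-1 preference recalled every session stays forever; a 3-week-old
# task log never recalled again gets compressed, then eventually dropped.
print(lifecycle(1, 30))    # → keep
print(lifecycle(21, 0))    # → compress
print(lifecycle(90, 0))    # → drop
```

The design choice that matters is the reinforcement term: frequently recalled memories resist decay, so useful knowledge survives while one-off task logs fade on their own.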

The honest truth: architecture doesn't matter if you can't afford to run it.


The Pivot: From Token Burn to Zero-Cost MCP Tooling

That walk after docker rm openclaw changed how I think about building AI tools.

The question wasn't "what's the coolest thing I can build?" anymore. It was: "what can I build that doesn't need me to feed it money every month just to exist?"

The answer was MCP tools. Not agents. Not platforms. Tools.

The insight: I don't need to build the agent. Claude is an agent. Cursor is an agent. ChatGPT is an agent. Millions of people already pay for these. What they need is tools that plug in — domain-specific logic that runs locally, costs nothing, and returns results in milliseconds.

That's how DSers MCP Product happened — 12 tools and 4 prompts for dropshipping automation. And sku-matcher — a pure-algorithm SKU matching engine that runs in milliseconds with zero model dependency.
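For a sense of what "pure algorithm, zero model dependency" means in practice, here's a hypothetical reduction of SKU variant matching using nothing but the standard library. The normalization rules are mine for illustration, not sku-matcher's actual logic:

```python
from difflib import SequenceMatcher

def normalize(sku: str) -> str:
    """Strip separators and case so 'Red / XL' and 'red-xl' compare equal."""
    return "".join(ch for ch in sku.lower() if ch.isalnum())

def best_match(source: str, candidates: list[str]) -> tuple[str, float]:
    """Score every candidate by plain string similarity; no model calls."""
    scored = [(c, SequenceMatcher(None, normalize(source), normalize(c)).ratio())
              for c in candidates]
    return max(scored, key=lambda pair: pair[1])

match, score = best_match("Red / XL", ["red-xl", "blue-xl", "red-s"])
print(match)   # → red-xl
```

No API call, no context window, no token bill: the whole thing runs in microseconds on a laptop, which is exactly the economics the OpenClaw setup never had.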

The irony is almost painful:

OpenClaw era:
  Built: memory system + gateway + patches
  Users: 0
  Cost:  $126/month
  Status: dead

MCP era:
  Built: dsers-mcp-product + sku-matcher
  Users: 25 stars, 3900+ npm downloads
  Cost:  $0/month
  Status: growing

My $0/month MCP tools, running inside Claude or Cursor as the host agent, deliver more practical value than my entire $126/month OpenClaw setup ever did. The host agent handles conversation, memory, and orchestration. My tools just do the domain-specific work. No token bill. No gateway crashes. No heartbeat burning money at 3am while I sleep.

The lesson: don't build the platform. Build the tool. Let someone else's platform run it. Let someone else pay the token bill.


What I'd Tell Past Me

  1. The "self-hosted AI agent" dream is a tax on enthusiasm. Everyone who sets up OpenClaw feels like Tony Stark for the first 48 hours. Then the invoice arrives. Until local models reach cloud API quality for complex tasks, "self-hosted" just means "you pay retail token prices with no negotiating power." OpenClaw + Claude Opus is a $100+/month commitment for basic utility. That's a subscription you didn't sign up for, to a service that crashes.

  2. Beautiful architecture is worthless if you can't keep the lights on. Engram's design is solid. I still believe scoped, typed, trust-scored memory is the right approach. But nobody cares about your memory architecture when you're explaining to your bank why there's a $47 charge from "Anthropic PBC" this week.

  3. If you can't afford the runtime, the architecture doesn't ship. I still think about Engram's design. It's good work. But good work sitting in a stopped container is just a GitHub repo with a nice README.


The Projects Live On (Sort Of)

Everything is still public on GitHub:

  • Engram — the memory architecture. If someone wants to build on it, the design is there.
  • durable-gateway-runtime — the architecture docs. Good reading material if you're designing multi-channel AI systems.
  • openclaw-gateway-stability-patch — the stability toolkit. Still useful if you're running OpenClaw in production and hitting WebSocket issues.

And what came after:

  • dsers-mcp-product — dropshipping automation via MCP. 25 stars, 3900+ npm downloads, zero token cost. The thing that actually works.
  • sku-matcher — SKU variant matching engine. Pure algorithm, no model, millisecond response. Being integrated into DSers MCP as automated supplier replacement.

Honestly?

I miss it.

I miss having an AI that knew me across sessions. That remembered I prefer TypeScript over Python, that I hate verbose logging, that my deployment always goes to the staging branch first. Engram made the AI feel like a colleague who'd been working with me for months, not a stranger I had to re-brief every morning.

I miss the gateway routing — different channels for different contexts, the Telegram bot for quick notes, the Slack integration for work stuff. It felt like having a real assistant with a real presence, not just a text box I paste into.

I don't miss the crashes. I don't miss the $14 overnight surprise. I definitely don't miss the $0.47 heartbeat that thought about nothing.

But the thing is — I didn't kill OpenClaw because I wanted to. I killed it because I couldn't afford it. There's a difference. If someone handed me an API key with unlimited tokens tomorrow, I'd docker run openclaw before they finished the sentence. Engram is still the best memory architecture I've designed. The gateway runtime is still the most thoughtful multi-channel AI system I've documented. The stability patches still solve real problems.

I just can't pay $126/month for the privilege of running them.

So this article is partly a portfolio piece, partly a technical comparison, and partly me venting into the void. If you're thinking about building on OpenClaw, or Hermes, or any self-hosted AI agent — read the pricing page before you read the architecture docs. Calculate the monthly token cost before you write the first line of code. The "self-hosted AI agent" dream is real. The bill is also real.

And if you're from Nous Research reading this — Hermes is good. But add forgetting logic. Your users will thank you in 3 months when their memory stores aren't 50,000 entries of stale garbage.

I'll be here, building $0/month MCP tools, waiting for the day token prices drop enough that I can bring my OpenClaw back to life.

