DEV Community

Charles Wu for seekdb

I Raised a “Lobster” Assistant: It Burned Tokens, Not Electricity

Many people treat AI memory as an afterthought. Here’s how to build a memory system that’s selective on injection, generous on retention — so your digital employee gets smarter without burning your budget.

My “lobster” assistant

Until I Figured Out One Thing: Good Memory Should Be Like a Wallet — Spend Wisely, Save Wisely

Week One: I was excited.

OpenClaw was running on my machine, working like a 24/7 digital employee: writing code, running tasks, replying to messages. I gave it a nickname, customized its workspace, and thought “raising” an AI meant talking to it more often.

Week Two: I started feeling guilty.

The numbers on my bill grew faster than I expected. The longer the context, the more history each request carried — latency and costs climbed together. Worse: feeding the model an entire “memory ledger” didn’t just cost more — it made it lose focus.

That’s when it hit me:

Running OpenClaw is, at its core, an exercise in “context economics.”
What belongs in context? Don’t spare a penny. What doesn’t? Every character is too much.

This article is for anyone using OpenClaw who doesn’t want to turn their lobster into a money-printing shredder: how to treat memory as a first-class citizen — save on injection, spend on retention.

You Think You’re Saving Money, But You’re Feeding the Model an All-You-Can-Eat Buffet

Here’s the trap many fall into: not saving where it counts.

  • Memory keeps piling up, with every turn of conversation trying to “bring everything along”

  • The result? Token explosion, skyrocketing costs, and duller responses

Conversely, some people don’t spend where they should:

  • Preferences, decisions, and constraints never get saved — they’re restated from scratch every time

  • Every new session and every restart means re-introducing yourself from zero

OpenClaw natively uses memory-core, which treats Markdown files in the workspace as the source of truth: simple, readable, and right there with your project. The model reads and writes memory/ and MEMORY.md via tools, so the files are the memory.

This approach works well. Until you start using it as a long-term employee: you want memory that’s structured, searchable, and decays naturally. You also want multiple agents and multiple users not to cross-contaminate.

What you need then isn’t “more Markdown files” — it’s a different logic entirely:

Extract key points at write-time, recall only the most relevant entries at read-time, and let older stuff fade when it should.

Four Words: Spend Wisely, Save Wisely

I now summarize my strategy in one sentence:

Save on context — don’t dump the entire text into the prompt. Instead, use semantic recall to inject only the top-k memories truly relevant to the current conversation. Pair this with intelligent extraction, and what’s stored in the database are “key points,” not transcripts.

Spend on assets — automatically distill worthwhile information into searchable, reusable memory: extraction, recall, forgetting curves, persistence, and visualized operations — in one pipeline.

In practice, that's the OpenClaw Memory (PowerMem) Plugin: plug-and-play, with no changes to OpenClaw source code. It connects the memory slot to a separately deployed PowerMem instance (either an HTTP service or the local pmem CLI). OpenClaw itself doesn't run Python; it handles the gateway, sessions, and tool dispatch, while the heavy lifting happens in PowerMem and your database.
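To make the wiring concrete, here's a sketch of what the plugin block in openclaw.json might look like. The key names (baseUrl, autoRecall, autoCapture, inferOnAdd) come from later in this article, but the exact nesting, values, and port are assumptions on my part; check the plugin README for the authoritative schema.

```json
{
  "plugins": {
    "slots": {
      "memory": "memory-powermem"
    },
    "memory-powermem": {
      "baseUrl": "http://localhost:8000",
      "autoRecall": true,
      "autoCapture": true,
      "inferOnAdd": true
    }
  }
}
```

Swap `baseUrl` for `"mode": "cli"` if you'd rather call the local pmem binary directly.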

What Exactly Does It Save You On — and Spend On?

Save: Tokens Aren’t Saved, They’re “Selected”

PowerMem uses vector search: it asks whether a memory is semantically similar to the current conversation, instead of spreading the entire diary on the table by timestamp.

Enable autoRecall in the plugin and relevant memories are injected automatically at the start of a session or turn; you don't need to manually help the model flip through its notes.

The effects are twofold:

  • Shorter context → cheaper API calls

  • More accurate input for the model → more focused responses
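The cost side is easy to feel on a napkin. Here's a rough back-of-the-envelope comparison of dumping a whole memory file into every request versus injecting only a handful of recalled entries; the token counts and price are made-up illustrative numbers, not any real model's pricing.

```python
# Rough illustration: daily input-token cost of "dump everything" vs. top-k recall.
# All numbers below are illustrative assumptions, not real pricing.
PRICE_PER_1K_INPUT_TOKENS = 0.003  # hypothetical USD price per 1K input tokens

def request_cost(context_tokens: int, requests_per_day: int) -> float:
    """Daily input-token spend for a given context size."""
    return context_tokens * requests_per_day * PRICE_PER_1K_INPUT_TOKENS / 1000

full_dump = request_cost(context_tokens=20_000, requests_per_day=200)  # whole memory file every turn
top_k = request_cost(context_tokens=1_500, requests_per_day=200)       # a few recalled entries

print(f"full dump: ${full_dump:.2f}/day, top-k recall: ${top_k:.2f}/day")
```

The exact numbers don't matter; the ratio does — context you inject on every turn is the multiplier on your bill.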

Spend: Memory Is Data Infrastructure, Not a Chat Byproduct

When you truly treat OpenClaw as a 24/7 digital employee, memory can’t just live inside a single session.

Who said what, what decisions were made, what preferences and constraints exist — these should be persistent, searchable, and reusable. Like code and documentation, they deserve to be stored in a database, backed up, and governed for compliance.

On the PowerMem side: intelligent extraction (summarize and deduplicate at write-time, structured storage) + Ebbinghaus-style forgetting (important stuff stays longer, unimportant stuff fades slowly) + seekdb / OceanBase persistence + PowerMem Dashboard for visualized operations.

In one sentence: treat memory as a first-class citizen.

Four Capabilities, Four “Aha!” Moments

1. Intelligent Extraction: Don’t Use Chat Logs as Your Database

If you just dump entire conversations into storage, you quickly end up with an ocean of noise.

In PowerMem, configure your LLM and embedding model (such as Qwen or OpenAI), and writes with infer: true run intelligent extraction. The plugin's inferOnAdd is enabled by default; you can also disable it per entry and store raw text.

The result: your database holds searchable facts like “user preferences,” “project paths,” and “decisions made,” not lengthy copy-paste transcripts.
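The write-time behavior can be sketched in a few lines. This is a toy stand-in, not PowerMem's internals: the real system uses an LLM for extraction, whereas the `extract_facts` stub below just keeps lines that look like stable facts. The point is the shape of the pipeline — extract, dedupe, store key points, with an escape hatch for raw text.

```python
# Toy sketch of write-time extraction: store deduplicated key facts,
# not raw transcripts. PowerMem uses an LLM when infer=True; this
# trivial keyword stub only illustrates the pipeline's shape.

def extract_facts(transcript: str) -> list[str]:
    """Stand-in for LLM extraction: keep lines that look like stable facts."""
    markers = ("i use ", "i prefer ", "my ", "we decided ")
    return [line.strip() for line in transcript.splitlines()
            if line.strip().lower().startswith(markers)]

class MemoryStore:
    def __init__(self):
        self._facts: list[str] = []

    def add(self, text: str, infer: bool = True) -> None:
        entries = extract_facts(text) if infer else [text]
        for entry in entries:
            if entry.lower() not in (f.lower() for f in self._facts):
                self._facts.append(entry)  # dedupe on write, not on read

store = MemoryStore()
store.add("I use VS Code\nHello there\nMy notes are in ~/notes")
store.add("I use VS Code\nWe decided to release weekly")  # duplicate fact is dropped
print(store._facts)
```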

2. Forgetting Curve: A Memory Database Shouldn’t Be a Hoarding Dump

Humans forget — and digital memory can have a designed lifecycle too: important entries stay longer, less important ones decay over time, keeping your context uncluttered.

Run long enough, and your database becomes alive, not a junk drawer.
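An Ebbinghaus-style policy is simple to express: retention decays exponentially with age, and importance slows the decay. The formula and constants below are my own illustrative choices, not PowerMem's actual implementation.

```python
import math

# Ebbinghaus-style retention: R = exp(-t / S), where the stability S
# grows with importance, so important entries decay slower. Entries
# whose retention falls below a threshold become eviction candidates.
# Formula and constants are illustrative, not PowerMem's internals.

def retention(age_days: float, importance: float) -> float:
    stability = 5.0 * (1.0 + importance)  # importance in [0, 1]
    return math.exp(-age_days / stability)

def should_forget(age_days: float, importance: float, threshold: float = 0.2) -> bool:
    return retention(age_days, importance) < threshold

# A throwaway note fades within weeks; a key decision survives much longer.
print(should_forget(age_days=14, importance=0.1))  # low-importance note
print(should_forget(age_days=14, importance=0.9))  # important decision
```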

3. Multi-Agent / Multi-User: Same Backend, Separate Memories

Use userId and agentId as namespace isolation: a coding Agent and a documentation Agent keep their memories separate; a team sharing one PowerMem instance stays compartmentalized.
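The isolation model is just namespacing: every read and write carries a (userId, agentId) pair, and queries can never see across that boundary. A minimal sketch, with a plain dict standing in for the shared backend:

```python
# Sketch of userId/agentId namespacing: one backend, isolated memories.
from collections import defaultdict

class NamespacedMemory:
    def __init__(self):
        self._data: dict[tuple[str, str], list[str]] = defaultdict(list)

    def add(self, user_id: str, agent_id: str, memory: str) -> None:
        self._data[(user_id, agent_id)].append(memory)

    def search(self, user_id: str, agent_id: str, query: str) -> list[str]:
        # Only this (user, agent) namespace is ever visible to a query.
        return [m for m in self._data[(user_id, agent_id)]
                if query.lower() in m.lower()]

mem = NamespacedMemory()
mem.add("alice", "coder", "Project uses Rust and lives in ~/src/app")
mem.add("alice", "docs-writer", "Readers are ops engineers; avoid jargon")

print(mem.search("alice", "coder", "rust"))        # hit in the coder namespace
print(mem.search("alice", "docs-writer", "rust"))  # empty: no cross-contamination
```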

4. Semantic Search + Entry Limits: The Cure for “I Want to Include Everything”

Inject only top-k entries, with score thresholds — this is a fundamentally different philosophy from “paste the entire MEMORY.md into the prompt.”
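Top-k plus a score threshold looks like this in miniature. Real PowerMem scores with embedding vectors from your configured model; here tiny hand-made vectors and cosine similarity stand in for semantic search, just to show the selection logic.

```python
# Sketch of "inject only top-k entries above a score threshold" recall.
# Hand-made 3-dimensional vectors stand in for real embeddings.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def recall(query_vec, memories, top_k: int = 2, min_score: float = 0.5) -> list[str]:
    scored = [(cosine(query_vec, vec), text) for text, vec in memories]
    scored.sort(reverse=True)  # best match first
    return [text for score, text in scored[:top_k] if score >= min_score]

memories = [
    ("user prefers VS Code",      [0.9, 0.1, 0.0]),
    ("release happens on Friday", [0.1, 0.9, 0.1]),
    ("notes live in ~/notes",     [0.8, 0.2, 0.1]),
]
print(recall([1.0, 0.1, 0.0], memories))  # editor-flavored query
```

Everything below `min_score` stays out of the prompt, no matter how long the memory database grows — that's the entire difference from pasting a file.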

How Do You Choose Versus the Default memory-core?

Neither approach is inherently better. Stick with memory-core if you prefer file-based truth. Choose PowerMem if you need “database-level memory + token savings + enterprise-grade operations.”

Architecture in One Line: OpenClaw Is the Pipes, PowerMem Is the Reservoir

  • OpenClaw: Gateway, sessions, tool dispatch; the memory slot is connected via plugin.

  • Plugin: Forwards store/retrieve/forget operations to PowerMem (HTTP or CLI).

  • PowerMem: Storage, retrieval, extraction, forgetting; data lives in your configured database (e.g., OceanBase, seekdb).

Start by getting PowerMem running (or installing the pmem CLI), then configure the plugin with baseUrl or mode: "cli". Your data stays in your environment.

The Knobs You’ll Actually Use

  • HTTP (default): Run powermem-server; fill in baseUrl in the plugin—ideal for multi-client, centralized deployment.

  • CLI: Use mode: "cli" to call the local pmem directly—ideal for lightweight single-machine setups.

  • autoCapture: At session end, send the current round’s content for extraction and storage (subject to length and entry limits, enabled by default).

  • autoRecall: Automatically inject relevant memories before session/turn start (enabled by default).

Agent tools typically include memory_recall / memory_store / memory_forget; on the CLI side, use pmem search, list, add.

You can rely entirely on automatic capture and recall, or fine-tune by hand for precise control.

Four Stories, More Honest Than a Feature List

1. Personal Assistant

You’ve said things like “I use VS Code” or “My notes are in ~/notes.” Before, you’d repeat this every time you started a new session. Now, preferences are extracted into memory. Next time you ask “write a script,” it naturally aligns with your habits.

2. Project Continuity

You once mentioned a repo name, responsibilities, and release method. A few days later, a new session asks “how do I release?” Semantic search retrieves that entry — instead of “Please tell me the project name first.”

3. Multi-Agent Division

Two Agents — one for coding, one for docs — keep their memories separate. One remembers stacks and paths; the other remembers terminology and readers — like separate disk partitions.

4. Manual Correction

Use the CLI (pmem add) or Agent tools (memory_store) to proactively record something; use memory_forget to clear outdated entries, keeping the database synced with your current focus.
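The store/forget pair is the whole correction loop: record the new fact, retire the stale one. A minimal sketch, with function names mirroring the Agent tools above and a plain dict standing in for the database:

```python
# Sketch of manual memory_store / memory_forget corrections keeping the
# store in sync with your current focus. Names mirror the Agent tools
# described above; the storage itself is a stand-in dict.

memories: dict[int, str] = {}
_next_id = 0

def memory_store(text: str) -> int:
    """Record a memory and return its id."""
    global _next_id
    memory_id = _next_id
    memories[memory_id] = text
    _next_id += 1
    return memory_id

def memory_forget(memory_id: int) -> None:
    """Clear an outdated entry; missing ids are ignored."""
    memories.pop(memory_id, None)

old = memory_store("Release via FTP upload")        # outdated process
memory_store("Release via CI pipeline on tag push") # current process
memory_forget(old)                                  # retire the stale entry
print(list(memories.values()))
```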

Is This for You? Five Self-Checks

  • You’re already using OpenClaw and tired of re-introducing yourself from scratch every time.

  • You want to save tokens: inject only relevant memories, not the full text.

  • You treat your lobster as a long-term employee and are willing to manage memory as infrastructure.

  • You need extraction + recall + decay + database + dashboard as a complete pipeline.

  • You can accept running PowerMem separately (or using the CLI) and configuring LLM/Embedding in .env (if you want intelligent extraction).

Installation: Four Steps, from “Can Connect” to “Can Search”

# Step 1: Start PowerMem
pip install powermem
powermem server
# Or use Docker. Intelligent extraction requires LLM + Embedding configured in .env.
# Step 2: Install the Plugin
openclaw plugins install memory-powermem
# Alternatively, use the official Skill marketplace for guided installation.
# Step 3: Modify openclaw.json
# Point plugins.slots.memory to memory-powermem
# Configure baseUrl or CLI mode
# Step 4: Verify
openclaw memory status
# Then do a test run of add / search

For details, refer to the repository’s README.

Final Thoughts

Code is plumbing.

What flows through the plumbing should be increasingly accurate memory — not increasingly long rambling.

If you’re also raising an OpenClaw lobster, may you: save where it counts, so tokens aren’t burned in vain; spend where it matters, so insights aren’t lost to the wind.

Have you experimented with AI memory systems? What’s your approach to balancing context and cost? Share your experience in the comments.
