I Tried TencentDB Agent Memory — Here's What the Token Reduction Looks Like
The context window problem in long-running agents is familiar: by turn 20, you are paying to re-send tool logs the agent no longer needs. Truncation loses detail. Summarization compresses but also forgets.
Tencent Cloud open-sourced TencentDB Agent Memory (MIT license, May 2026), and it takes a different approach: offload the verbose stuff to local files, keep a Mermaid task graph in context, and let the agent drill back in when it needs specifics.
The Architecture
Four memory layers, each traceable back to raw data (sketched in code after the list):
- L0 Conversation: raw dialogue + tool logs
- L1 Atom: structured facts extracted every N conversations
- L2 Scenario: aggregated solution patterns
- L3 Persona: user behavior profiles built over time
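To make the provenance chain concrete, here is a minimal sketch of the hierarchy in Python. The class and field names are my own illustration, not the project's actual schema; the point is that each layer keeps IDs pointing one level down, so every derived fact traces back to L0.

```python
# Hypothetical data model for the four layers -- illustration only.
from dataclasses import dataclass, field

@dataclass
class L0Event:
    """Raw dialogue turn or tool log: the ground truth everything traces to."""
    event_id: str
    content: str

@dataclass
class L1Atom:
    """Structured fact extracted every N conversations."""
    fact: str
    source_event_ids: list[str]  # provenance back to L0

@dataclass
class L2Scenario:
    """Aggregated solution pattern built from related atoms."""
    pattern: str
    atom_ids: list[str]  # provenance back to L1

@dataclass
class L3Persona:
    """User behavior profile accumulated over time."""
    traits: dict[str, str]
    scenario_ids: list[str] = field(default_factory=list)  # back to L2
```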
The short-term trick: verbose tool output gets offloaded to refs/*.md files. In context, only a lightweight Mermaid graph remains. When the agent needs a specific output, it retrieves by node_id.
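Here is a minimal sketch of that offload/retrieve cycle, assuming a local `refs/` directory. The function names and stub format are hypothetical, not the plugin's actual API:

```python
# Offload verbose tool output to refs/<node_id>.md; keep only a Mermaid
# node in context; fetch the full payload back on demand. Illustration only.
from pathlib import Path

REFS = Path("refs")
REFS.mkdir(exist_ok=True)

def offload(node_id: str, tool_output: str) -> str:
    """Write verbose output to refs/<node_id>.md; return a one-line stub."""
    (REFS / f"{node_id}.md").write_text(tool_output)
    return f'{node_id}["tool output -> refs/{node_id}.md"]'  # Mermaid node

def retrieve(node_id: str) -> str:
    """Drill back into the full output when the agent needs specifics."""
    return (REFS / f"{node_id}.md").read_text()

# In context, the agent sees only the graph, e.g.:
#   graph TD
#     n1["search results -> refs/n1.md"] --> n2["parsed table -> refs/n2.md"]
stub = offload("n1", "...10 KB of raw search results...")
print(stub)                 # a few dozen tokens stay in context
print(retrieve("n1")[:40])  # the full payload, fetched by node_id
```

The stub costs a handful of tokens while the payload can be kilobytes, which is presumably where the headline reductions come from.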
The Numbers
According to the project's benchmarks (long-horizon sessions, not isolated turns):
| Task | Success Change | Token Change |
|---|---|---|
| WideSearch | 33% → 50% | −61.38% |
| SWE-bench | 58.4% → 64.2% | −33.09% |
| PersonaMem | 48% → 76% accuracy | N/A |
Biggest gains are on WideSearch, which makes sense: that is where context accumulates fastest. The SWE-bench improvement is real but more modest (+5.8 points, a 9.93% relative gain).
Important caveat: these are self-reported by the project team, not independently verified.
Quick Setup (OpenClaw)
```bash
openclaw plugins install @tencentdb-agent-memory/memory-tencentdb
```

Then add the config to `~/.openclaw/openclaw.json`:

```json
{
  "memory-tencentdb": {
    "enabled": true,
    "offload": { "enabled": true }
  }
}
```

And restart:

```bash
openclaw gateway restart
```
That is it. SQLite + sqlite-vec by default, no external DB needed. Setting `offload.enabled: true` is what activates the Mermaid compression; without it you only get long-term memory.
Two Layers of Cost Optimization
Memory cuts tokens per call. But you are still paying the provider's per-token rate, and if the provider has an outage, the agent stalls.
If you route agent LLM calls through a gateway, you get a second optimization layer: model routing (pick the cheapest capable model per task), automatic fallback on 429/5xx, and a unified cost dashboard.
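If you want to see what that looks like client-side, the fallback logic is a few lines. A minimal sketch against an OpenAI-compatible gateway; the endpoint and model names are placeholders, and a real gateway can also do this routing server-side:

```python
# Route to the cheapest capable model first; fall back on 429/5xx.
# Endpoint and model names are placeholders.
from openai import OpenAI, APIStatusError, RateLimitError

client = OpenAI(base_url="https://gateway.example.com/v1", api_key="your-key")

MODELS = ["cheap-fast-model", "mid-tier-model", "frontier-model"]

def complete(messages: list[dict]) -> str:
    last_err = None
    for model in MODELS:
        try:
            resp = client.chat.completions.create(model=model, messages=messages)
            return resp.choices[0].message.content
        except RateLimitError as e:      # 429: try the next model
            last_err = e
        except APIStatusError as e:      # 5xx: try the next model
            if e.status_code < 500:
                raise                    # 4xx other than 429: don't retry
            last_err = e
    raise last_err

print(complete([{"role": "user", "content": "ping"}]))
```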
For Hermes, this means setting `MODEL_BASE_URL` to a gateway endpoint:

```bash
docker run -d --name hermes-memory \
  -e MODEL_BASE_URL=https://api.evolink.ai/v1 \
  -e MODEL_API_KEY=your-key \
  hermes-memory
```
Fewer tokens × lower cost per token = compounding savings.
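Back-of-the-envelope, using the project's reported −61.38% WideSearch token reduction and a hypothetical 30% price cut from routing:

```python
# Combined savings when token reduction and cheaper routing multiply.
# The 30% price factor is an assumption for illustration, not a benchmark.
token_factor = 1 - 0.6138   # tokens remaining per call (reported)
price_factor = 1 - 0.30     # price remaining after routing (assumed)
combined = 1 - token_factor * price_factor
print(f"combined savings: {combined:.1%}")  # ~73.0%
```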
More on unified API routing for multi-model apps →
Limitations
- Only OpenClaw and Hermes are supported today
- Offloading is off by default
- SQLite is single-agent; concurrent access needs Tencent Cloud Vector DB backend
- Benchmarks are project-reported