DEV Community

Evan-dong

Posted on • Originally published at evolink.ai

I Tried TencentDB Agent Memory — Here's What the Token Reduction Looks Like

[Figure: TencentDB Agent Memory four-tier architecture]

The context window problem in long-running agents is familiar: by turn 20, you are paying for tool logs the agent does not need anymore. Truncation loses detail. Summarization compresses but also forgets.

Tencent Cloud open-sourced TencentDB Agent Memory (MIT license, May 2026), and it takes a different approach: offload the verbose stuff to local files, keep a Mermaid task graph in context, let the agent drill back in when it needs specifics.

The Architecture

Four memory layers, each traceable back to raw data:

  • L0 Conversation: raw dialogue + tool logs
  • L1 Atom: structured facts extracted every N conversations
  • L2 Scenario: aggregated solution patterns
  • L3 Persona: user behavior profiles built over time
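One way to picture the tiers is as records that each keep a pointer back down to the layer below, which is what makes every fact traceable to raw data. A minimal sketch in Python; the field names are my own, since the post does not show the project's actual schema:

```python
from dataclasses import dataclass, field

# Illustrative tier records, not the plugin's real data model.
@dataclass
class L0Conversation:
    turns: list[str] = field(default_factory=list)  # raw dialogue + tool logs

@dataclass
class L1Atom:
    fact: str                                        # structured fact extracted from L0
    source_turns: list[int] = field(default_factory=list)  # trace back to raw turns

@dataclass
class L2Scenario:
    pattern: str                                     # aggregated solution pattern
    atoms: list[L1Atom] = field(default_factory=list)

@dataclass
class L3Persona:
    traits: dict[str, str] = field(default_factory=dict)   # behavior profile over time
```

The point of the chain is auditability: an L3 trait can be walked down through scenarios and atoms to the exact conversation turns that produced it.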

The short-term trick: verbose tool output gets offloaded to refs/*.md files. In context, only a lightweight Mermaid graph remains. When the agent needs a specific output, it retrieves by node_id.
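The offload/retrieve cycle can be sketched in a few lines. This is my own illustration of the pattern described above, assuming a `refs/<node_id>.md` file layout and a made-up Mermaid node label; the plugin's actual on-disk format may differ:

```python
from pathlib import Path

REFS = Path("refs")  # assumed offload directory

def offload(node_id: str, tool_output: str) -> str:
    """Write the verbose output to disk; return the one line kept in context."""
    REFS.mkdir(exist_ok=True)
    (REFS / f"{node_id}.md").write_text(tool_output)
    # Only this lightweight Mermaid node survives in the context window.
    return f'{node_id}["tool output, {len(tool_output)} chars, offloaded"]'

def retrieve(node_id: str) -> str:
    """Drill back into the full output when the agent needs specifics."""
    return (REFS / f"{node_id}.md").read_text()
```

In context the agent carries only the short node lines of the task graph; a `retrieve(node_id)` pulls the raw log back on demand instead of paying for it every turn.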

The Numbers

According to the project's benchmarks (long-horizon sessions, not isolated turns):

| Benchmark | Task Success Change | Token Change |
| --- | --- | --- |
| WideSearch | 33% → 50% | −61.38% |
| SWE-bench | 58.4% → 64.2% | −33.09% |
| PersonaMem | 48% → 76% (accuracy) | N/A |

The biggest gains are on WideSearch, which makes sense: that is where context accumulates fastest. The SWE-bench improvement is real but modest (+5.8 points, a 9.93% relative gain).
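To be precise about what the +9.93% means, it is the relative gain over the baseline score, not percentage points:

```python
# SWE-bench: 58.4% -> 64.2% is +5.8 points absolute, +9.93% relative.
before, after = 58.4, 64.2
points = after - before
relative = (after / before - 1) * 100
print(f"+{points:.1f} points, +{relative:.2f}% relative")
# → +5.8 points, +9.93% relative
```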

Important caveat: these are self-reported by the project team, not independently verified.

Quick Setup (OpenClaw)

```shell
openclaw plugins install @tencentdb-agent-memory/memory-tencentdb
```
```jsonc
// ~/.openclaw/openclaw.json
{
  "memory-tencentdb": {
    "enabled": true,
    "offload": { "enabled": true }
  }
}
```
```shell
openclaw gateway restart
```

That is it. It runs on SQLite + sqlite-vec by default, so no external DB is needed. Setting `offload.enabled: true` is what activates the Mermaid compression — without it you only get long-term memory.

Two Layers of Cost Optimization

Memory cuts tokens per call. But you are still paying the provider's per-token rate, and if the provider has an outage, the agent stalls.

If you route agent LLM calls through a gateway, you get a second optimization layer: model routing (pick the cheapest capable model per task), automatic fallback on 429/5xx, and a unified cost dashboard.
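The fallback half of that can be sketched in a few lines. Model names, endpoint shape, and thresholds here are illustrative, not Evolink's actual routing logic — a real gateway does this server-side behind a single base URL:

```python
import json
import urllib.request
import urllib.error

# Placeholder model list, ordered cheapest-first.
MODELS = ["cheap-small", "mid-tier", "frontier-large"]

def should_fallback(status: int) -> bool:
    """Retry-worthy failures: rate limit (429) or server error (5xx)."""
    return status == 429 or 500 <= status < 600

def call_with_fallback(base_url: str, api_key: str, prompt: str) -> dict:
    last_err = None
    for model in MODELS:
        body = json.dumps({"model": model,
                           "messages": [{"role": "user", "content": prompt}]})
        req = urllib.request.Request(
            f"{base_url}/chat/completions", data=body.encode(),
            headers={"Authorization": f"Bearer {api_key}",
                     "Content-Type": "application/json"})
        try:
            with urllib.request.urlopen(req) as resp:
                return json.load(resp)
        except urllib.error.HTTPError as err:
            if should_fallback(err.code):
                last_err = err
                continue          # transient failure: try the next model
            raise                 # other 4xx: retrying will not help
    raise RuntimeError("all models failed") from last_err
```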

For Hermes, this means setting MODEL_BASE_URL to a gateway endpoint:

```shell
docker run -d --name hermes-memory \
  -e MODEL_BASE_URL=https://api.evolink.ai/v1 \
  -e MODEL_API_KEY=your-key \
  hermes-memory
```

Fewer tokens × lower cost per token = compounding savings.
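To put numbers on the compounding: take the −61.38% WideSearch token cut from the table above, and assume — purely for illustration — that routing shaves 30% off the per-token rate:

```python
# Spend scales with tokens x rate, so the savings multiply.
token_cut = 0.6138   # WideSearch token reduction from the benchmark table
rate_cut = 0.30      # illustrative assumption for cheaper routed models
remaining = (1 - token_cut) * (1 - rate_cut)
print(f"combined spend reduction: {1 - remaining:.1%}")
# → combined spend reduction: 73.0%
```

Neither cut alone gets you there; the 30% rate assumption is mine, but any nonzero rate cut stacks multiplicatively on top of the token reduction.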

More on unified API routing for multi-model apps →

Limitations

  • Only OpenClaw and Hermes are supported today
  • Offloading is off by default
  • SQLite is single-agent; concurrent access needs Tencent Cloud Vector DB backend
  • Benchmarks are project-reported
