I Tried TencentDB Agent Memory — Here's What the Token Reduction Looks Like
The context window problem in long-running agents is familiar: by turn 20, you are paying to re-send tool logs the agent no longer needs. Truncation loses detail. Summarization compresses but also forgets.
Tencent Cloud open-sourced TencentDB Agent Memory (MIT license, May 2026), and it takes a different approach: offload the verbose stuff to local files, keep a Mermaid task graph in context, and let the agent drill back in when it needs specifics.
The Architecture
Four memory layers, each traceable back to raw data (sketched in code after the list):
- L0 Conversation: raw dialogue + tool logs
- L1 Atom: structured facts extracted every N conversations
- L2 Scenario: aggregated solution patterns
- L3 Persona: user behavior profiles built over time
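To make the provenance chain concrete, here is a minimal sketch of the hierarchy in Python. The class and field names are my own illustration, not the project's actual schema; the point is that each layer keeps IDs pointing one level down, so every derived fact traces back to L0.

```python
# Hypothetical data model for the four layers -- illustration only.
from dataclasses import dataclass, field

@dataclass
class L0Event:
    """Raw dialogue turn or tool log: the ground truth everything traces to."""
    event_id: str
    content: str

@dataclass
class L1Atom:
    """Structured fact extracted every N conversations."""
    fact: str
    source_event_ids: list[str]  # provenance back to L0

@dataclass
class L2Scenario:
    """Aggregated solution pattern built from related atoms."""
    pattern: str
    atom_ids: list[str]  # provenance back to L1

@dataclass
class L3Persona:
    """User behavior profile accumulated over time."""
    traits: dict[str, str]
    scenario_ids: list[str] = field(default_factory=list)  # back to L2
```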
The short-term trick: verbose tool output gets offloaded to refs/*.md files. In context, only a lightweight Mermaid graph remains. When the agent needs a specific output, it retrieves by node_id.
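Here is a minimal sketch of that offload/retrieve cycle, assuming a local `refs/` directory. The function names and stub format are hypothetical, not the plugin's actual API:

```python
# Offload verbose tool output to refs/<node_id>.md; keep only a Mermaid
# node in context; fetch the full payload back on demand. Illustration only.
from pathlib import Path

REFS = Path("refs")
REFS.mkdir(exist_ok=True)

def offload(node_id: str, tool_output: str) -> str:
    """Write verbose output to refs/<node_id>.md; return a one-line stub."""
    (REFS / f"{node_id}.md").write_text(tool_output)
    return f'{node_id}["tool output -> refs/{node_id}.md"]'  # Mermaid node

def retrieve(node_id: str) -> str:
    """Drill back into the full output when the agent needs specifics."""
    return (REFS / f"{node_id}.md").read_text()

# In context, the agent sees only the graph, e.g.:
#   graph TD
#     n1["search results -> refs/n1.md"] --> n2["parsed table -> refs/n2.md"]
stub = offload("n1", "...10 KB of raw search results...")
print(stub)                 # a few dozen tokens stay in context
print(retrieve("n1")[:40])  # the full payload, fetched by node_id
```

The stub costs a handful of tokens while the payload can be kilobytes, which is presumably where the headline reductions come from.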
The Numbers
According to the project's benchmarks (long-horizon sessions, not isolated turns):
| Task | Success Change | Token Change |
|---|---|---|
| WideSearch | 33% → 50% | −61.38% |
| SWE-bench | 58.4% → 64.2% | −33.09% |
| PersonaMem | 48% → 76% accuracy | N/A |
Biggest gains are on WideSearch, which makes sense: that is where context accumulates fastest. The SWE-bench improvement is real but more modest (+5.8 points, a 9.93% relative gain).
Important caveat: these are self-reported by the project team, not independently verified.
Quick Setup (OpenClaw)
```bash
openclaw plugins install @tencentdb-agent-memory/memory-tencentdb
```

Then add the config to `~/.openclaw/openclaw.json`:

```json
{
  "memory-tencentdb": {
    "enabled": true,
    "offload": { "enabled": true }
  }
}
```

And restart:

```bash
openclaw gateway restart
```
That is it. SQLite + sqlite-vec by default, no external DB needed. Setting `offload.enabled: true` is what activates the Mermaid compression; without it you only get long-term memory.
Two Layers of Cost Optimization
Memory cuts tokens per call. But you are still paying the provider's per-token rate, and if the provider has an outage, the agent stalls.
If you route agent LLM calls through a gateway, you get a second optimization layer: model routing (pick the cheapest capable model per task), automatic fallback on 429/5xx, and a unified cost dashboard.
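If you want to see what that looks like client-side, the fallback logic is a few lines. A minimal sketch against an OpenAI-compatible gateway; the endpoint and model names are placeholders, and a real gateway can also do this routing server-side:

```python
# Route to the cheapest capable model first; fall back on 429/5xx.
# Endpoint and model names are placeholders.
from openai import OpenAI, APIStatusError, RateLimitError

client = OpenAI(base_url="https://gateway.example.com/v1", api_key="your-key")

MODELS = ["cheap-fast-model", "mid-tier-model", "frontier-model"]

def complete(messages: list[dict]) -> str:
    last_err = None
    for model in MODELS:
        try:
            resp = client.chat.completions.create(model=model, messages=messages)
            return resp.choices[0].message.content
        except RateLimitError as e:      # 429: try the next model
            last_err = e
        except APIStatusError as e:      # 5xx: try the next model
            if e.status_code < 500:
                raise                    # 4xx other than 429: don't retry
            last_err = e
    raise last_err

print(complete([{"role": "user", "content": "ping"}]))
```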
For Hermes, this means setting `MODEL_BASE_URL` to a gateway endpoint:

```bash
docker run -d --name hermes-memory \
  -e MODEL_BASE_URL=https://api.evolink.ai/v1 \
  -e MODEL_API_KEY=your-key \
  hermes-memory
```
Fewer tokens × lower cost per token = compounding savings.
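Back-of-the-envelope, using the project's reported −61.38% WideSearch token reduction and a hypothetical 30% price cut from routing:

```python
# Combined savings when token reduction and cheaper routing multiply.
# The 30% price factor is an assumption for illustration, not a benchmark.
token_factor = 1 - 0.6138   # tokens remaining per call (reported)
price_factor = 1 - 0.30     # price remaining after routing (assumed)
combined = 1 - token_factor * price_factor
print(f"combined savings: {combined:.1%}")  # ~73.0%
```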
More on unified API routing for multi-model apps →
Limitations
- Only OpenClaw and Hermes are supported today
- Offloading is off by default
- SQLite is single-agent; concurrent access needs Tencent Cloud Vector DB backend
- Benchmarks are project-reported