linou518

Solving the Memory Problem in Multi-Agent AI — Building a Memory Service with PostgreSQL + pgvector

Running 20+ AI agents and making them actually remember past conversations. Here's how we built it from scratch.

The Problem: Agents Forget

We run 20+ AI agents on OpenClaw. Each agent operates in its own session, and anything beyond the context window is effectively gone.

We tried supplementing memory with markdown files (daily notes, MEMORY.md), but that approach hit its limits fast:

  • Text-match only search — searching "that investment discussion" won't find "real estate yield calculation"
  • No cross-agent information sharing — what agent A discussed, agent B has no idea about
  • 22,000+ conversation messages — grep doesn't cut it

Architecture

┌─────────────────────────────────────────────┐
│  Session Sync Daemon                        │
│  - Syncs all OpenClaw sessions to DB        │
│    every 5 minutes                          │
└──────────────┬──────────────────────────────┘
               ▼
┌─────────────────────────────────────────────┐
│  PostgreSQL + pgvector                      │
│  - sessions / messages tables               │
│  - embedding column (vector(1536))          │
│  - 22,778 messages, 748 sessions            │
└──────────────┬──────────────────────────────┘
               ▼
┌─────────────────────────────────────────────┐
│  Memory Service (FastAPI)                   │
│  - /search: hybrid search (semantic + text) │
│  - /facts: structured fact storage/retrieval│
│  - /messages: raw message retrieval         │
│  - Full access logging                      │
└──────────────┬──────────────────────────────┘
               ▼
┌─────────────────────────────────────────────┐
│  Embedding Backfill Worker                  │
│  - OpenAI text-embedding-3-small            │
│  - Async vectorization of all 22,778 msgs   │
│  - New messages auto-embedded on sync       │
└─────────────────────────────────────────────┘

Tech Stack: Python (FastAPI) / PostgreSQL 16 + pgvector / OpenAI Embeddings / systemd
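A minimal sketch of what the two tables might look like. The column names and the HNSW index are illustrative assumptions, not necessarily our exact schema:

```python
# Illustrative DDL for the sessions/messages tables described above.
# Column names are assumptions; the real schema may differ.
SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS sessions (
    id         BIGSERIAL PRIMARY KEY,
    agent_id   TEXT NOT NULL,
    started_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE IF NOT EXISTS messages (
    id         BIGSERIAL PRIMARY KEY,
    session_id BIGINT REFERENCES sessions(id),
    role       TEXT NOT NULL,
    content    TEXT NOT NULL,
    embedding  vector(1536),  -- filled in later by the backfill worker
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Approximate nearest-neighbor index for cosine-distance search
CREATE INDEX IF NOT EXISTS messages_embedding_idx
    ON messages USING hnsw (embedding vector_cosine_ops);
"""
```

Keeping embeddings as a nullable column on messages is what makes the later backfill a plain UPDATE rather than a migration.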

What We Learned Building It

1. Export vs Direct DB Search

Our first approach was exporting DB contents to Markdown every 30 minutes, then using the existing file search (memory_search) to read them.

Problems:

  • Export generated files per agent → no message-level granularity
  • 30-minute lag
  • No cross-agent search capability

Decision: Kill the export, go all-in on direct DB search. Rollback was just restoring a config, so risk was minimal.

Result: search accuracy and immediacy improved dramatically. The redundant export cron disappeared, making the architecture cleaner.
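To make "direct DB search" concrete, here is a hedged sketch of the kind of hybrid query the /search endpoint can run: semantic ranking via pgvector's <=> cosine-distance operator, nudged by a literal text match. Column names, the boost constant, and the parameter shape are all assumptions, not our exact implementation:

```python
# Hypothetical hybrid-search query: rank by cosine distance, with a
# small score boost when the query text also appears literally.
HYBRID_SEARCH_SQL = """
SELECT m.id, s.agent_id, m.content,
       m.embedding <=> %(query_vec)s::vector AS distance
FROM messages m
JOIN sessions s ON s.id = m.session_id
WHERE m.embedding IS NOT NULL
  AND (%(agent_id)s::text IS NULL OR s.agent_id = %(agent_id)s)
ORDER BY (m.embedding <=> %(query_vec)s::vector)
         - CASE WHEN m.content ILIKE '%%' || %(query_text)s || '%%'
                THEN 0.1 ELSE 0 END
LIMIT %(limit)s;
"""

def search_args(query_vec, query_text, agent_id=None, limit=10):
    """Bundle query parameters; agent_id=None means cross-agent search."""
    return {"query_vec": query_vec, "query_text": query_text,
            "agent_id": agent_id, "limit": limit}
```

The NULL-able agent_id parameter is what lets one query serve both single-agent and cross-agent search.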

2. The "Is Anyone Actually Using This?" Problem

Right after switching to DB search, the admin asked: "So, are the agents actually using this?"

Honestly, we didn't know. Agents call the service via curl at their own discretion, and nothing was recording those calls.

Two solutions:

A. Access Log — Added FastAPI middleware that logs every API call to journalctl. Running journalctl --user -u memory-service | grep SEARCH instantly shows who searched for what, and when.
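The middleware itself is only a few lines. A sketch, not our exact code — the log-line format is an assumption, and the ImportError guard just keeps the snippet self-contained:

```python
import logging
import time

log = logging.getLogger("memory-service")

def access_line(method: str, path: str, status: int, elapsed_ms: float) -> str:
    """One greppable line per request, e.g. 'SEARCH GET /search 200 3.2ms'."""
    tag = "SEARCH" if path == "/search" else "ACCESS"
    return f"{tag} {method} {path} {status} {elapsed_ms:.1f}ms"

try:
    from fastapi import FastAPI, Request

    app = FastAPI()

    @app.middleware("http")
    async def access_log(request: Request, call_next):
        start = time.monotonic()
        response = await call_next(request)
        # Under systemd, stdout/stderr already land in journalctl,
        # so plain logging is all we need.
        log.info(access_line(request.method, request.url.path,
                             response.status_code,
                             (time.monotonic() - start) * 1000))
        return response
except ImportError:  # keeps the sketch importable without FastAPI installed
    pass
```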

B. memory_db Skill — Registered it as an OpenClaw skill so curl executions show up in agent conversation logs. Usage becomes naturally visible in dialogue.

3. Embedding Backfill — The Power of Full Vectorization

We embedded all 22,778 messages using text-embedding-3-small at 1536 dimensions.

Before backfill, only new messages were vector-searchable — "last week's discussion" wouldn't show up in semantic search. After backfill, every historical conversation became semantically searchable.

Cost-wise, text-embedding-3-small for ~22,000 messages is just a few dollars.
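The worker boils down to "read rows without embeddings, batch them through the API, write the vectors back". A sketch using the OpenAI Python SDK's embeddings.create; the batch size is a guess and the DB write is elided:

```python
BATCH_SIZE = 100  # the embeddings endpoint accepts batched input

def batched(items, n):
    """Yield successive chunks of at most n items."""
    for i in range(0, len(items), n):
        yield items[i:i + n]

def embed_rows(rows, client, model="text-embedding-3-small"):
    """rows: list of (message_id, content) pairs lacking embeddings.
    Returns (message_id, vector) pairs ready for an UPDATE statement."""
    pairs = []
    for chunk in batched(rows, BATCH_SIZE):
        resp = client.embeddings.create(
            model=model,  # 1536 dimensions
            input=[content for _, content in chunk],
        )
        pairs.extend((mid, d.embedding)
                     for (mid, _), d in zip(chunk, resp.data))
    return pairs
```

Batching matters more for wall-clock time than for cost at this scale: ~22,000 messages is a few hundred API calls instead of 22,000.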

4. Session Sync Timing

5-minute sync intervals proved sufficient. Real-time needs are rare — "what we just discussed" is already in the current session's context. Needing another session's content from 5 minutes ago is an edge case.

Operational Insights

  • Dual-search mode is practical — combining file-based memory_search with DB search. Some information only exists in files (handwritten notes, config files), so full DB migration isn't the goal
  • agent_id filter lets you flexibly switch between single-agent and cross-agent search
  • PostgreSQL connection pool "rolling back returned connection" logs appearing every 30 seconds looked alarming but turned out to be normal psycopg_pool behavior (no actual impact)

Takeaways

To unify memory management for 20+ agents:

  1. Built a session DB with PostgreSQL + pgvector
  2. Auto-sync all OpenClaw conversations every 5 minutes
  3. Full embedding backfill made historical conversations semantically searchable
  4. Access logging for usage visibility
  5. Dual-search mode (files + DB) for comprehensive coverage

The "giving AI agents memory" problem is, at its core, just a regular database design problem. Thanks to pgvector, vector search lives in the same DB. No specialized vector database needed.
