Give Your AI Agent Persistent Memory with a Sidecar Architecture

#ai #memory #opensource

Ever restarted an AI agent session only to realize it forgot everything you discussed? That blank slate is a productivity killer when you're working on complex projects. I've tried various approaches (long system prompts, RAG pipelines, fine-tuning), but each one either hits token limits, needs heavy infrastructure, or forces you to modify the agent's internals.

That's why I built (and open-sourced) Memory Sidecar v3.1.1. It's a separate process that runs alongside your agent (Hermes, Claude Code, Cursor, Codex, whatever) and gives it persistent memory without touching a single line of agent code.

How It Works

Instead of patching the agent, Memory Sidecar uses a sidecar pattern. The agent writes session data to a shared data directory; the sidecar consumes it, processes it, and injects relevant context back into the agent's system prompt on subsequent runs. It's designed for production use where you need continuity without redesigning your agent architecture.
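To make the write, consume, inject loop concrete, here is a minimal sketch of a single sidecar pass. The post doesn't specify the on-disk format, so everything here is a hypothetical assumption: the agent drops one JSON file per session (shaped like {"notes": [...]}) into a shared directory, and the sidecar rewrites a memory_context.md file that the agent's launcher prepends to its system prompt. The function names are illustrative, not the project's API.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical layout (not the project's actual format): the agent writes one JSON
# file per session, e.g. {"notes": ["..."]}, into a shared sessions directory.


def collect_new_sessions(session_dir, seen):
    """Load session files the sidecar has not consumed yet, in filename order."""
    sessions = []
    for path in sorted(session_dir.glob("*.json")):
        if path.name in seen:
            continue
        sessions.append(json.loads(path.read_text(encoding="utf-8")))
        seen.add(path.name)
    return sessions


def build_context(notes, max_items=5):
    """Render the most recent notes as a block to prepend to the system prompt."""
    recent = notes[-max_items:]
    lines = ["## Memory from previous sessions"]
    lines += [f"- {note}" for note in recent]
    return "\n".join(lines)


def run_once(session_dir, context_file, seen, memory):
    """One sidecar pass: consume new sessions, update memory, rewrite the context file."""
    new_sessions = collect_new_sessions(session_dir, seen)
    for session in new_sessions:
        memory.extend(session.get("notes", []))
    context_file.write_text(build_context(memory), encoding="utf-8")
    return len(new_sessions)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sessions = Path(tmp) / "sessions"
        sessions.mkdir()
        (sessions / "001.json").write_text(
            json.dumps({"notes": ["User prefers pytest over unittest"]}), encoding="utf-8"
        )
        context = Path(tmp) / "memory_context.md"
        run_once(sessions, context, seen=set(), memory=[])
        print(context.read_text(encoding="utf-8"))
```

A long-running sidecar would call run_once on a timer or a file-watch event. The agent never imports any of this; it only reads the context file, which is what keeps the agent's own code untouched.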

The retrieval system is layered:

Hot Layer: a simple memory tool with a 5KB cap, providing instant recall of recent context.
Warm Layer: Hindsight PostgreSQL for mid-term memory that survives restarts.
Cold Layer: a knowledge graph (gbrain) combined with FTS5 full-text search for long-term knowledge and topic tracking.
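The hot layer's 5KB cap is the piece that keeps instant recall cheap. The sketch below shows one way to enforce a byte cap, assuming oldest-first eviction; the post doesn't say how the real memory tool evicts, and handing evicted entries down to the warm layer is likewise an assumption. HotLayer is a hypothetical name.

```python
from collections import OrderedDict

HOT_CAP_BYTES = 5 * 1024  # the 5KB hot-layer cap


class HotLayer:
    """Byte-capped store for recent context; oldest entries are evicted first."""

    def __init__(self, cap_bytes=HOT_CAP_BYTES):
        self.cap_bytes = cap_bytes
        self._entries = OrderedDict()  # key -> text, insertion order = age

    @staticmethod
    def _cost(key, text):
        # Count UTF-8 bytes, not characters, so the cap is a real size limit.
        return len(key.encode("utf-8")) + len(text.encode("utf-8"))

    def size_bytes(self):
        return sum(self._cost(k, v) for k, v in self._entries.items())

    def put(self, key, text):
        """Store text; return the (key, text) pairs evicted to stay under the cap."""
        if self._cost(key, text) > self.cap_bytes:
            raise ValueError("entry is larger than the whole hot layer")
        self._entries.pop(key, None)  # re-putting a key refreshes its age
        self._entries[key] = text
        evicted = []
        while self.size_bytes() > self.cap_bytes:
            evicted.append(self._entries.popitem(last=False))
        return evicted

    def get(self, key):
        return self._entries.get(key)
```

Returning the evicted pairs instead of silently dropping them is the design choice that matters here: the caller can write them to the slower, durable layer, so falling out of the hot layer doesn't mean being forgotten.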

This tiered approach keeps common lookups cheap: recent conversations come back instantly from the hot layer, mid-term memory survives restarts in PostgreSQL, important topics stay searchable indefinitely in the cold layer, and nothing is lost when you close the terminal.
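Here is a rough sketch of the fall-through order: ask the cheapest layer first and only drop down a tier on a miss. It rests on loud assumptions. The hot and warm layers are plain dicts standing in for the memory tool and Hindsight PostgreSQL, the gbrain graph is left out entirely, and the cold layer is just an in-memory SQLite FTS5 table (this requires a SQLite build with FTS5, which most Python distributions include). TieredMemory and its methods are hypothetical.

```python
import sqlite3


class TieredMemory:
    """Hot -> warm -> cold lookup. The dicts are in-process stand-ins for the real stores."""

    def __init__(self):
        self.hot = {}    # stand-in for the 5KB hot memory tool
        self.warm = {}   # stand-in for Hindsight PostgreSQL
        self.cold = sqlite3.connect(":memory:")
        # FTS5 virtual table for long-term notes (requires SQLite built with FTS5)
        self.cold.execute("CREATE VIRTUAL TABLE notes USING fts5(topic, body)")

    def remember_long_term(self, topic, body):
        with self.cold:  # commits the insert
            self.cold.execute("INSERT INTO notes (topic, body) VALUES (?, ?)", (topic, body))

    def recall(self, key, query=None, limit=3):
        """Return (layer, results), stopping at the first layer that has an answer."""
        if key in self.hot:
            return "hot", [self.hot[key]]
        if key in self.warm:
            return "warm", [self.warm[key]]
        # Quote the terms as an FTS5 phrase so punctuation in user text can't break the query.
        phrase = '"' + (query or key).replace('"', '""') + '"'
        rows = self.cold.execute(
            "SELECT body FROM notes WHERE notes MATCH ? ORDER BY rank LIMIT ?",
            (phrase, limit),
        ).fetchall()
        return "cold", [row[0] for row in rows]
```

Most lookups end at the first or second branch, and only genuinely old topics pay for a full-text query, which is the cost profile the tiers are meant to give you.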

What's New in v3.1.1

The latest release adds two features that make it more robust for continuous operation:

Memory Watermark Detection (memory_watermark.py) – automatically monitors memory usage and archives old sessions before you hit limits.
Periodic Snapshot Backups (memory_snapshot_backup.py) – creates scheduled snapshots so you never lose your agent's memory state.
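The post doesn't show what memory_watermark.py or memory_snapshot_backup.py actually do internally, so the following is only a sketch of the two ideas under assumed behavior: archive the oldest session files once the session directory passes a fraction (the "watermark") of a size limit, and copy the memory directory into timestamped snapshot folders, keeping only the newest few. The function names, the 0.8 default, and the directory layout are all hypothetical.

```python
import shutil
from datetime import datetime
from pathlib import Path


def dir_size_bytes(path):
    """Total size of all regular files under path."""
    return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())


def archive_if_over_watermark(session_dir, archive_dir, limit_bytes, watermark=0.8):
    """Move the oldest sessions into archive_dir until usage drops to watermark * limit_bytes.

    archive_dir must live outside session_dir, or archived files would still be counted.
    """
    threshold = limit_bytes * watermark
    oldest_first = sorted(
        session_dir.glob("*.json"), key=lambda p: (p.stat().st_mtime, p.name)
    )
    archived = []
    while oldest_first and dir_size_bytes(session_dir) > threshold:
        oldest = oldest_first.pop(0)
        archive_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(oldest), str(archive_dir / oldest.name))
        archived.append(oldest.name)
    return archived


def snapshot(memory_dir, backup_root, keep=5):
    """Copy memory_dir into a timestamped folder, then prune all but the newest `keep`."""
    backup_root.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    dest = backup_root / f"snapshot-{stamp}"
    suffix = 1
    while dest.exists():  # guard against two snapshots in the same microsecond
        dest = backup_root / f"snapshot-{stamp}-{suffix}"
        suffix += 1
    shutil.copytree(memory_dir, dest)
    # Zero-padded timestamps sort chronologically as strings, so the oldest come first.
    for old in sorted(backup_root.glob("snapshot-*"))[:-keep]:
        shutil.rmtree(old)
    return dest
```

Triggering the archive at a watermark below the hard limit leaves headroom for the session that is still being written, and the keep parameter bounds how much disk the backups can use.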

Plus, the onboarding guide (HERMES_ONBOARDING.md) now includes a complete tool list for integrating your own agents, and all tokens are now read from environment variables, so no secrets are hardcoded.
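Reading tokens from the environment is easiest to get right when a missing token fails loudly at startup instead of silently falling back to a default. Below is a minimal sketch of that pattern. The post doesn't list the project's variable names, so any name you pass in (for example a hypothetical MEMORY_SIDECAR_TOKEN) is an assumption, as are the helper names.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required token is not set in the environment."""


def require_env(name):
    """Read a secret from the environment; never fall back to a hardcoded default."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise MissingSecretError(f"{name} is not set; export it before starting the sidecar")
    return value


def redact(secret):
    """Mask a token for logging, revealing at most its last 4 characters."""
    if len(secret) <= 4:
        return "*" * len(secret)
    return "*" * (len(secret) - 4) + secret[-4:]
```

Usage would look like token = require_env("MEMORY_SIDECAR_TOKEN") near the top of the entry point, with redact(token) used anywhere the value might be printed.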

Getting Started

Clone the repo, run the installer, and point it at your agent's session directory. That's it. The sidecar handles the rest. It's written in Python 3.9+ and licensed under MIT, so you can drop it into any project without friction.

git clone https://github.com/mage0535/hermes-memory-installer.git
cd hermes-memory-installer
pip install -r requirements.txt
# Configure your agent path in config.yaml
python memory-sidecar.py

I've been using this setup for months with multiple agents, and it's handled everything from long-running code reviews to multi-day research sessions. The architecture doc in the repo explains the full design if you want to dive deeper.

Why Not Just Use a Vector DB?

Vector databases are great for similarity search, but an embedding index on its own doesn't capture structure: relationships between entities, the order in which events happened, or which facts are outdated. Memory Sidecar's knowledge graph preserves these connections, and the layered retrieval means you're not hammering a vector index for every trivial lookup.
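To see what "which facts are outdated" means in practice: "the API uses MySQL" and "the API uses PostgreSQL" embed almost identically, so a similarity search may happily return both. A graph whose edges carry event order can retire the old edge instead. This is a toy illustration of that idea, not gbrain's data model; FactGraph is hypothetical, and it assumes every relation holds a single current value per subject.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Fact:
    subject: str
    relation: str
    obj: str
    recorded_at: int                      # event order, e.g. a session counter
    superseded_at: Optional[int] = None   # set when a newer fact replaces this one


class FactGraph:
    """Toy temporal knowledge graph. Assumes every relation holds one current value per subject."""

    def __init__(self):
        self.facts: List[Fact] = []

    def _live(self, subject, relation):
        return [f for f in self.facts
                if f.subject == subject and f.relation == relation and f.superseded_at is None]

    def add(self, subject, relation, obj, at):
        # Retire the current edge instead of deleting it, so history stays queryable.
        for old in self._live(subject, relation):
            old.superseded_at = at
        self.facts.append(Fact(subject, relation, obj, at))

    def current(self, subject, relation) -> Optional[str]:
        live = self._live(subject, relation)
        return live[-1].obj if live else None

    def history(self, subject, relation) -> List[Tuple[int, str]]:
        return sorted((f.recorded_at, f.obj) for f in self.facts
                      if f.subject == subject and f.relation == relation)

    def neighbors(self, subject) -> List[Tuple[str, str]]:
        # Only current edges, so outdated facts don't leak into the injected context.
        return sorted((f.relation, f.obj) for f in self.facts
                      if f.subject == subject and f.superseded_at is None)
```

Keeping superseded facts around, rather than overwriting them, is what lets the agent answer both "what do we use now?" and "what did we use before, and when did it change?"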

It's also deliberately stateless on the agent side. You can swap agents, run experiments, or restart without losing context. That flexibility has saved me hours of debugging "where did that conversation go?"

Check it out on GitHub: Memory Sidecar v3.1.1 (https://github.com/mage0535/hermes-memory-installer)