Give Your AI Agent Persistent Memory Without Modifying Its Code (Sidecar Approach)

#ai #memory #opensource

Tired of starting from scratch every session with your AI coding assistant? You describe your project structure, the architecture decisions, and the recurring bugs, and by the next session it is all gone. That is the problem Memory Sidecar sets out to solve.

The Problem in Detail

Hermes, Claude Code, Cursor, and Codex can do impressive work within a session, but they lack persistent memory across sessions. Every new conversation starts as a blank slate: you lose momentum, re‑explain context, and cannot build on earlier reasoning chains.

Why Typical Memory Solutions Don't Fit

Vector databases: Require the agent to have an embedding pipeline and explicit storage API.
Memory plugins: Often tightly coupled to specific frameworks or models.
Custom agent modification: Not possible if the agent is closed‑source, or you simply don't want to fork it.

The Sidecar Approach

Memory Sidecar runs as an independent process next to your agent. It watches the session files the agent writes, extracts relevant information, and builds a layered memory system. On the next session, it injects curated context directly into the agent's prompt — no patches, no forks.
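
To make the "watch the session files" idea concrete, here is a minimal sketch of a polling watcher. It is not the project's actual implementation: the class name SessionWatcher, the *.jsonl file pattern, and the byte-offset bookkeeping are all assumptions made for illustration.

```python
from __future__ import annotations

from pathlib import Path


class SessionWatcher:
    """Hypothetical helper: returns text appended to session files since the last poll."""

    def __init__(self, session_dir: str, pattern: str = "*.jsonl") -> None:
        self.session_dir = Path(session_dir)
        self.pattern = pattern  # assumed file naming; adjust to what your agent writes
        self._offsets: dict[Path, int] = {}  # bytes already consumed per file

    def poll(self) -> list[tuple[str, str]]:
        """Return (file name, new text) pairs for every file that grew."""
        updates = []
        for path in sorted(self.session_dir.glob(self.pattern)):
            size = path.stat().st_size
            start = self._offsets.get(path, 0)
            if size < start:
                # The file shrank (truncated or rotated), so read it from the top.
                start = 0
            if size == start:
                continue
            with path.open("rb") as handle:
                handle.seek(start)
                chunk = handle.read(size - start)
            self._offsets[path] = size
            updates.append((path.name, chunk.decode("utf-8", errors="replace")))
        return updates
```

Calling poll() on a timer (say, once a second) is enough for the agent to stay untouched: the sidecar only ever reads files the agent already writes.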

Architecture: Three Layers of Memory

Layer	Storage	Purpose
Hot	In‑memory tool (5KB cap)	Recent interactions, fast retrieval
Warm	PostgreSQL (Hindsight)	Historical conversations, semantic search
Cold	Knowledge graph + FTS5 (gbrain)	Persistent facts: projects, people, patterns

The sidecar monitors state.db and session files, processes new content through all three tiers, and composes a context packet that becomes part of the agent's system prompt or tool definitions.
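
One way to picture the "context packet" step is as a budgeted merge of the three tiers. The sketch below is an assumption-heavy illustration, not the sidecar's real code: MemoryTiers, compose_context_packet, the section titles, the hot-then-warm-then-cold ordering, and the character budget are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MemoryTiers:
    """Hypothetical container for items already retrieved from each layer."""
    hot: list[str] = field(default_factory=list)
    warm: list[str] = field(default_factory=list)
    cold: list[str] = field(default_factory=list)


def compose_context_packet(tiers: MemoryTiers, budget_chars: int = 4000) -> str:
    """Merge the tiers into one text block that never exceeds budget_chars."""
    sections = [
        ("Recent activity (hot)", tiers.hot),
        ("Related history (warm)", tiers.warm),
        ("Persistent facts (cold)", tiers.cold),
    ]
    lines: list[str] = []
    remaining = budget_chars
    for title, items in sections:
        header = f"[{title}]"
        header_pending = True  # only emit a header once an item actually fits
        for item in items:
            entry = f"- {item}"
            cost = len(entry) + 1  # +1 for the newline joining lines
            if header_pending:
                cost += len(header) + 1
            if cost > remaining:
                break  # this tier is out of room; move on to the next one
            if header_pending:
                lines.append(header)
                header_pending = False
            lines.append(entry)
            remaining -= cost
    return "\n".join(lines)
```

For example, compose_context_packet(MemoryTiers(hot=["fixing login bug"], cold=["project uses PostgreSQL"])) yields a two-section block that could be prepended to a system prompt. A fixed budget keeps the injected context from crowding out the user's actual request.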

What's New in v3.1.1

Memory watermarking (memory_watermark.py) — automatically detects when the hot memory approaches capacity and triggers archival to the cold layer.
Snapshot backups (memory_snapshot_backup.py) — periodic backup of the entire memory state.
MCP bridge — agents supporting the Model Context Protocol can communicate directly.
Cleaner configuration — environment‑based token setup, no more hardcoded secrets.
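
The watermarking item in the list above can be sketched as a simple high/low threshold check. This is not the contents of memory_watermark.py: the function apply_watermark, the 80% and 50% thresholds, and the oldest-first eviction order are assumptions chosen for illustration. Only the 5KB cap comes from the hot layer's description.

```python
from __future__ import annotations

HOT_CAP_BYTES = 5 * 1024  # the 5KB cap stated for the hot layer
HIGH_WATERMARK = 0.8      # assumed: start archiving at 80% of the cap
LOW_WATERMARK = 0.5       # assumed: stop once usage is back at or below 50%


def entry_size(entry: str) -> int:
    """Size of one entry in bytes once encoded as UTF-8."""
    return len(entry.encode("utf-8"))


def apply_watermark(
    hot_entries: list[str],
    cap: int = HOT_CAP_BYTES,
    high: float = HIGH_WATERMARK,
    low: float = LOW_WATERMARK,
) -> tuple[list[str], list[str]]:
    """Return (kept, archived); entries are assumed to be ordered oldest first."""
    total = sum(entry_size(e) for e in hot_entries)
    if total < cap * high:
        return list(hot_entries), []  # below the high watermark: nothing to do
    kept = list(hot_entries)
    archived: list[str] = []
    while kept and total > cap * low:
        oldest = kept.pop(0)  # evict the oldest entry first
        archived.append(oldest)
        total -= entry_size(oldest)
    return kept, archived
```

The archived list is what would be handed to the cold layer. Using two thresholds rather than one avoids archiving on every single write once the hot memory sits near its cap.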

When to Use This (and When to Skip)

Works great for:

CLI coding agents that write session logs.
Long‑running projects where context continuity saves time.
Teams sharing a memory store for consistent agent behaviour.

Not a fit for:

Real‑time latency‑critical tasks.
Agents with sufficient built‑in memory (e.g., Claude's projects feature).
Ephemeral sessions where memory adds no value.

Quick Start

git clone https://github.com/mage0535/hermes-memory-installer
cd hermes-memory-installer
pip install -r requirements.txt
python sidecar.py --watch /path/to/sessions

Then point your agent's session output to /path/to/sessions. HERMES_ONBOARDING.md has more detailed guides, and ARCHITECTURE.md has the architecture deep‑dive; both are included in the repo.

Why I Keep Using It

I've been running Memory Sidecar with Claude Code for weeks. It now remembers my project's coding conventions, ongoing bug hunts, and design discussions from two weeks ago. I no longer have to repeat myself. That's the kind of memory every agent should have.

Give it a try: Memory Sidecar is on GitHub at https://github.com/mage0535/hermes-memory-installer. Let me know how it works for your favourite agent.

DEV Community