DEV Community

Manoir Yantai

Persistent Memory for Any AI Agent: The Memory Sidecar Approach

We've all been there: you spend hours building context with an AI coding agent, solving a tricky bug, defining project conventions, and then — restart the session. It's a blank slate. All that hard-won context is gone.

Of course, there are workarounds. You can dump a massive system prompt with every session, or try to fine-tune a model, or build a custom RAG pipeline. But these approaches are either too heavy (fine-tuning), too rigid (static prompts), or too intrusive (patching the agent itself).

What if memory lived outside the agent, as a separate process that feeds relevant context automatically — without you or the agent having to do anything special?

That's the idea behind Memory Sidecar (v3.1.1) — a production-ready memory system that runs alongside your AI agent, whether it's Hermes, Claude Code, Cursor, Codex, or any tool that can read and write files.

How It Works

Memory Sidecar is a sidecar process: it lives in its own directory, watches the agent's session files, and builds persistent knowledge across three layers:

  • Hot Layer: A short-term memory tool (5KB cap) that keeps the most recent, relevant context at the top of the agent's mind.
  • Warm Layer: A PostgreSQL-backed "Hindsight" service that provides semantic retrieval over recent sessions.
  • Cold Layer: A knowledge graph (gbrain) combined with FTS5 full-text search for long-term recall of people, projects, and recurring problems.

When the agent starts a new session, the sidecar queries the three layers in order (hot, then warm, then cold), compiles the results into a concise context block, and either injects that block into the agent's system prompt or writes it to a file the agent can read. No agent code is patched, and no custom integrations are required.
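To make the compile step concrete, here is a minimal sketch of merging the three layers into one context block. It is not the project's actual code: the `Snippet` type, the `compile_context` function, and the 8KB overall budget are assumptions made for illustration. Only the 5KB hot-layer cap comes from the description above.

```python
from dataclasses import dataclass

HOT_CAP_BYTES = 5 * 1024  # the 5KB hot-layer cap described above


@dataclass
class Snippet:
    layer: str  # "hot", "warm", or "cold"
    text: str


def compile_context(hot, warm, cold, budget_bytes=8 * 1024):
    """Merge snippets in hot -> warm -> cold order until the byte budget is spent.

    budget_bytes is a hypothetical overall limit, not a documented setting.
    """
    lines = []
    used = 0
    hot_used = 0
    for snippet in [*hot, *warm, *cold]:
        line = f"[{snippet.layer}] {snippet.text}"
        size = len(line.encode("utf-8")) + 1  # +1 for the joining newline
        if snippet.layer == "hot":
            if hot_used + size > HOT_CAP_BYTES:
                continue  # skip hot entries that would exceed the hot cap
            hot_used += size
        if used + size > budget_bytes:
            break  # overall budget exhausted; lower-priority layers are dropped
        lines.append(line)
        used += size
    return "\n".join(lines)
```

Because the layers are visited in priority order, a tight budget drops cold facts first and keeps the most recent context.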

What It Actually Does

I've been dogfooding this with Hermes and Claude Code, and here's what it gives you:

  • Session persistence: Every meaningful interaction is archived. Restart your agent and it remembers what you were working on, what conventions you agreed on, and what blockers you hit.
  • Recall that scales: New sessions get recent context (hot), then semantically similar past sessions (warm), then facts from the knowledge graph (cold). The result feels like the agent actually has long-term memory.
  • Tracking important topics: People, project names, recurring issues — these get their own "dossier" in the knowledge graph, so the agent can answer questions like "what did we decide about the API design last week?"
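The cold-layer lookup can be pictured with SQLite's FTS5 extension, which Python's built-in `sqlite3` module exposes when the underlying SQLite library was compiled with it (most builds are). This is only a sketch: the `dossier` table, its columns, and both helper functions are hypothetical, not the schema gbrain or the sidecar actually uses.

```python
import sqlite3


def build_index(rows):
    """Create an in-memory FTS5 index from (topic, note) pairs."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table layout, chosen only for this example.
    conn.execute("CREATE VIRTUAL TABLE dossier USING fts5(topic, note)")
    conn.executemany("INSERT INTO dossier (topic, note) VALUES (?, ?)", rows)
    return conn


def recall(conn, query, limit=3):
    """Return the best-matching (topic, note) rows for a keyword query."""
    cur = conn.execute(
        "SELECT topic, note FROM dossier WHERE dossier MATCH ? "
        "ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return cur.fetchall()
```

Note that FTS5 expects keywords such as `api design`, not a full natural-language question, so a real pipeline would need to extract search terms from the agent's query first.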

Getting Started

You can install it from the GitHub repo. It's a Python 3.9+ application with optional PostgreSQL for the warm layer. The sidecar communicates with your agent via a shared data directory; you just point both to the same folder.

Here's the minimal setup:

git clone https://github.com/mage0535/hermes-memory-installer
cd hermes-memory-installer
pip install -r requirements.txt

Then configure your agent to write sessions to the same data/ directory. The sidecar process will pick up new sessions, process them through the memory pipeline, and make recalled context available for the next session.
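If you're wondering what "picking up new sessions" from a shared folder might look like, here is a minimal polling sketch. It assumes session files end in `.jsonl`, and `find_new_sessions` is a hypothetical helper; the real sidecar may use a different file format or a file-watching mechanism instead.

```python
from pathlib import Path


def find_new_sessions(data_dir, seen):
    """Return session files in data_dir not yet recorded in `seen`, oldest first.

    `seen` is a set of file names that is updated in place.
    The *.jsonl pattern is an assumption, not the project's documented format.
    """
    candidates = sorted(
        Path(data_dir).glob("*.jsonl"),
        key=lambda p: (p.stat().st_mtime, p.name),
    )
    fresh = [p for p in candidates if p.name not in seen]
    seen.update(p.name for p in fresh)
    return fresh
```

Calling this in a loop with a short sleep is enough for a single-user setup, since each file is returned exactly once.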

For a full step-by-step guide, check out HERMES_ONBOARDING.md in the repo — it covers integration with different agent types.

Architecture Philosophy

One thing I particularly like about this design is the layering. Hot, warm, and cold storage correspond to the way our own memory works: we remember the immediate past (hot), can search recent events (warm), and have a rich web of connections for long-term facts (cold). Each layer has different cost and speed characteristics, and the sidecar balances them automatically.

The system also includes new tooling in v3.1.1 — memory_watermark.py for automatic archiving when memory gets full, and memory_snapshot_backup.py for periodic backups. These are the kind of features you need in production but often forget to build yourself.
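The watermark idea is easy to sketch: once memory exceeds a cap, move the oldest entries to an archive until usage falls back under a lower threshold. The function below is my own illustration of that pattern, not what memory_watermark.py does internally. The `apply_watermark` name, the `low_ratio` parameter, and the 80% default are all assumptions.

```python
def apply_watermark(entries, cap_bytes=5 * 1024, low_ratio=0.8):
    """Split entries (oldest first) into (kept, archived) once the cap is exceeded.

    cap_bytes mirrors the 5KB hot-layer cap; low_ratio is a hypothetical
    low-water mark that avoids archiving again on the very next write.
    """
    def size(items):
        return sum(len(e.encode("utf-8")) for e in items)

    if size(entries) <= cap_bytes:
        return list(entries), []  # under the cap: nothing to archive

    target = int(cap_bytes * low_ratio)
    kept = list(entries)
    archived = []
    while kept and size(kept) > target:
        archived.append(kept.pop(0))  # oldest entries are archived first
    return kept, archived
```

Trimming down to a low-water mark, and not just back to the cap itself, leaves headroom so that archiving does not fire on every new entry.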

When (Not) to Use It

Memory Sidecar is great if you:

  • Use AI agents regularly and lose context between sessions
  • Want persistent memory without modifying agent internals
  • Are comfortable pointing another process at your session files

It might not be for you if:

  • You're looking for a fully managed cloud service (this is self-hosted)
  • Your agent doesn't support file-based session logging (though most do)
  • You need real-time, sub-millisecond retrieval across millions of sessions (then you might need a dedicated vector DB backend — which is on the roadmap)

Final Thoughts

I've found that having a real, persistent memory changes how I use AI agents. Instead of repeating instructions and context each time, I can pick up where I left off. The sidecar model keeps everything clean — the agent stays simple, and memory is a separate, upgradeable component.

If you're tired of resetting your agent's memory every session, give Memory Sidecar a try. It's open-source, MIT licensed, and ready for production.

Check it out on GitHub: github.com/mage0535/hermes-memory-installer

P.S. The architecture is fully documented in ARCHITECTURE.md — including how the three layers interact and how to extend with custom backends.