If you've worked with AI agents like Claude Code, Cursor, or even custom Hermes agents, you've probably run into the same frustration: every new session starts with a blank slate. The agent has no idea what you discussed yesterday, what architecture decisions you made last week, or that recurring bug in the payment module. It's like talking to someone with goldfish memory.
Sure, you can paste context manually, or stuff everything into system prompts, but that quickly breaks down. Token limits, stale information, and the sheer effort of curating what matters make it impractical for real-world use.
I wanted something different: a memory system that runs alongside my agent, automatically archives sessions, and feeds relevant context back when needed. No patching the agent's internals, no custom plugins, just a sidecar process and a shared data directory.
That's how Memory Sidecar was born.
What It Actually Does
Memory Sidecar is a separate process that watches your agent's session outputs and builds structured long-term knowledge from them. It doesn't modify the agent at all. You point it at a directory where the agent writes conversations, and it does the rest.
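To make the "watch a directory" idea concrete, here is a minimal polling sketch. It is not the project's actual watcher: the `*.jsonl` file pattern, the function names, and the 30-second interval are all assumptions chosen for illustration.

```python
import time
from pathlib import Path
from typing import Callable


def find_new_sessions(session_dir: Path, seen: dict[str, float]) -> list[Path]:
    """Return session files that are new or modified since the last scan.

    `seen` maps file name -> last observed modification time and is
    updated in place. The *.jsonl pattern is an assumption.
    """
    changed = []
    for path in sorted(session_dir.glob("*.jsonl")):
        mtime = path.stat().st_mtime
        if seen.get(path.name) != mtime:
            seen[path.name] = mtime
            changed.append(path)
    return changed


def watch(session_dir: Path, handle: Callable[[Path], None],
          interval_s: float = 30.0) -> None:
    """Poll the directory forever, passing each changed file to `handle`."""
    seen: dict[str, float] = {}
    while True:
        for path in find_new_sessions(session_dir, seen):
            handle(path)
        time.sleep(interval_s)
```

Polling like this never touches the agent itself, which is the whole point of a sidecar. The trade-off is latency: changes are only noticed on the next scan.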
Three core functions:
- Archives sessions to permanent knowledge – conversations are indexed and stored so restarting the agent doesn't lose them.
- Recalls what's relevant – uses a layered retrieval strategy: recent context first, then semantic search over embeddings, then knowledge graph lookups for deeper connections.
- Tracks important topics – people, projects, and recurring problems each get their own "dossier" that is updated automatically.
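The layered retrieval described in the list above can be sketched roughly like this. It is an illustration under assumptions, not the sidecar's real code: each layer is modelled as a plain function that takes a query and yields text snippets, and `layered_recall` and its `limit` parameter are hypothetical names.

```python
from typing import Callable, Iterable

# A layer is any function that maps a query to candidate snippets.
# In practice these would wrap recent context, an embedding search,
# and a knowledge graph lookup; here they are just callables.
Layer = Callable[[str], Iterable[str]]


def layered_recall(query: str, layers: list[Layer], limit: int = 5) -> list[str]:
    """Query layers in priority order until `limit` unique results are found.

    Later (usually slower) layers are only consulted when the earlier
    ones did not return enough.
    """
    results: list[str] = []
    if limit <= 0:
        return results
    for layer in layers:
        for item in layer(query):
            if item not in results:  # drop duplicates across layers
                results.append(item)
            if len(results) >= limit:
                return results
    return results
```

Passing the layers as `[recent, semantic, graph]` gives the order described above. Because the function returns as soon as it has enough results, a cheap hit in recent context means the more expensive lookups never run.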
Instead of one monolithic memory store, there are three tiers:
- Hot layer – a live memory tool (5KB cap) for the immediate session context.
- Warm layer – a PostgreSQL backing store (called Hindsight) for mid-term recall.
- Cold layer – a graph database (gbrain) with FTS5 search for long-term knowledge.
When the agent needs context, the sidecar injects tiered results into its system prompt. The agent sees only the most relevant items, not a dump of everything.
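One way to picture the injection step is a function that packs tiered results into a single text block under a size budget. This is a sketch under assumptions: the `[hot]`/`[warm]`/`[cold]` labels and the `build_context_block` name are made up for illustration, and the 5 KB default merely echoes the hot layer's cap, since the real injection budget isn't stated here.

```python
TIER_ORDER = ("hot", "warm", "cold")  # highest priority first


def build_context_block(tiers: dict[str, list[str]],
                        budget_bytes: int = 5 * 1024) -> str:
    """Pack tiered memory entries into one text block within a byte budget.

    Tiers are filled in priority order. Packing stops at the first entry
    that would exceed the budget, so lower tiers never crowd out higher ones.
    """
    lines: list[str] = []
    used = 0
    for tier in TIER_ORDER:
        header_added = False
        for entry in tiers.get(tier, []):
            new_lines = ([] if header_added else [f"[{tier}]"]) + [f"- {entry}"]
            # +1 per line accounts for the newline separator
            cost = sum(len(line.encode("utf-8")) + 1 for line in new_lines)
            if used + cost > budget_bytes:
                return "\n".join(lines)
            lines.extend(new_lines)
            used += cost
            header_added = True
    return "\n".join(lines)
```

The returned string could then be appended to the system prompt. Counting bytes rather than tokens is a simplification; a real implementation would more likely budget in tokens.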
Quick Setup
Installation is straightforward:
git clone https://github.com/mage0535/hermes-memory-installer
cd hermes-memory-installer
pip install -r requirements.txt
Then configure the sidecar to watch your agent's data directory. The memory service runs as a daemon, so you can start it and forget it.
The v3.1.1 release includes two new utilities:
- memory_watermark.py – automatic detection and archiving when memory usage hits a threshold.
- memory_snapshot_backup.py – periodic snapshots for recovery.
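To illustrate what threshold-based archiving means, here is a small high/low watermark sketch. It is not the logic of memory_watermark.py, which I am not reproducing here: the `apply_watermark` name, the 80% and 50% thresholds, and the oldest-first eviction order are all assumptions.

```python
def apply_watermark(entries: list[str], cap_bytes: int = 5 * 1024,
                    high: float = 0.8, low: float = 0.5) -> tuple[list[str], list[str]]:
    """Split entries into (kept, archived) using high/low watermarks.

    `entries` is ordered oldest first. Nothing happens until usage reaches
    `high * cap_bytes`; then the oldest entries are archived until usage
    is at or below `low * cap_bytes`.
    """
    def size(items: list[str]) -> int:
        return sum(len(item.encode("utf-8")) for item in items)

    kept = list(entries)
    archived: list[str] = []
    if size(kept) < high * cap_bytes:
        return kept, archived  # below the high watermark: leave memory alone
    while kept and size(kept) > low * cap_bytes:
        archived.append(kept.pop(0))  # evict the oldest entry first
    return kept, archived
```

Using two thresholds instead of one avoids archiving a single entry on every write once memory is nearly full: each trigger frees a meaningful chunk of space.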
If you're using a custom agent (not Hermes), check HERMES_ONBOARDING.md – it's a guide for integrating any agent.
When to Use It (and When Not To)
Good fit:
- Long-running development sessions where context carries over.
- Multi-project setups where each project has its own knowledge base.
- Teams that want a shared memory across agents.
Not a good fit:
- Simple one-off tasks where session history doesn't matter.
- Highly regulated environments where you can't store conversation content externally (though you could self‑host everything).
- Setups that need real-time memory synchronisation across agents – the sidecar polls on a schedule, so updates are not sub-second.
The Benefit in Practice
After running this for a few weeks, I noticed the agent's responses becoming more grounded. It would reference past decisions without me prompting. It remembered the tech stack of a project I started two weeks ago. It even caught a regression that resembled a previously fixed bug.
Was it perfect? No. The cold layer can be slow for very large knowledge bases, and the setup still requires some tinkering. But the core value is there: persistent memory that doesn't require rewriting your agent.
If you're tired of copy-pasting context, give it a try. The project is open source, MIT licensed, and the architecture is well-documented.
Check out the GitHub repo for full details: Memory Sidecar v3.1.1