If you've ever used Claude Code, Cursor, or a custom AI agent to build something non-trivial, you've felt the pain: every new session starts blank. The agent can't remember what you discussed two sessions ago, the project context it built up, or the bug fix you iterated on yesterday.
Sure, some tools have built-in memory features, usually a conversation log or a vector store you must manage explicitly. But they're either locked to a specific agent, require patching agent internals, or leave you to build the persistence layer yourself. That isn't practical for production workflows.
I wanted something that works with any agent, including ones I might switch to next month, without touching the agent's code. So I built Memory Sidecar.
How It Works
Memory Sidecar is a separate process that runs alongside your agent. It watches the agent's session output (log files or a database) and builds a persistent knowledge base from it. When the agent starts a new session, the sidecar injects relevant context into its system prompt. The agent never knows it has a memory; it just gets better-informed prompts.
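The post doesn't say exactly how the sidecar detects new session output. One simple approach that needs no hooks into the agent is to poll the watched directory and compare file modification times. The sketch below assumes file-based session logs; `scan_for_changes` is a hypothetical helper, not part of Memory Sidecar's API, and a production watcher might use OS file-change events instead of polling.

```python
from pathlib import Path
from typing import Dict, List


def scan_for_changes(watch_dir: Path, seen: Dict[str, float]) -> List[Path]:
    """Return files under watch_dir that are new or modified since the last scan.

    `seen` maps each file path to the modification time we last processed.
    It is updated in place, so the next call reports only fresh changes.
    Deleted files are ignored; this sketch only looks for new output.
    """
    changed: List[Path] = []
    for path in sorted(watch_dir.rglob("*")):
        if not path.is_file():
            continue  # rglob also yields directories; skip them
        mtime = path.stat().st_mtime
        key = str(path)
        if seen.get(key) != mtime:
            seen[key] = mtime
            changed.append(path)
    return changed
```

A sidecar loop could call this every few seconds (with `time.sleep` between scans) and hand each changed file to its indexer. Because it only reads the directory, the agent itself needs no changes.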
The architecture has three deliberate layers:
- Hot Layer: Keeps the most recent ~5KB of conversation as a rolling context. Fast, cheap, and always available for immediate recall.
- Warm Layer: A PostgreSQL-backed store (Hindsight) that archives complete sessions for later review and semantic search.
- Cold Layer: A combination of vector search (embeddings), full-text search (FTS5), and a lightweight knowledge graph (gbrain) for long-term, cross-session retrieval of topics, decisions, and project patterns.
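The hot layer's behavior, keeping only the most recent ~5KB of conversation, amounts to a byte-capped rolling buffer. Here is a minimal sketch assuming conversation turns arrive as strings; `HotLayer` is a hypothetical name, and the eviction policy (drop the oldest turns first) is an assumption rather than the project's documented behavior.

```python
from collections import deque
from typing import Deque


class HotLayer:
    """Rolling window of recent conversation turns, capped in UTF-8 bytes.

    Hypothetical illustration of the hot layer described above. The byte
    count covers the turns themselves, not the newlines added by context().
    """

    def __init__(self, max_bytes: int = 5 * 1024) -> None:
        self.max_bytes = max_bytes
        self._turns: Deque[str] = deque()
        self._size = 0

    def add(self, turn: str) -> None:
        self._turns.append(turn)
        self._size += len(turn.encode("utf-8"))
        # Evict the oldest turns until back under budget, but always keep
        # the newest turn even if it alone exceeds the cap.
        while self._size > self.max_bytes and len(self._turns) > 1:
            oldest = self._turns.popleft()
            self._size -= len(oldest.encode("utf-8"))

    def context(self) -> str:
        """Return the retained turns, oldest first, one per line."""
        return "\n".join(self._turns)

    @property
    def size(self) -> int:
        return self._size
```

Counting bytes rather than characters keeps the cap meaningful for non-ASCII text, and a deque makes evicting from the front cheap.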
When the agent needs context, the sidecar runs a tiered retrieval: it checks the hot layer first, then runs semantic and keyword search over the cold layer, and optionally pulls warm archives for deeper dives. The results are merged into a concise prompt injection that stays within the token budget.
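The tiered retrieval can be sketched as a priority-ordered merge under a token budget. This assumes each tier is exposed as a callable that returns text snippets; `tiered_retrieve`, the rough 4-characters-per-token estimate, and the default budget are illustrative assumptions, not the project's actual interface.

```python
from typing import Callable, List, Optional, Set


def estimate_tokens(text: str) -> int:
    # Rough heuristic (~4 characters per token). A real system would use
    # the target model's tokenizer instead.
    return max(1, len(text) // 4)


def tiered_retrieve(
    query: str,
    hot_context: str,
    cold_search: Callable[[str], List[str]],
    warm_fetch: Optional[Callable[[str], List[str]]] = None,
    token_budget: int = 500,
) -> List[str]:
    """Collect snippets in hot -> cold -> warm order until the budget is spent."""
    candidates: List[str] = []
    if hot_context:
        candidates.append(hot_context)
    candidates.extend(cold_search(query))
    if warm_fetch is not None:  # the warm tier is optional
        candidates.extend(warm_fetch(query))

    selected: List[str] = []
    seen: Set[str] = set()
    used = 0
    for snippet in candidates:
        if snippet in seen:
            continue  # skip duplicates that surfaced in more than one tier
        cost = estimate_tokens(snippet)
        if used + cost > token_budget:
            continue  # too large for the remaining budget; try later, smaller ones
        selected.append(snippet)
        seen.add(snippet)
        used += cost
    return selected
```

Walking the tiers in order means hot context wins when the budget is tight, matching the hot-then-cold-then-warm priority. Skipping oversized snippets instead of stopping lets smaller, later results still fit.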
What v3.1.0 Changed
Version 3.1.0 simplified the design. It removed a legacy Docker bridge layer that held stale records and added a dependency without adding any value. The cold layer now uses a single integrated engine (gbrain) for vector, keyword, and graph lookups. The result is simpler to deploy (pure Python 3.9+ with PostgreSQL) and faster to query.
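Answering one query with vector, keyword, and graph lookups produces several ranked result lists that have to be merged. The post doesn't say how gbrain does this. Reciprocal rank fusion (RRF) is one common technique, shown here purely as a stand-in; the function name and the document IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[str]], k: int = 60
) -> List[str]:
    """Merge several ranked lists of document IDs into a single ranking.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with ranks starting at 1. k=60 is a commonly used default.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; Python's stable sort keeps first-seen
    # order for ties, even with reverse=True.
    return sorted(scores, key=lambda d: scores[d], reverse=True)
```

RRF only needs ranks, not raw scores, so it can combine cosine similarities and full-text relevance scores without normalizing them onto a common scale.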
Getting Started
You don’t need to modify your agent’s code. Point Memory Sidecar at the agent’s data directory (where it writes session logs or checkpoints). For example:
```shell
# Install
pip install memory-sidecar

# Start the sidecar, watching a directory
memory-sidecar --watch /path/to/agent/sessions --postgres-uri postgresql://localhost/hindsight
```
The sidecar will process existing sessions, index them, and begin injecting context into the agent on the next restart. The agent gets a system prompt prefix like:
```text
[Recalled context: previous session discussed API redesign for project X.
Key decision: use FastAPI over Flask for async support.
Open issue: rate limiting not yet implemented.]
```
That's it. No SDKs, no agent plugins, no patched forks.
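The injected prefix is plain text, so producing it is mostly string assembly. This sketch assumes the retrieval step has already produced a summary, a list of decisions, and a list of open issues; `format_recall_prefix` and `inject` are hypothetical helpers that reproduce the example prefix's layout, not functions from the project.

```python
from typing import List


def format_recall_prefix(
    summary: str, decisions: List[str], open_issues: List[str]
) -> str:
    """Build a bracketed recall block in the layout of the example prefix."""
    lines = [f"Recalled context: {summary}"]
    lines += [f"Key decision: {d}" for d in decisions]
    lines += [f"Open issue: {i}" for i in open_issues]
    return "[" + "\n".join(lines) + "]"


def inject(system_prompt: str, prefix: str) -> str:
    """Prepend the recall block to the agent's system prompt, if there is one."""
    return f"{prefix}\n\n{system_prompt}" if prefix else system_prompt
```

Keeping the prefix as a self-contained block in front of the existing system prompt is what lets the agent stay unaware of the memory system: it simply receives a longer prompt.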
When It Works Well (and When It Doesn't)
This is great for:
- Long-running projects where context builds over weeks
- Teams sharing an agent — everyone benefits from accumulated knowledge
- Agents that produce structured session logs (Hermes, Claude Code, Codex, custom scripts)
It's less useful if:
- Your agent is stateless by design and you prefer deterministic outputs
- You're using a heavily sandboxed environment where you can't run a sidecar process
- The agent doesn't emit session data in a parseable way (though many formats work)
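Whether an agent's output counts as "parseable" often comes down to tolerant, line-by-line parsing. Assuming an agent that writes JSON Lines (one JSON object per line), a sketch like this can report how much of a log is usable; the function name and the skip-count heuristic are illustrative assumptions.

```python
import json
from typing import Any, Dict, Iterable, List, Tuple


def parse_session_lines(lines: Iterable[str]) -> Tuple[List[Dict[str, Any]], int]:
    """Parse JSON Lines session output, skipping lines that aren't JSON objects.

    Returns the parsed records plus the number of skipped lines, so a caller
    can judge whether a given log format is usable at all.
    """
    records: List[Dict[str, Any]] = []
    skipped = 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank lines are not counted as failures
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            skipped += 1
            continue
        if isinstance(obj, dict):
            records.append(obj)
        else:
            skipped += 1  # valid JSON, but not a record (e.g. a bare list)
    return records, skipped
```

A high skip count relative to the number of records is a quick signal that an agent falls into the "not parseable" case above.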
Why I'm Sharing This
There are plenty of memory libraries for agents, but almost all require you to build a custom integration for your specific agent. Memory Sidecar takes the opposite approach: the memory system is independent, and the agent remains blissfully unaware. That separation makes it easier to maintain, upgrade, and swap agents without rewriting persistence.
The project is open source (MIT) and the code is on GitHub: memory-sidecar-installer. If you've been fighting the same context-loss problem, give it a try. It’s saved me hours of re-explaining project context to my agents.
For more details on the architecture and setup, check the ARCHITECTURE doc. Contributions and feedback are very welcome.