How I Gave My AI Agent Persistent Memory Without Modifying Its Code

#ai #opensource #memory

If you've ever worked with AI agents in production, you know the frustration: every new session starts from scratch. The agent has no memory of previous conversations, no context about ongoing projects, and you have to repeat yourself constantly. It's like Groundhog Day for your AI.

I ran into this with a code assistant I was using for a multi-week refactoring project. It was great for one-off questions, but it couldn't remember what we discussed yesterday. I'd ask it about the architecture decisions we made last week, and it would stare at me blankly. I needed something that could carry context across sessions without forcing me to patch the agent's internals.

I looked at the usual suspects: vector databases for RAG, ad-hoc session dumping, even fine-tuning. Each had a cost. RAG setups are powerful but often require custom tooling and tight integration. Session logs without structure are just noise. Fine-tuning is expensive and slow to iterate on. What I wanted was a self-contained system that worked with any agent, required no code changes to the agent, and actually understood what to keep and what to forget.

That's when I found Memory Sidecar. It's an open-source project designed to run alongside any AI agent—Hermes, Claude Code, Cursor, Codex, or your own custom setup—as a separate process. It watches your agent's output, archives important conversations, builds a long-term knowledge base, and injects relevant context back before each new session. No patches, no invasive changes.

How it works

The architecture is simple on the surface but layered underneath. Agents write sessions to state.db and session files. The sidecar reads these, processes new content, and feeds through a three-tier retrieval system:

Hot layer: Recent context with a small footprint (5 KB cap). This is the stuff the agent just talked about.
Warm layer: Hindsight PostgreSQL database that stores summarised sessions and recent history.
Cold layer: A knowledge graph (gbrain) combined with FTS5 search. This handles long-term knowledge—people, projects, recurring problems—and retrieves via semantic search or graph traversal.

These layers are queried during context injection. The system doesn't dump everything into the prompt; it selects what's most relevant from each tier. Recent context goes straight in, older knowledge is surfaced only when the agent's current task relates to it.

What it looks like in practice

I'm running it with a local LLM agent for code review. The sidecar monitors session files, builds dossiers on topics like "authentication refactor" and "database indexing", and tracks discussions across multiple conversations. When I start a new session, it injects a concise summary of last week's work—no need to rehash decisions.

The project also includes practical tools: memory_watermark.py for automatic archival when memory grows too large, and memory_snapshot_backup.py for periodic snapshots. For multi-agent setups, session_to_gbrain.py syncs sessions into the knowledge graph, and hindsight-service.py runs the warm layer as an independent daemon. Full details are in the HERMES_ONBOARDING.md guide, which walks through connecting other agents.

Where it fits and where it doesn't

Memory Sidecar shines when your agent supports system prompt injection or tool-based context. Most modern coding agents do. If yours doesn't, you'll need a small bridge to pipe the context in. The project's MCP bridge (hindsight_mcp_bridge.py) is a good starting point.

It's not a vector store replacement for large-scale corpus search. It's purpose-built for session-level memory—keeping what an agent experienced across conversations. That's a narrower scope, but one that many of us working with interactive agents actually need.

Getting started

The project is at v3.1.1 (MIT license) and the repo includes clear docs for setup and architecture. Clone it, point the sidecar to your agent's data directory, and run the service. There's a quickstart in the README and a more detailed ARCHITECTURE.md if you want to understand the internals.

Check it out on GitHub: https://github.com/mage0535/hermes-memory-installer

If you've been fighting with context retention, give it a try and see if it solves the same pain it fixed for me.