Production Memory for AI Agents: Give Your Agent Persistent Context Without Patching Its Internals

#ai #opensource #memory

Every time I restart my AI coding agent, it feels like onboarding a new junior developer. It doesn't remember the project architecture, the ongoing bugs, or the decisions we made last week. I tried prompt engineering, fine-tuning, and custom plugins — none gave me a persistent, structured memory that crosses sessions.

Most memory solutions require deep integration with the agent's internals. You patch the agent itself, which ties you to a specific platform and breaks on updates. Or you rely on naive chat history dumping, which doesn't scale. I wanted something that works with any agent — Hermes, Claude Code, Cursor, Codex — without modifying the agent. A separate process that archives sessions, builds knowledge, and injects relevant context when needed. That's how Memory Sidecar was born.

Memory Sidecar runs alongside your agent as a sidecar: separate process, shared data directory. It doesn't patch anything. The architecture uses three layers:

Hot Layer – recent context (∼5KB cap), fast access for what happened in the last few turns.
Warm Layer – a PostgreSQL-based hindsight service that analyzes sessions and extracts structured memories.
Cold Layer – a knowledge graph (gbrain) with FTS5 search, storing long-term knowledge about people, projects, and recurring problems.

When the agent starts a new session, the sidecar injects tiered context into the system prompt. It pulls from all three layers, prioritizing recency and relevance. The agent gets a compressed but rich summary of what's worth remembering.

The latest release (v3.1.1) adds several production‑ready features:

memory_watermark.py — automatically detects when memory usage crosses a threshold and archives old data, so you don't blow past context limits.
memory_snapshot_backup.py — periodic, configurable backups of the memory state.
hindsight-service.py — simplified standalone daemon for the warm layer, no complex orchestration needed.
session_to_gbrain.py — now reads tokens from environment variables (no more hardcoded configs).
HERMES_ONBOARDING.md — a complete integration guide for connecting any agent, with full tool listings and examples.

When to use it: This setup shines for agents that work on long‑running projects, need to remember user preferences, or handle complex debugging across sessions. If you're tired of repeating yourself every time you restart your agent, this gives you a real, persistent memory layer that works out of the box.

When not to use it: It's not for real‑time memory within a single session — that's handled by the agent's built‑in context window. Nor is it a replacement for fine‑tuning; it's a caching and retrieval layer, not a model update.

I've been running this for a few weeks, and the difference is night and day. My agent now remembers the project's exception patterns, the API keys we use, and the preferences I set three sessions ago. No more repeating context. No more mental overhead of re-explaining everything.

The project is open source (MIT) and you can find it here: Memory Sidecar on GitHub. If you're building agents that need to work across sessions, give it a try — I think you'll find it just works.

DEV Community

Production Memory for AI Agents: Give Your Agent Persistent Context Without Patching Its Internals

Top comments (0)