GitHub: https://github.com/JinHo-von-Choi/memento-mcp/blob/main/README.en.md
I posted v1 about a month ago. The architecture has been significantly reworked since then.
The premise: We've been optimizing the wrong variable. The next leap isn't a better prompt. It's an AI that actually knows you, your project, and your mistakes.
Your AI knows every Redis command ever documented. It doesn't know that your Redis threw NOAUTH last Tuesday because someone forgot the env var. Knowledge without experience. Close the session, and it all evaporates. Goldfish remember for months. Our AIs remember for zero seconds.
RAG builds a library. Memento builds experience.
RAG dumps docs into a vector store and retrieves chunks. That's a library. A library treats every page as equally relevant. It doesn't know which chapter saved your production server at 2 AM.
Memento works differently.
Say someone suddenly asks me out of nowhere: "Hey, do you remember Mijeong?"
I'd draw a blank. "Who?" Then they say: "Your desk partner in first grade."
That single hint is enough. A vague face surfaces. "Oh... right."
Then more comes flooding back: drawing a line down the middle of the desk and pinching each other if someone crossed it, lending an eraser and never getting it back.
That's what Memento does. Memory as atomic fragments (1–3 sentences each), reconstructed through association, not retrieved as document dumps.
How it works:
Three-layer cascade search. L1 (Redis keyword index, microseconds) → L2 (Postgres metadata, milliseconds) → L3 (pgvector semantic search, deepest). Fast layers answer first; slower layers run only on a miss. Redis and OpenAI are both optional. Postgres alone is a fully functional baseline.
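The cascade can be sketched in a few lines of TypeScript. Everything here is illustrative: in-memory Maps stand in for Redis and Postgres, and `cascadeSearch`, `indexFragment`, and the `Fragment` shape are hypothetical names, not Memento's actual API.

```typescript
// Illustrative three-layer cascade: keyword index → metadata filter → vector search.
type Fragment = { id: string; text: string; tags: string[]; embedding?: number[] };

// L1 stand-in for the Redis keyword index.
const keywordIndex = new Map<string, string[]>();
// L2/L3 stand-in for the Postgres fragment store.
const fragments = new Map<string, Fragment>();

function indexFragment(f: Fragment): void {
  fragments.set(f.id, f);
  for (const word of f.text.toLowerCase().split(/\W+/)) {
    if (!word) continue;
    const ids = keywordIndex.get(word) ?? [];
    ids.push(f.id);
    keywordIndex.set(word, ids);
  }
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]; na += a[i] ** 2; nb += b[i] ** 2;
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function cascadeSearch(query: string, queryEmbedding?: number[]): Fragment[] {
  // L1: exact keyword hit — answer immediately, skip the deeper layers.
  const l1 = keywordIndex.get(query.toLowerCase());
  if (l1?.length) return l1.map((id) => fragments.get(id)!);

  // L2: metadata match (here, a simple tag filter).
  const l2 = [...fragments.values()].filter((f) => f.tags.includes(query.toLowerCase()));
  if (l2.length) return l2;

  // L3: semantic search over embeddings — slowest, reached only on a double miss.
  if (!queryEmbedding) return [];
  return [...fragments.values()]
    .filter((f) => f.embedding)
    .map((f) => ({ f, score: cosine(queryEmbedding, f.embedding!) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, 3)
    .map((x) => x.f);
}
```

The point of the structure: each layer is a short-circuit, so the expensive embedding comparison only runs when the cheap exact and metadata lookups both come up empty.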
Memories have temperature. Hot → warm → cold → expired. But recall one, and it snaps back to hot. Just like human long-term memory.
Some things never decay. Preferences (who you are) and error patterns (mistakes that can always recur) are permanent.
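A minimal sketch of the temperature tiers, assuming idle-time thresholds I invented (7/30/90 days) rather than Memento's real decay schedule:

```typescript
// Illustrative temperature tiers; the thresholds are made up for this sketch.
type Tier = "hot" | "warm" | "cold" | "expired";

interface Memory {
  lastAccess: number; // epoch ms of the last recall
  permanent: boolean; // preferences and error patterns never decay
}

const DAY = 24 * 60 * 60 * 1000;

function tierOf(m: Memory, now: number): Tier {
  if (m.permanent) return "hot"; // exempt from decay entirely
  const idle = now - m.lastAccess;
  if (idle < 7 * DAY) return "hot";
  if (idle < 30 * DAY) return "warm";
  if (idle < 90 * DAY) return "cold";
  return "expired";
}

// Recalling a memory resets its clock: any non-expired tier snaps back to hot.
function recall(m: Memory, now: number): Memory {
  return { ...m, lastAccess: now };
}
```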
Experience compounds. reflect() at session end distills decisions/errors/procedures into fragments. context() at session start loads them. Over time, the AI genuinely gets better at working with you specifically.
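The loop can be sketched as below. The `reflect`/`context` names come from the post, but the store and the distillation logic here are stand-ins: Memento's actual reflect step distills with an LLM, not a passthrough.

```typescript
// Illustrative reflect()/context() session loop.
type Kind = "decision" | "error" | "procedure";
interface Fragment { kind: Kind; text: string }

const store: Fragment[] = [];

// Session end: distill notable events into short fragments.
// (Real distillation would summarize; this sketch just records.)
function reflect(sessionLog: { kind: Kind; text: string }[]): void {
  for (const e of sessionLog) store.push({ kind: e.kind, text: e.text });
}

// Session start: load fragments back into context, errors first
// (an assumed ordering — past mistakes are the costliest to repeat).
function context(): string {
  const order: Kind[] = ["error", "decision", "procedure"];
  return order
    .flatMap((k) => store.filter((f) => f.kind === k))
    .map((f) => `[${f.kind}] ${f.text}`)
    .join("\n");
}
```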
Appropriate forgetting. Periodic consolidation decays unused memories, merges duplicates, and detects contradictions. The store gets denser, not just bigger.
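Here is a toy version of the duplicate-merging step, using naive word-overlap (Jaccard) similarity in place of the embedding comparison and LLM pass Memento actually uses:

```typescript
// Illustrative duplicate merging for consolidation.
// Jaccard over word sets is a crude stand-in for embedding similarity.
function jaccard(a: string, b: string): number {
  const wa = new Set(a.toLowerCase().split(/\W+/).filter(Boolean));
  const wb = new Set(b.toLowerCase().split(/\W+/).filter(Boolean));
  const inter = [...wa].filter((w) => wb.has(w)).length;
  const union = new Set([...wa, ...wb]).size;
  return union === 0 ? 0 : inter / union;
}

// Merge near-duplicates, keeping the more detailed (longer) fragment of each pair.
function consolidate(fragments: string[], threshold = 0.6): string[] {
  const kept: string[] = [];
  for (const f of fragments) {
    const dupIdx = kept.findIndex((k) => jaccard(k, f) >= threshold);
    if (dupIdx === -1) kept.push(f);
    else if (f.length > kept[dupIdx].length) kept[dupIdx] = f;
  }
  return kept;
}
```

The design point survives the simplification: consolidation is lossy on purpose, trading raw volume for density.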
What's new since v1: Cascade search (L1/L2/L3), fragment linking with causal graph exploration, TTL tier system, automatic duplicate merging, LLM-based contradiction detection, Streamable HTTP (MCP 2025-11-25), Claude Code hook support, RBAC (read/write/admin), knowledge graph visualization, fragment import/export, sentiment-aware decay, closed learning loop, temperature-weighted context, admin module split with cookie auth, DB migration runner.
Stack: Node.js 20+ / PostgreSQL 14+ (pgvector) / Redis 6+ (optional) / OpenAI Embeddings (optional) / Gemini Flash (optional)
Feedback, issues, and PRs welcome.