<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: jinho von choi</title>
    <description>The latest articles on DEV Community by jinho von choi (@yurgenschmidt).</description>
    <link>https://dev.to/yurgenschmidt</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3810774%2F08bf9c41-12fd-4e7e-bfa9-d4e63037c1d6.jpg</url>
      <title>DEV Community: jinho von choi</title>
      <link>https://dev.to/yurgenschmidt</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/yurgenschmidt"/>
    <language>en</language>
    <item>
      <title>A Three-Layer Memory Architecture for LLMs (Redis + Postgres + Vector) MCP</title>
      <dc:creator>jinho von choi</dc:creator>
      <pubDate>Fri, 27 Mar 2026 17:57:29 +0000</pubDate>
      <link>https://dev.to/yurgenschmidt/a-three-layer-memory-architecture-for-llms-redis-postgres-vector-mcp-19b1</link>
      <guid>https://dev.to/yurgenschmidt/a-three-layer-memory-architecture-for-llms-redis-postgres-vector-mcp-19b1</guid>
      <description>&lt;p&gt;GitHub : &lt;a href="https://github.com/JinHo-von-Choi/memento-mcp/blob/main/README.en.md" rel="noopener noreferrer"&gt;https://github.com/JinHo-von-Choi/memento-mcp/blob/main/README.en.md&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I posted v1 about a month ago. The architecture has been significantly reworked since then.&lt;/p&gt;

&lt;p&gt;The premise: We've been optimizing the wrong variable. The next leap isn't a better prompt. It's an AI that actually knows you, your project, and your mistakes.&lt;/p&gt;

&lt;p&gt;Your AI knows every Redis command ever documented. It doesn't know that your Redis threw NOAUTH last Tuesday because someone forgot the env var. Knowledge without experience. Close the session, and it all evaporates. Goldfish remember for months. Our AIs remember for zero seconds.&lt;/p&gt;

&lt;p&gt;RAG builds a library. Memento builds experience.&lt;/p&gt;

&lt;p&gt;RAG dumps docs into a vector store and retrieves chunks. That's a library. A library treats every page as equally relevant. It doesn't know which chapter saved your production server at 2 AM.&lt;/p&gt;

&lt;p&gt;Memento works differently.&lt;/p&gt;

&lt;p&gt;Say someone asks me out of nowhere: "Hey, do you remember Mijeong?"&lt;/p&gt;

&lt;p&gt;I'd draw a blank. "Who?" Then they say: "Your desk partner in first grade."&lt;/p&gt;

&lt;p&gt;That single hint is enough. A vague face surfaces. "Oh... right."&lt;/p&gt;

&lt;p&gt;Then more comes flooding back: drawing a line down the middle of the desk and pinching each other if someone crossed it, lending an eraser and never getting it back.&lt;/p&gt;

&lt;p&gt;That's what Memento does. Memory as atomic fragments (1–3 sentences each), reconstructed through association, not retrieved as document dumps.&lt;/p&gt;

&lt;p&gt;How it works:&lt;/p&gt;

&lt;p&gt;Three-layer cascade search. L1 (Redis keyword index, microseconds) → L2 (Postgres metadata, milliseconds) → L3 (pgvector semantic search, deepest). Fast layers answer first; slower layers are consulted only when the faster ones come up short. Redis and OpenAI are both optional. Postgres alone is a fully functional baseline.&lt;/p&gt;
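&lt;p&gt;The cascade logic can be sketched in a few lines (a toy illustration, not Memento's actual code; the layer functions here are stand-ins for the Redis, Postgres, and pgvector lookups):&lt;/p&gt;

```python
# Toy sketch of the L1 -> L2 -> L3 cascade: try each layer in order and stop
# as soon as enough hits are found, so slower layers are often never touched.

def cascade_search(query, layers, min_hits=3):
    hits = []
    for name, search in layers:
        hits.extend(search(query))
        if len(hits) >= min_hits:
            break  # a fast layer answered; skip the deeper ones
    return hits

# Stand-ins for the real Redis keyword / Postgres metadata / pgvector lookups.
l1 = lambda q: ["redis NOAUTH fix"] if "redis" in q else []
l2 = lambda q: ["postgres metadata hit"]
l3 = lambda q: ["semantic match via embeddings"]

layers = [("L1-redis", l1), ("L2-postgres", l2), ("L3-pgvector", l3)]
print(cascade_search("redis auth error", layers, min_hits=2))
# -> ['redis NOAUTH fix', 'postgres metadata hit']
```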

&lt;p&gt;Memories have temperature. Hot → warm → cold → expired. Recall one once, though, and it snaps back to hot. Just like human long-term memory.&lt;/p&gt;
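&lt;p&gt;One way to picture the tier mechanics (illustrative only: the tier names come from the post, but the one-day step per tier is a made-up threshold, not Memento's real value):&lt;/p&gt;

```python
# Illustrative tier mechanics: tiers decay with disuse, and a single recall
# promotes the memory straight back to "hot". The one-day step per tier is a
# made-up threshold, not Memento's real value.
import time

TIERS = ["hot", "warm", "cold", "expired"]

class Fragment:
    def __init__(self, text):
        self.text = text
        self.last_recalled = time.time()

    def tier(self, now, step=86400.0):
        """Every `step` seconds of disuse drops the memory one tier."""
        idle = now - self.last_recalled
        index = min(int(idle // step), len(TIERS) - 1)
        return TIERS[index]

    def recall(self):
        self.last_recalled = time.time()  # snap back to hot
        return self.text

f = Fragment("NOAUTH: the Redis password env var was missing")
print(f.tier(f.last_recalled + 2 * 86400))  # two idle days -> cold
f.recall()
print(f.tier(time.time()))                  # just recalled -> hot
```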

&lt;p&gt;Some things never decay. Preferences (who you are) and error patterns (mistakes that can always recur) are permanent.&lt;/p&gt;

&lt;p&gt;Experience compounds. reflect() at session end distills decisions/errors/procedures into fragments. context() at session start loads them. Over time, the AI genuinely gets better at working with you specifically.&lt;/p&gt;
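&lt;p&gt;The session loop is roughly this shape (a sketch: the function names follow the post, but the in-memory list stands in for the real Postgres-backed store):&lt;/p&gt;

```python
# Sketch of the session loop: reflect() distills a session into short
# fragments, context() loads the relevant ones back at the next session start.
# The in-memory list stands in for the real Postgres-backed store.

store = []

def reflect(session_notes):
    """Distill decisions / errors / procedures into 1-3 sentence fragments."""
    for note in session_notes:
        store.append(note.strip())

def context(query_terms):
    """Load fragments relevant to the new session."""
    matches = []
    for fragment in store:
        if any(term in fragment for term in query_terms):
            matches.append(fragment)
    return matches

reflect(["Decision: keep pgvector instead of adding a separate vector DB.",
         "Error: Redis threw NOAUTH because REDIS_PASSWORD was unset."])
print(context(["Redis"]))
# -> ['Error: Redis threw NOAUTH because REDIS_PASSWORD was unset.']
```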

&lt;p&gt;Appropriate forgetting. Periodic consolidation decays unused memories, merges duplicates, and detects contradictions. The store gets denser, not just bigger.&lt;/p&gt;
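&lt;p&gt;The duplicate-merging step, at its crudest: collapse fragments whose normalized text matches. However Memento actually detects duplicates, exact-match normalization here is only a stand-in for the shape of the step:&lt;/p&gt;

```python
# Crude sketch of duplicate merging: fragments whose normalized text matches
# are collapsed into one, keeping the first occurrence. A stand-in for the
# real detection logic, which this sketch does not attempt to reproduce.

def consolidate(fragments):
    seen = {}
    for frag in fragments:
        key = " ".join(frag.lower().split())  # normalize case and whitespace
        if key not in seen:
            seen[key] = frag  # keep the first occurrence
    return list(seen.values())

frags = ["Use pgvector.", "use  pgvector.", "Redis is optional."]
print(consolidate(frags))  # -> ['Use pgvector.', 'Redis is optional.']
```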

&lt;p&gt;What's new since v1:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cascade search (L1/L2/L3)&lt;/li&gt;
&lt;li&gt;Fragment linking with causal graph exploration&lt;/li&gt;
&lt;li&gt;TTL tier system&lt;/li&gt;
&lt;li&gt;Automatic duplicate merging&lt;/li&gt;
&lt;li&gt;LLM-based contradiction detection&lt;/li&gt;
&lt;li&gt;Streamable HTTP (MCP 2025-11-25)&lt;/li&gt;
&lt;li&gt;Claude Code hook support&lt;/li&gt;
&lt;li&gt;RBAC (read/write/admin)&lt;/li&gt;
&lt;li&gt;Knowledge graph visualization&lt;/li&gt;
&lt;li&gt;Fragment import/export&lt;/li&gt;
&lt;li&gt;Sentiment-aware decay&lt;/li&gt;
&lt;li&gt;Closed learning loop&lt;/li&gt;
&lt;li&gt;Temperature-weighted context&lt;/li&gt;
&lt;li&gt;Admin module split with cookie auth&lt;/li&gt;
&lt;li&gt;DB migration runner&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Stack: Node.js 20+ / PostgreSQL 14+ (pgvector) / Redis 6+ (optional) / OpenAI Embeddings (optional) / Gemini Flash (optional)&lt;/p&gt;

&lt;p&gt;Feedback, issues, and PRs welcome.&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>ai</category>
      <category>llm</category>
      <category>rag</category>
    </item>
    <item>
      <title>This MCP Makes Your AI Smarter: Parism — A Terminal Output Parser for AI Agents</title>
      <dc:creator>jinho von choi</dc:creator>
      <pubDate>Sat, 07 Mar 2026 14:13:33 +0000</pubDate>
      <link>https://dev.to/yurgenschmidt/this-mcp-makes-your-ai-smarter-parism-a-terminal-output-parser-for-ai-agents-5bao</link>
      <guid>https://dev.to/yurgenschmidt/this-mcp-makes-your-ai-smarter-parism-a-terminal-output-parser-for-ai-agents-5bao</guid>
      <description>&lt;p&gt;Have you ever watched your AI agent fumble a simple directory listing — retrying three times for no obvious reason — and wondered what went wrong?&lt;/p&gt;

&lt;p&gt;The answer, more often than not, is &lt;strong&gt;misreading&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: AI Can't Really Read Terminal Output
&lt;/h2&gt;

&lt;p&gt;Terminal output is designed for &lt;em&gt;human&lt;/em&gt; eyes. When you run &lt;code&gt;ls -la&lt;/code&gt;, you instantly understand which column is the filename, which is the size, and which is the timestamp. To an AI, it's just a blob of characters with no clear structure.&lt;/p&gt;

&lt;p&gt;Here's what that means in practice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Plain text misread rate: ~4%&lt;/strong&gt; on average&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;With spaces or special characters in filenames: up to 30%&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Overall task reliability: ~85%&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That 15% failure rate sounds small — until one wrong read cascades into minutes (or hours) of the agent spinning its wheels, misinterpreting data, and making things worse.&lt;/p&gt;

&lt;p&gt;It gets messier when you factor in OS differences. &lt;code&gt;stat&lt;/code&gt; on macOS outputs something completely different from Linux. Windows is a different universe altogether. AI models frequently get confused trying to parse these inconsistencies on the fly.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Idea Behind Parism
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/JinHo-von-Choi/parism" rel="noopener noreferrer"&gt;Parism&lt;/a&gt; is an MCP (Model Context Protocol) server that acts as a translator between your terminal and your AI agent.&lt;/p&gt;

&lt;p&gt;Instead of letting the AI parse raw text output directly:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Without Parism:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AI → Terminal → "figure it out yourself" → ~85% accuracy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;With Parism:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AI → Parism → Terminal → clean JSON → AI → 100% accuracy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The AI no longer needs to &lt;em&gt;guess&lt;/em&gt; where the filename ends and the size begins. It just reads a key-value pair from structured JSON.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"files"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"my file (final).zip"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"size_bytes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2147483648&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"modified"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2025-03-06T22:14:00"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
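&lt;p&gt;To make the translation step concrete, here is a toy parser for one &lt;code&gt;ls -l&lt;/code&gt;-style line (not Parism's actual implementation; it just shows the idea of capturing the filename as a single trailing field so embedded spaces stop being ambiguous):&lt;/p&gt;

```python
# Toy illustration of the translation step (not Parism's actual parser):
# turn one `ls -l`-style line into a structured record, capturing the
# filename as a single trailing field so embedded spaces stop being ambiguous.
import re

LS_LINE = re.compile(
    r"^(\S+)\s+(\d+)\s+(\S+)\s+(\S+)\s+(\d+)\s+(\S+\s+\S+\s+\S+)\s+(.*)$"
)   # perms, links, owner, group, size, date (3 tokens), name

def parse_ls_line(line):
    m = LS_LINE.match(line)
    if m is None:
        return None
    return {
        "name": m.group(7),            # whole remainder: spaces are safe
        "size_bytes": int(m.group(5)),
        "modified": m.group(6),
    }

line = "-rw-r--r--  1 jinho staff 2147483648 Mar  6 22:14 my file (final).zip"
print(parse_ls_line(line))
```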



&lt;h2&gt;
  
  
  What Does This Actually Buy You?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Token savings on repeated data use:&lt;/strong&gt; If you're doing a one-off lookup, Parism actually increases token usage (the JSON overhead). But the moment you reference that data more than once in a long task, it reverses — the AI no longer needs to re-explain the format to itself, and the 67% reduction in "explanation tokens" compounds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Speed:&lt;/strong&gt; When the AI doesn't need to reason about data format, it skips an entire inference step. Tasks complete faster.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reliability:&lt;/strong&gt; The &lt;code&gt;stat&lt;/code&gt; command scenario was telling — without Parism, accuracy on macOS was literally &lt;strong&gt;0%&lt;/strong&gt; because the output format is incompatible with what models trained on Linux examples expect.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Mental Model
&lt;/h2&gt;

&lt;p&gt;Think of it like turning down the music in your car when you're trying to read a street sign. The music and the sign are unrelated — but reducing cognitive load in one area frees up attention for another.&lt;/p&gt;

&lt;p&gt;Or, as Sun Tzu put it: &lt;em&gt;it's more valuable to make the enemy go hungry once than to feed your own troops twenty times.&lt;/em&gt; One mistake undoes twenty successes. Parism is about eliminating that one mistake.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Set It Up
&lt;/h2&gt;

&lt;p&gt;Since it's published on npm, you just add it to your MCP config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"parism"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-y"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"@nerdvana/parism"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After that, the AI will automatically use it when reading terminal output.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/JinHo-von-Choi/parism" rel="noopener noreferrer"&gt;https://github.com/JinHo-von-Choi/parism&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;npm: &lt;a href="https://www.npmjs.com/package/@nerdvana/parism" rel="noopener noreferrer"&gt;https://www.npmjs.com/package/@nerdvana/parism&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  When Is It Worth It?
&lt;/h2&gt;

&lt;p&gt;Use it when you're running &lt;strong&gt;complex, multi-step agentic tasks&lt;/strong&gt; that read filesystem data multiple times. For simple one-shot queries, the JSON overhead may not pay off. But for anything involving loops, retries, or cross-platform compatibility, it's a meaningful quality-of-life upgrade for your AI workflow.&lt;/p&gt;

</description>
      <category>aimcp</category>
      <category>tooling</category>
      <category>ai</category>
    </item>
    <item>
      <title>I built a MyBatis-style SQL mapper for .NET because EF Core was eating all our memory</title>
      <dc:creator>jinho von choi</dc:creator>
      <pubDate>Sat, 07 Mar 2026 03:11:50 +0000</pubDate>
      <link>https://dev.to/yurgenschmidt/i-built-a-mybatis-style-sql-mapper-for-net-because-ef-core-was-eating-all-our-memory-146b</link>
      <guid>https://dev.to/yurgenschmidt/i-built-a-mybatis-style-sql-mapper-for-net-because-ef-core-was-eating-all-our-memory-146b</guid>
      <description>&lt;p&gt;When I inherited a .NET-based stats service, the codebase was EF Core all the way down. Query performance was the first problem — and that I solved, eventually cutting response times by 2x to 3600x depending on the query. But memory was different. EF Core's change tracking, materialization overhead, and object graph behavior imposed a ceiling I couldn't optimize past.&lt;/p&gt;

&lt;p&gt;I'd spent most of my career on Java and Spring. MyBatis was my default tool for anything SQL-heavy. I looked at Dapper — solid library, genuinely good — but I wanted SQL and code to live in separate files, not inline strings. So I built a test mapper, moved the most memory-intensive query onto it, and measured: 2–3x faster execution, 82% reduction in memory consumption.&lt;/p&gt;

&lt;p&gt;That test became NuVatis.&lt;br&gt;
What it is: a SQL mapper for .NET that generates all mapper implementation code at build time via Roslyn Source Generators. The result is zero runtime reflection, full Native AOT compatibility (.NET 8+), and an XML-to-interface mapping model that Java developers will recognize immediately.&lt;/p&gt;
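&lt;p&gt;The core idea, SQL living in its own file and bound to a plain callable, looks like this in Python (illustration only; NuVatis itself generates the C# mapper implementations at build time with Roslyn, and its file layout and API will differ):&lt;/p&gt;

```python
# Illustration of "SQL in its own file, code in another" (Python stand-in;
# NuVatis generates the equivalent mapper code at build time with Roslyn).
import sqlite3
import tempfile
from pathlib import Path

def bind_query(sql_path):
    """Read SQL from its own file and return a callable that executes it."""
    sql = Path(sql_path).read_text()
    def run(conn, params=()):
        return conn.execute(sql, params).fetchall()
    return run

# Stand-in for a mapper .sql file living next to, not inside, the code.
sql_file = Path(tempfile.mkdtemp()) / "top_scores.sql"
sql_file.write_text("SELECT name, score FROM users ORDER BY score DESC")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, score INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("amy", 90), ("bo", 70)])

top_scores = bind_query(sql_file)
print(top_scores(conn))  # [('amy', 90), ('bo', 70)]
```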

&lt;p&gt;What it isn't: a full ORM. There's no change tracking, no migration system, no LINQ query builder. You write SQL. NuVatis maps it.&lt;/p&gt;

&lt;p&gt;Benchmark results: &lt;a href="https://jinho-von-choi.github.io/nuvatis-sample" rel="noopener noreferrer"&gt;https://jinho-von-choi.github.io/nuvatis-sample&lt;/a&gt;&lt;br&gt;
GitHub: &lt;a href="https://github.com/JinHo-von-Choi/nuvatis" rel="noopener noreferrer"&gt;https://github.com/JinHo-von-Choi/nuvatis&lt;/a&gt;&lt;/p&gt;

</description>
      <category>dotnet</category>
      <category>sql</category>
    </item>
  </channel>
</rss>
