DEV Community

woshilaohei

Building a Production-Grade MCP Memory Server: Lessons from MindCore

AI agents forget everything between sessions. Here's how we built an MCP memory server that actually survives production — with circuit breakers, hybrid search, and a novel boundary algorithm.


The Problem: AI Amnesia is Real

Every developer who has used Claude Desktop, Cursor, or any AI agent has experienced the same frustration:

  1. You ask the AI to remember something important
  2. Next session, it has absolutely no idea what you are talking about
  3. You repeat yourself. Again. And again.

This is not a model problem. It is an architecture problem. LLMs have no persistent memory. The Model Context Protocol (MCP) solved the integration problem — how AI connects to external tools — but it did not solve the memory problem. Someone has to build the memory servers.

That "someone" turned out to be us.

Enter MindCore Memory MCP

MindCore Memory is an open-source MCP server that gives AI agents persistent, searchable, production-grade memory. It works with any MCP-compatible client — Claude Desktop, Cursor, Cline, custom agents.


```bash
pip install mindcore-memory
mindcore-memory
```

That's it. Your AI agent now remembers across sessions. But the interesting story is not the API; it is the engineering decisions we made to get it production-ready.

Lesson 1: Hybrid Search is Not Optional

Most memory servers use one retrieval method: either BM25 (keyword) or FAISS (semantic). We learned the hard way that neither is sufficient alone.

BM25 only: Search for "API rate limiting" returns memories about "rate limits" but misses "throttling" and "circuit breaker patterns." Keyword matching alone cannot handle synonyms.

FAISS only: Search for "Q3 revenue" returns whatever is semantically nearest, which might be entirely wrong if the index is sparse. A nearest-neighbor search always returns something, even when nothing relevant has been stored.

Our solution: hybrid search with dynamic fallback. When FAISS is available (sentence-transformers installed), scores combine BM25 + cosine similarity. When FAISS is not available (or when the circuit breaker trips), the system gracefully degrades to BM25-only. No crash. No silent failure.

```python
# How MindCore hybrid search works internally
score = bm25_score * 0.6 + faiss_similarity * 0.4  # configurable weights
```
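
To make the weighting and the fallback concrete, here is a minimal sketch. It assumes both scores are already normalized to [0, 1] and that an unavailable semantic score is passed as None. The names `hybrid_score` and `rank` are illustrative and are not MindCore's actual API.

```python
from typing import Iterable, List, Optional, Tuple


def hybrid_score(
    bm25_score: float,
    faiss_similarity: Optional[float],
    bm25_weight: float = 0.6,
    faiss_weight: float = 0.4,
) -> float:
    """Blend keyword and semantic scores (assumed normalized to [0, 1])."""
    if faiss_similarity is None:
        # Semantic search unavailable: degrade to BM25-only instead of failing.
        return bm25_score
    return bm25_weight * bm25_score + faiss_weight * faiss_similarity


def rank(candidates: Iterable[Tuple[str, float, Optional[float]]]) -> List[str]:
    """Order memory IDs by hybrid score, best first.

    Each candidate is (memory_id, bm25_score, faiss_similarity or None).
    """
    scored = [(hybrid_score(bm25, sim), memory_id) for memory_id, bm25, sim in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [memory_id for _, memory_id in scored]


# Hypothetical candidates for the query "API rate limiting".
candidates = [
    ("rate-limits", 0.9, 0.5),
    ("throttling", 0.1, 0.95),
    ("dark-mode", 0.0, 0.05),
]
ranking = rank(candidates)
```

With these made-up numbers, the "throttling" memory has a BM25 score near zero, yet its semantic similarity still lifts it well above the unrelated entry.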

Lesson 2: Production Means Preparing for Failure

Here is a hard truth: every production dependency will fail at some point. In an MCP memory server, the most fragile dependency is the embedding model.

What happens when sentence-transformers crashes mid-query? In most servers, the entire memory system goes down. The AI agent cannot recall anything. That is unacceptable.

We added a 3-state circuit breaker:

CLOSED (normal) → 5 consecutive failures → OPEN (requests denied, fallback to BM25)
OPEN → 30-second timeout → HALF_OPEN (test probe)
HALF_OPEN → success → CLOSED | failure → OPEN

When the circuit breaker trips, MindCore does not fail. It falls back to BM25-only search and continues serving memory. The user might not even notice.
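
The state machine above can be sketched in a few dozen lines. This is an illustration under stated assumptions, not MindCore's source: the class and function names are hypothetical, the clock is injectable so the timeout can be tested, and every request that arrives in HALF_OPEN is treated as a probe.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self.state = State.CLOSED
        self._failures = 0
        self._opened_at = 0.0

    def allow_request(self) -> bool:
        if self.state is State.OPEN:
            if self._clock() - self._opened_at >= self.reset_timeout:
                self.state = State.HALF_OPEN  # timeout elapsed: allow a probe
                return True
            return False
        return True  # CLOSED or HALF_OPEN

    def record_success(self) -> None:
        self._failures = 0
        self.state = State.CLOSED

    def record_failure(self) -> None:
        self._failures += 1
        # A failed probe reopens immediately; otherwise wait for the threshold.
        if self.state is State.HALF_OPEN or self._failures >= self.failure_threshold:
            self.state = State.OPEN
            self._opened_at = self._clock()


def search(query, breaker, semantic_search, bm25_search):
    """Try semantic search through the breaker; always fall back to BM25."""
    if breaker.allow_request():
        try:
            results = semantic_search(query)
            breaker.record_success()
            return results
        except Exception:
            breaker.record_failure()
    return bm25_search(query)
```

The caller never sees the embedding failure: `search` either returns semantic results or quietly returns the BM25 results.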

We also added:

Exponential backoff retry (3 attempts, jitter, max 5s delay)
SLO tracking (P95/P99 latency for all 9 operations)
Prometheus metrics (/metrics endpoint, zero external dependencies)
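
As an illustration of the retry policy in the list above, here is a minimal sketch using 3 attempts and a 5-second cap. The 0.5-second base delay and the "full jitter" strategy are assumptions, and `retry_with_backoff` is a hypothetical helper, not a MindCore function.

```python
import random
import time


def retry_with_backoff(
    func,
    attempts=3,
    base_delay=0.5,   # assumed starting delay; the post only gives the 5s cap
    max_delay=5.0,
    sleep=time.sleep,
    rng=random.random,
):
    """Call func(), retrying on any exception with capped exponential backoff."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: sleep a random fraction of the computed delay so that
            # many clients retrying at once do not synchronize.
            sleep(delay * rng())
```

The `sleep` and `rng` parameters exist only so the behavior can be checked without real waiting or randomness.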

Lesson 3: Memory Quality is a 4-Dimensional Problem

This is where it gets interesting. We observed that simply storing and retrieving memories is not enough. Memories have quality. A memory stored in panic mode during a crisis should not carry the same weight as a memory formed through careful deliberation.

We formalized this into the 3D Boundary Balance Algorithm (BND), inspired by cognitive science research:

The Forward Formula (Growth Cycle)

Trajectory (TRJ) → Boundary (BND) → Evolution (EVO) → Cognition (COG) → [cycle repeats]

Every action draws a boundary. Every boundary triggers evolution. Evolution produces cognition. New cognition expands boundaries.

The Reverse Formula (Decay Chain)

Chaos → Unknown → Risk → Harm → Death

When we detect 2+ signals from the decay chain in a memory (e.g., "undefined behavior" + "might crash"), the BND score is automatically halved. The system actively protects its memory quality.

The 4-Dimensional Score

BND_Score = 0.28×TRJ + 0.28×EVO + 0.28×COG + 0.16×BALANCE

Memories scoring below 0.40 are rejected from the boundary version chain. They might still exist as raw experience (EXP), but they do not influence future deductions. The system self-curates.
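
The scoring rule and the decay penalty can be sketched as follows. The weights, the halving rule, and the 0.40 threshold come from the text above. Everything else is an assumption made for illustration: the component values are taken as given in [0, 1], the regex list is invented, and the penalty is applied before the threshold check.

```python
import re

WEIGHTS = {"trj": 0.28, "evo": 0.28, "cog": 0.28, "balance": 0.16}
ACCEPT_THRESHOLD = 0.40

# Hypothetical decay-chain patterns; the real signal list is not published here.
DECAY_PATTERNS = [r"undefined behavior", r"might crash", r"data loss"]


def count_decay_signals(text: str) -> int:
    """Count how many distinct decay patterns appear in the memory text."""
    return sum(1 for pattern in DECAY_PATTERNS if re.search(pattern, text, re.IGNORECASE))


def bnd_score(trj: float, evo: float, cog: float, balance: float, text: str = "") -> float:
    """Weighted 4-component score, halved when 2+ decay signals are present."""
    score = (
        WEIGHTS["trj"] * trj
        + WEIGHTS["evo"] * evo
        + WEIGHTS["cog"] * cog
        + WEIGHTS["balance"] * balance
    )
    if count_decay_signals(text) >= 2:
        score *= 0.5
    return score


def is_accepted(score: float) -> bool:
    """Memories below the threshold stay out of the boundary version chain."""
    return score >= ACCEPT_THRESHOLD
```

A memory with all four components at 0.5 scores 0.5 and is accepted. The same memory containing both "undefined behavior" and "might crash" drops to 0.25 and is rejected.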

All of this runs in <1ms with zero LLM calls. Pure keyword pattern matching + regex + statistical variance.

Lesson 4: Users Will Store the Same Thing Twice. Handle It.

In early testing, we found that users (and AI agents) frequently store the same content multiple times:

"Q3 financial report summary" stored 4 times in one session
"User prefers dark mode" stored after every UI toggle

Our solution: content deduplication with attribute merging.

When identical content is stored, MindCore does not create a duplicate. Instead, it:

Merges tags (union)
Takes the max importance and confidence
Accumulates access_count
Returns the original memory ID
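
A minimal sketch of those four merge rules is shown below. It assumes that "identical" means byte-for-byte equal content, that a SHA-256 prefix serves as the memory ID, and that each duplicate store adds 1 to access_count. The `Memory` and `MemoryStore` names are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class Memory:
    id: str
    content: str
    tags: Set[str] = field(default_factory=set)
    importance: float = 0.5
    confidence: float = 0.5
    access_count: int = 0


class MemoryStore:
    def __init__(self) -> None:
        self.memories: Dict[str, Memory] = {}

    def store(
        self,
        content: str,
        tags: Iterable[str] = (),
        importance: float = 0.5,
        confidence: float = 0.5,
    ) -> str:
        # The ID is derived from the content, so identical content maps to one entry.
        memory_id = hashlib.sha256(content.encode("utf-8")).hexdigest()[:16]
        existing = self.memories.get(memory_id)
        if existing is None:
            self.memories[memory_id] = Memory(
                memory_id, content, set(tags), importance, confidence
            )
            return memory_id
        existing.tags |= set(tags)                                   # union of tags
        existing.importance = max(existing.importance, importance)   # keep the max
        existing.confidence = max(existing.confidence, confidence)
        existing.access_count += 1                                   # accumulate
        return existing.id                                           # original ID
```

Storing "User prefers dark mode" twice with different tags leaves a single entry whose tag set contains both.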

Behind the scenes, we also use JSONL storage with atomic rewrites: fsync() after every write, no database dependency, no corruption on crash.
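
One common way to get an atomic rewrite with only the standard library is to write a temporary file in the same directory, fsync it, and rename it over the target. The sketch below shows that pattern. It is an assumption about the general technique, not a copy of MindCore's storage code, and `atomic_write_jsonl` is a hypothetical name.

```python
import json
import os
import tempfile


def atomic_write_jsonl(path, records):
    """Rewrite a JSONL file so readers see either the old or the new version."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            for record in records:
                f.write(json.dumps(record, ensure_ascii=False) + "\n")
            f.flush()
            os.fsync(f.fileno())  # force the data to disk before the rename
        os.replace(tmp_path, path)  # atomic swap on POSIX and Windows
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the process dies before `os.replace`, the original file is untouched and only a stray temporary file is left behind.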

Lesson 5: Security is Not a Feature — It is a Baseline
The MCP ecosystem has a security problem: 492 MCP servers were found publicly exposed with no authentication (Trend Micro, 2025). Tool poisoning attacks have been demonstrated in production environments.

MindCore ships with:

Input sanitization on every tool call (type checking, length limits, pattern validation)
Path traversal protection (storage dir cannot be /etc or C:\Windows)
Rate limiting (token bucket, 100 req/60s)
Optional Fernet encryption at rest (MINDCORE_ENCRYPT_KEY env var)
Session isolation (memories scoped to validated session IDs)
Error sanitization (production errors stripped of internal details)
These are not "premium" features. They are table stakes for anything handling data that persists between sessions.
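
To make one item from the list concrete, here is a minimal token bucket sized for 100 requests per 60 seconds. It is a sketch of the general algorithm with an injectable clock. The class name and the choice to start with a full bucket are assumptions, not details taken from MindCore.

```python
import time


class TokenBucket:
    def __init__(self, capacity=100, period=60.0, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = capacity / period  # tokens added per second
        self._clock = clock
        self._tokens = float(capacity)        # assumed: bucket starts full
        self._last = clock()

    def allow(self) -> bool:
        """Consume one token if available; return False when rate limited."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill in proportion to elapsed time, never beyond capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_rate)
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

A burst of 100 calls passes, the 101st is rejected, and capacity returns gradually at roughly 1.67 tokens per second.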

The Numbers
118/118 tests passing (1 skipped: requires sentence-transformers for semantic search)
9 MCP tools: store, recall, context, update_confidence, delete, stats, bnd_check, bnd_stats, deduce
<1ms BND evaluation time
0 external API dependencies — runs entirely locally
MIT licensed, Python 3.10+

What is Next

Deduction Engine v2: Automatic insight generation from memory patterns
Docker image: One-command deployment for self-hosted environments
Memory graph visualization: See how your AI's memory evolves over time

Try It

```bash
pip install mindcore-memory
```

GitHub: woshilaohei/mindcore-memory-mcp

If this resonated with you, consider giving the repo a ⭐ on GitHub. It helps more developers discover production-grade memory for their AI agents.