DEV Community

Leandro Pérez González

I built a persistent memory MCP with Hebbian learning and GraphRAG

The Problem

AI coding assistants forget everything between sessions. Every conversation starts from zero. You explain your architecture, your patterns, your preferences — and next time, it's gone.
I built Cuba-Memorys to fix this.

What is it?

An MCP server that gives AI agents persistent long-term memory using a knowledge graph backed by PostgreSQL. 12 tools with Cuban-themed names.

Key Features

🧠 Knowledge Graph

Entities, observations, and typed relations that persist across sessions. Your AI remembers projects, decisions, patterns, and connections between them.

⚡ Hebbian Learning

"Neurons that fire together wire together" — memories strengthen with use (Oja's rule) and fade adaptively using FSRS spaced repetition (the algorithm behind Anki).

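As a rough illustration of the "strengthen with use" idea, here is a minimal sketch of Oja's rule applied to a single scalar memory weight. This is not taken from the Cuba-Memorys source: the function name `oja_update`, the learning rate, and the choice to model each access as an input of 1.0 are all assumptions made for the example.

```python
def oja_update(weight: float, activation: float = 1.0, rate: float = 0.1) -> float:
    """Apply one Oja-style update to a scalar weight (hypothetical helper).

    Oja's rule is delta_w = rate * y * (x - y * w), with output y = w * x.
    The subtracted y * w term keeps the weight from growing without bound.
    """
    response = weight * activation  # y = w * x
    return weight + rate * response * (activation - response * weight)


# Simulate a memory that is accessed repeatedly (assumed input x = 1.0 per access).
weight = 0.5
history = [weight]
for _ in range(50):
    weight = oja_update(weight)
    history.append(weight)
```

With these assumed parameters the weight rises on every access and levels off just below 1.0, which is the self-normalizing behavior that distinguishes Oja's rule from plain Hebbian growth.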
🔍 4-Signal RRF Fusion Search

Not just keyword matching. Cuba-Memorys fuses 4 search signals:

  • TF-IDF similarity
  • PostgreSQL full-text search
  • Trigram matching
  • Optional pgvector HNSW embeddings

Results are ranked using Reciprocal Rank Fusion for maximum recall.

🕸️ GraphRAG Enrichment

Top search results are automatically enriched with degree-1 graph neighbors, giving your AI topological context, not just text matches.

🛡️ Anti-Hallucination Grounding

Verify claims against stored knowledge with graduated confidence scoring: verified, partial, weak, or unknown. Your AI can fact-check itself before answering.

😴 REM Sleep Consolidation

After 15 minutes of inactivity, the system automatically runs maintenance: adaptive decay, deduplication, and memory consolidation. Like biological sleep for your AI's memory.

📊 Graph Analytics

  • PageRank: find the most important entities
  • Louvain communities: discover knowledge clusters
  • Betweenness centrality: identify bridge concepts

The 12 Tools

| Tool | Purpose |
|------|---------|
| cuba_alma | CRUD knowledge graph entities |
| cuba_cronica | Attach observations to entities |
| cuba_puente | Create/traverse relations |
| cuba_faro | Search with anti-hallucination |
| cuba_alarma | Report and track errors |
| cuba_expediente | Search past errors (anti-repetition) |
| cuba_remedio | Mark errors as resolved |
| cuba_eco | RLHF feedback loop |
| cuba_decreto | Record architecture decisions |
| cuba_jornada | Session management |
| cuba_vigia | Graph analytics |
| cuba_zafra | Memory maintenance |

Quick Start

```json
{
  "mcpServers": {
    "cuba-memorys": {
      "command": "python",
      "args": ["-m", "cuba_memorys"],
      "env": {
        "DATABASE_URL": "postgresql://user:pass@localhost:5432/brain"
      }
    }
  }
}
```


The database auto-provisions via Docker if needed. Zero manual setup.

Why "Cuba"?
The tools are named after Cuban Spanish words — alma (soul), crónica (chronicle), faro (lighthouse), remedio (remedy). It started as a personal project and the naming stuck.

Links
GitHub: github.com/lENADRO1910/cuba-memorys
PyPI: pip install cuba-memorys
License: CC BY-NC 4.0
If you find it useful, a ⭐ on GitHub would mean a lot. Open to feedback and contributions!



Top comments (2)

klement Gunndu

The 4-signal RRF fusion search is interesting — how does it perform when the knowledge graph gets past ~10K entities? Curious if the trigram matching or pgvector path dominates at that scale.

Leandro Pérez González

Great question — the scaling behavior depends on which signals are active.

At the PostgreSQL level, the 4-signal fusion uses two index-backed paths per query:

GIN tsvector: ts_rank_cd with plainto_tsquery('simple', ...) for full-text search. GIN indexes scale sub-linearly: PostgreSQL's GIN uses compressed posting lists, so even at 100K+ observations the index scan stays in the low milliseconds. This signal carries 35% weight.

GIN pg_trgm: similarity(name, $1) > 0.3 as the filter predicate. Trigram GIN indexes are O(n) on the trigram set size, not the row count. At ~10K entities with typical name lengths (5-50 chars), the trigram signature is ~15 trigrams per entry, so the index stays compact. This signal carries 30% weight.

Importance + freshness: pure scalar columns with no index scan needed (computed inline). These contribute 25% and 10% respectively.
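To make the trigram signal concrete, here is a small Python approximation of what pg_trgm's similarity() measures: shared trigrams divided by total distinct trigrams, with each word lowercased and padded by two leading spaces and one trailing space. It is only a sketch for intuition. The helper names are made up for this example, and the ASCII-only word splitting is a simplifying assumption; PostgreSQL does the real work inside the extension.

```python
import re


def trigrams(text: str) -> set[str]:
    """Return the set of pg_trgm-style trigrams for a string (approximation)."""
    grams: set[str] = set()
    # Assumption: words are runs of ASCII letters/digits; pg_trgm is locale-aware.
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "  # two spaces before, one after, as pg_trgm pads words
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def trigram_similarity(a: str, b: str) -> float:
    """Shared trigrams divided by distinct trigrams across both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# A candidate passes the filter from the post when its score exceeds 0.3.
score = trigram_similarity("word", "two words")
passes_filter = score > 0.3
```

For "word" against "two words" this gives 4 shared trigrams out of 11 distinct ones, so the pair clears the 0.3 threshold.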

The pgvector path (embedding <=> $2::vector) is only activated in verify mode via a separate query (SEARCH_VECTOR_SQL). Since <=> is pgvector's cosine-distance operator and there is currently no HNSW/IVFFlat index, this runs as an exact sequential scan, which would be the bottleneck at >50K embeddings. For the standard hybrid search mode, pgvector is not in the hot path at all.
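For intuition on why an unindexed vector query becomes the bottleneck, here is a pure-Python sketch of an exact nearest-neighbor scan using cosine distance (the metric behind pgvector's <=> operator). Every stored embedding is compared against the query, so the cost grows linearly with the number of rows. The function names and the in-memory dict of embeddings are illustrative assumptions, not the project's code.

```python
import math


def cosine_distance(u: list[float], v: list[float]) -> float:
    """1 - cosine similarity; assumes both vectors are non-zero and equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def exact_nearest(query: list[float], rows: dict[str, list[float]], limit: int = 5):
    """Brute-force scan: score every row, then keep the closest `limit` rows."""
    scored = [(cosine_distance(query, emb), row_id) for row_id, emb in rows.items()]
    return sorted(scored)[:limit]


# Toy data standing in for stored observation embeddings (hypothetical values).
rows = {"x": [1.0, 0.0], "y": [0.0, 1.0], "z": [1.0, 1.0]}
nearest_ids = [row_id for _, row_id in exact_nearest([1.0, 0.0], rows)]
```

An approximate index such as HNSW avoids visiting every row, which is why it matters once the embedding count grows.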

So to directly answer your question: at ~10K entities, trigram dominates latency (~2-5ms for the GIN scan), since tsvector is faster but trigram has to compute trigram-overlap similarity for each candidate row. The RRF fusion itself (Cormack 2009, k=60) is pure Python dict merging, O(n) where n = result count (capped at LIMIT, typically 10-50), so negligible.
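The dict-merging step can be sketched in a few lines. This is a generic, unweighted Reciprocal Rank Fusion with k=60, where each ranked list contributes 1 / (k + rank) to a document's score. The function name `rrf_fuse`, the tie-breaking by id, and the sample ids are assumptions for illustration; the per-signal weights mentioned above are left out of this sketch.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60, limit: int = 10) -> list[tuple[str, float]]:
    """Fuse several ranked id lists with Reciprocal Rank Fusion (unweighted)."""
    scores: dict[str, float] = defaultdict(float)
    for ranked_ids in rankings:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)  # earlier ranks contribute more
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:limit]


# Three hypothetical signals, each returning its own ordering of entity ids.
fused = rrf_fuse([
    ["a", "b", "c"],  # e.g. full-text ranking
    ["b", "c", "a"],  # e.g. trigram ranking
    ["b", "a"],       # e.g. a third signal with fewer hits
])
fused_ids = [doc_id for doc_id, _ in fused]
```

Because the work is one dict update per (list, result) pair plus a sort over at most LIMIT-sized lists, the fusion cost stays small next to the database queries.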

At >50K scale with embeddings enabled, I'd add an HNSW index (CREATE INDEX ON brain_observations USING hnsw (embedding vector_cosine_ops)) to keep the vector path under 10ms. That's on the roadmap but hasn't been necessary yet — the current deployment handles ~2K entities with sub-20ms total query time.

Appreciate the interest — scaling the vector path is definitely the next optimization target. 🤘