When building RAG (Retrieval-Augmented Generation) for AI agents, most developers stop at "Cosine Similarity". They verify that Vector A is close to Vector B, and call it a day.
But human memory doesn't work like that. If I ask you "What did I eat?", the answer from 5 minutes ago is infinitely more relevant than the answer from 5 years ago, even if the semantic context is identical.
I recently built MemVault, an open-source memory server, to solve this.
Here is a technical deep dive into the architecture and the Hybrid Scoring Algorithm that powers it.
1. The Core Philosophy: Pragmatism
The architecture was designed with one goal: Reduce Infrastructure Cognitive Load.
Running a dedicated vector database (Pinecone/Milvus) alongside a primary database creates sync issues and doubles the maintenance burden.
The Solution:
- Runtime: Node.js (Event-driven I/O is perfect for orchestrating DB/LLM calls).
- Language: TypeScript (Strict typing is essential when handling 1536-dimensional float arrays).
- Storage: PostgreSQL + pgvector.
By keeping vectors and metadata (session_id, user_id) in the same engine, we maintain ACID compliance and simplify the stack.
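To make the single-engine idea concrete, here is a minimal sketch of what such a schema and query could look like. The table and column names are my own illustration, not MemVault's actual schema, and it assumes the pgvector extension is installed:

```typescript
// Hypothetical schema: embeddings and metadata live in one Postgres table.
// Requires `CREATE EXTENSION vector;` (pgvector) on the database first.
const createTable = `
  CREATE TABLE IF NOT EXISTS memories (
    id         BIGSERIAL PRIMARY KEY,
    user_id    TEXT NOT NULL,
    session_id TEXT NOT NULL,
    content    TEXT NOT NULL,
    embedding  vector(1536) NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
  );
`;

// One round trip: metadata filter plus cosine-distance ordering.
// `<=>` is pgvector's cosine distance operator, so 1 - distance = similarity.
const searchQuery = `
  SELECT content, 1 - (embedding <=> $1) AS similarity
  FROM memories
  WHERE user_id = $2
  ORDER BY embedding <=> $1
  LIMIT 10;
`;
```

Because the metadata filter and the distance ordering run in a single statement, there is no second datastore to keep in sync.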
2. The Hybrid Search Algorithm
This is where the magic happens. A naive RAG implementation suffers from "Contextual Drift". To fix this, MemVault calculates a weighted score in real-time.
The formula looks roughly like this:
Score = (SemanticSimilarity * α) + (RecencyScore * β) + (Importance * γ)
The Components:
- Semantic Similarity (α): We use pgvector to calculate the Cosine Distance. This tells us how much the topics match.
- Recency Decay (β): We apply a decay function to the timestamp. Memories "fade" over time unless they are reinforced. This ensures the agent prioritizes the current conversation context.
- Importance (γ): An explicit weight. Some facts (e.g., "User is allergic to nuts") should never decay.
By tuning these weights (default is 80% Semantic, 20% Recency), the agent behaves much more naturally.
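The formula above can be sketched in TypeScript. The exponential decay shape and the 24-hour half-life are my assumptions for illustration; MemVault may normalize or combine the terms differently:

```typescript
interface Memory {
  embedding: number[];
  createdAt: number;   // Unix timestamp in ms
  importance: number;  // explicit weight, 0..1
}

// Cosine similarity between a query embedding and a stored memory.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exponential decay: a memory loses half its recency score every halfLifeMs.
function recencyScore(createdAt: number, now: number, halfLifeMs: number): number {
  const age = Math.max(0, now - createdAt);
  return Math.pow(0.5, age / halfLifeMs);
}

// Score = (SemanticSimilarity * α) + (RecencyScore * β) + (Importance * γ)
function hybridScore(
  query: number[],
  memory: Memory,
  now: number,
  weights = { alpha: 0.8, beta: 0.2, gamma: 0.0 }, // 80% semantic, 20% recency
  halfLifeMs = 24 * 60 * 60 * 1000,                // assumed 24h half-life
): number {
  return (
    cosineSimilarity(query, memory.embedding) * weights.alpha +
    recencyScore(memory.createdAt, now, halfLifeMs) * weights.beta +
    memory.importance * weights.gamma
  );
}
```

With these defaults, two memories with identical embeddings are separated purely by age: the one written just now outranks the one written a day ago.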
3. Observability: The "Black Box" Problem
The biggest issue with Vector Search is that it is opaque. You cannot "read" a vector.
If an agent hallucinates, how do you debug it?
- Was the embedding bad?
- Was the threshold too low?
To solve this, I built a Real-time Visualizer.
It projects the high-dimensional vector space into a 2D graph, allowing you to visually inspect clusters. If "Cat" and "Car" nodes are overlapping, you know your embedding model is broken.
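As an illustration of the projection idea only: a real visualizer would use PCA, t-SNE, or UMAP, but even a deterministic random projection maps N-dimensional embeddings to 2D points you can plot. Nothing below is MemVault's actual code:

```typescript
// Toy stand-in for dimensionality reduction: project each high-dimensional
// vector onto 2 fixed pseudo-random axes. Identical embeddings land on
// identical 2D points, so overlapping clusters remain visible.
function randomProjection(vectors: number[][], dim = 2, seed = 42): number[][] {
  // Park-Miller style generator so the projection is reproducible.
  let s = seed;
  const rand = () => {
    s = (s * 16807) % 2147483647;
    return s / 2147483647 - 0.5;
  };
  const d = vectors[0].length;
  const matrix = Array.from({ length: dim }, () =>
    Array.from({ length: d }, rand),
  );
  return vectors.map(v =>
    matrix.map(row => row.reduce((acc, w, i) => acc + w * v[i], 0)),
  );
}
```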
4. Open Source & Roadmap
The project is fully open source. The next step on the roadmap (as highlighted by a recent architectural audit) is to implement BM25 (Keyword Search) to better handle unique identifiers like Product IDs, where semantic search often fails.
If you are interested in the code or want to try the algorithm:
jakops88-hub/Long-Term-Memory-API: Production-grade API to give your AI agents long-term memory without the boilerplate.
MemVault: The Intelligent Memory Layer for AI Agents
Give your LLMs long-term memory, semantic understanding, and evolving context—with one line of code.
MemVault is a production-grade GraphRAG (Graph Retrieval-Augmented Generation) platform. Unlike simple vector databases that only find "similar words", MemVault builds a dynamic knowledge graph of entities and relationships, allowing your AI to understand context, not just keywords.
Why MemVault?
Building persistent memory is hard. Managing vector databases, embedding pipelines, graph databases, and context windows is even harder. MemVault solves this with a managed API that acts as the hippocampus for your AI agents.
The "Sleep Cycle" Engine (Unique Feature)
Just like the biological brain, MemVault consolidates information asynchronously.
- Ingest Now, Process Later: We accept data instantly, but deep processing happens in the background.
- Auto-Consolidation: Every 6 hours, our Sleep Cycle Engine wakes up to merge duplicate entities…
Let me know if you have questions about the pgvector implementation!
