When building RAG (Retrieval-Augmented Generation) for AI agents, most developers stop at "Cosine Similarity". They verify that Vector A is close to Vector B, and call it a day.
But human memory doesn't work like that. If I ask you "What did I eat?", the answer from 5 minutes ago is infinitely more relevant than the answer from 5 years ago, even if the semantic context is identical.
I recently built MemVault, an open-source memory server, to solve this.
Here is a technical deep dive into the architecture and the Hybrid Scoring Algorithm that powers it.
1. The Core Philosophy: Pragmatism
The architecture was designed with one goal: Reduce Infrastructure Cognitive Load.
Running a dedicated vector database (Pinecone/Milvus) alongside a primary database creates sync issues and doubles the maintenance burden.
The Solution:
- Runtime: Node.js (Event-driven I/O is perfect for orchestrating DB/LLM calls).
- Language: TypeScript (Strict typing is essential when handling 1536-dimensional float arrays).
- Storage: PostgreSQL + pgvector.

By keeping vectors and metadata (`session_id`, `user_id`) in the same engine, we maintain ACID compliance and simplify the stack.
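To make the single-engine idea concrete, here is a minimal sketch of what such a table and search query could look like. The table name, columns, and dimension are illustrative assumptions, not MemVault's actual schema; `<=>` is pgvector's cosine-distance operator.

```typescript
// Illustrative schema for a combined vector + metadata store.
// Names and sizes are hypothetical; MemVault's real schema may differ.
const createTableSql = `
  CREATE TABLE IF NOT EXISTS memories (
    id         BIGSERIAL PRIMARY KEY,
    user_id    TEXT NOT NULL,
    session_id TEXT NOT NULL,
    content    TEXT NOT NULL,
    embedding  vector(1536),          -- pgvector column, OpenAI-sized
    importance REAL DEFAULT 0,
    created_at TIMESTAMPTZ DEFAULT now()
  );
`;

// One round trip returns similarity AND metadata -- no second store to sync.
// `<=>` is pgvector's cosine-distance operator, so 1 - distance = similarity.
function buildSearchSql(limit: number): string {
  return `
    SELECT id, content, created_at, importance,
           1 - (embedding <=> $1) AS similarity
    FROM memories
    WHERE user_id = $2
    ORDER BY embedding <=> $1
    LIMIT ${limit}
  `;
}
```

Because the metadata filter (`user_id`) and the vector ordering live in one statement, there is no cross-store consistency problem to manage.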
2. The Hybrid Search Algorithm
This is where the magic happens. A naive RAG implementation suffers from "contextual drift": it retrieves memories that match the topic but ignores when they happened. To fix this, MemVault calculates a weighted score in real time.
The formula looks roughly like this:
Score = (SemanticSimilarity * α) + (RecencyScore * β) + (Importance * γ)
The Components:
- Semantic Similarity (α): We use pgvector to calculate the cosine distance. This tells us how closely the topics match.
- Recency Decay (β): We apply a decay function to the timestamp. Memories "fade" over time unless they are reinforced, so the agent prioritizes the current conversation context.
- Importance (γ): An explicit weight. Some facts (e.g., "User is allergic to nuts") should never decay.
By tuning these weights (default is 80% Semantic, 20% Recency), the agent behaves much more naturally.
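As a sketch of how the weighted score could be computed. The exponential half-life decay and the γ value are my assumptions here; MemVault's exact decay curve and defaults may differ.

```typescript
// Hybrid score = α·similarity + β·recency + γ·importance (as described above).
// The half-life decay curve and GAMMA value are illustrative assumptions.
interface ScoredMemory {
  similarity: number;  // cosine similarity in [0, 1]
  ageHours: number;    // hours since the memory was written
  importance: number;  // explicit weight in [0, 1]
}

const ALPHA = 0.8;           // semantic weight (the 80% default)
const BETA = 0.2;            // recency weight (the 20% default)
const GAMMA = 0.1;           // importance boost (illustrative value)
const HALF_LIFE_HOURS = 24;  // recency halves every day (assumption)

function recencyScore(ageHours: number): number {
  return Math.pow(0.5, ageHours / HALF_LIFE_HOURS);
}

function hybridScore(m: ScoredMemory): number {
  return ALPHA * m.similarity + BETA * recencyScore(m.ageHours) + GAMMA * m.importance;
}

// A fresh, slightly less similar memory can outrank a stale, closer one:
const fresh = hybridScore({ similarity: 0.80, ageHours: 0.1, importance: 0 });
const stale = hybridScore({ similarity: 0.85, ageHours: 720, importance: 0 });
```

With these weights, the month-old memory loses despite its higher raw similarity, which is exactly the "what did I eat?" behavior from the introduction.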
3. Observability: The "Black Box" Problem
The biggest issue with Vector Search is that it is opaque. You cannot "read" a vector.
If an agent hallucinates, how do you debug it?
- Was the embedding bad?
- Was the threshold too low?
To solve this, I built a Real-time Visualizer.
It projects the high-dimensional vector space into a 2D graph, allowing you to visually inspect clusters. If "Cat" and "Car" nodes are overlapping, you know your embedding model is broken.
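Alongside the visualizer, the same sanity check can be run numerically. A minimal cosine-similarity helper (plain TypeScript, not MemVault's code) lets you spot-check whether two embeddings that should be distant are suspiciously close:

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy 3-d vectors standing in for real 1536-d embeddings:
const cat = [0.9, 0.1, 0.0];
const car = [0.1, 0.9, 0.0];
// If cosineSimilarity(cat, car) came back near 1.0 for real embeddings,
// the embedding model (not the retrieval code) is the suspect.
```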
4. Open Source & Roadmap
The project is fully open source. The next step on the roadmap (as highlighted by a recent architectural audit) is to implement BM25 (Keyword Search) to better handle unique identifiers like Product IDs, where semantic search often fails.
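To illustrate why BM25 helps here, a compact sketch (standard k1/b defaults; not the planned MemVault implementation) shows an exact product ID winning on keyword overlap, where embeddings of near-identical SKUs would blur together:

```typescript
// Minimal BM25 scorer with the common k1/b defaults. Illustrative only.
const K1 = 1.2;
const B = 0.75;

function tokenize(text: string): string[] {
  return text.toLowerCase().split(/\W+/).filter(Boolean);
}

function bm25Scores(query: string, docs: string[]): number[] {
  const tokenized = docs.map(tokenize);
  const avgLen = tokenized.reduce((s, d) => s + d.length, 0) / docs.length;
  return tokenized.map(doc => {
    let score = 0;
    for (const term of tokenize(query)) {
      const tf = doc.filter(t => t === term).length;      // term frequency
      const df = tokenized.filter(d => d.includes(term)).length;  // doc frequency
      if (df === 0) continue;
      const idf = Math.log(1 + (docs.length - df + 0.5) / (df + 0.5));
      score += idf * (tf * (K1 + 1)) / (tf + K1 * (1 - B + B * doc.length / avgLen));
    }
    return score;
  });
}

// The rare token "8841" carries high IDF, so the exact-ID document wins:
const docs = ["Order for SKU-8841-X shipped", "Order for SKU-8842-X pending"];
const scores = bm25Scores("SKU-8841-X", docs);
```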
If you are interested in the code or want to try the algorithm, check out the repo: jakops88-hub/Long-Term-Memory-API, a production-grade API to give your AI agents long-term memory without the boilerplate.
MemVault
A Memory Server for AI Agents. Runs on Postgres + pgvector.
I got tired of setting up Pinecone or Weaviate and writing the same embedding boilerplate for every small AI agent I built.
I wanted something that:
- Just runs on PostgreSQL (which I already use).
- Handles the chunking & embedding automatically.
- Lets me visualize the retrieval process (because debugging vector similarity in JSON logs is difficult).
So I built MemVault. It is a Node.js wrapper around pgvector with a hybrid search algorithm (Vector Similarity + Recency Decay).
Quick Start: Choose your setup
You can run this entirely on your own machine (Docker), or use the managed API to skip the server maintenance.
| Feature | Self-Hosted (Docker) | Managed API (RapidAPI) |
|---|---|---|
| Price | Free (Open Source) | Free Tier available |
| Setup Time | ~15 mins | 30 seconds |
| Data Privacy | 100% on your server | Hosted by us |
| Maintenance | You manage updates/uptime | We handle everything |
Let me know if you have questions about the pgvector implementation!
