Last week, I launched MemVault, an open-source memory layer for AI agents built on PostgreSQL and pgvector.
The response was amazing (thanks for the stars and forks!), but as more developers started testing it, a critical architectural flaw became apparent.
The Problem: Vectors are "Fuzzy"
In my initial design, I relied 100% on Cosine Similarity.
- Conceptually: It works great. "Apple" matches "Fruit".
- Practically: It fails on specifics. If an agent needs to recall "Error Code 503", vector search might retrieve "Error Code 404" instead, because the two strings are nearly identical semantically (both HTTP error codes), even though I needed the exact match.
I realized that for a production-grade memory system, fuzzy matching isn't enough. You need exact precision for IDs, names, and technical terms.
The Solution: Hybrid Search 2.0
Based on this feedback (and a deep architectural review), I spent the weekend refactoring the retrieval engine.
I moved from a pure vector approach to a Hybrid Search model that runs entirely inside PostgreSQL. It now calculates a weighted score based on three factors:
- Semantic (Vector): Uses `pgvector` to understand the meaning.
- Exact Match (Keyword): Uses PostgreSQL's native `tsvector` full-text search (BM25-style ranking) to find exact keywords.
- Recency (Time): Uses a decay function to prioritize fresh memories.
The Scoring Formula
The SQL query now computes a composite score on the fly:
FinalScore = (VectorScore * 0.5) + (KeywordScore * 0.3) + (RecencyScore * 0.2)
This ensures that if you search for a specific User ID, the Keyword Score spikes and overrides any "fuzzy" semantic matches.
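To make that concrete, here is a minimal sketch of what such a query can look like from Node.js. The table and column names (`memories`, `embedding`, `content_tsv`, `created_at`) and the exponential recency decay are assumptions for illustration, not MemVault's actual schema:

```typescript
import { Pool } from 'pg';

const pool = new Pool(); // connection settings come from the standard PG* env vars

// Sketch only: schema and decay constant are assumptions, not MemVault's internals.
export async function hybridSearch(queryText: string, queryEmbedding: number[], limit = 10) {
  const sql = `
    SELECT *,
           vector_score * 0.5 + keyword_score * 0.3 + recency_score * 0.2 AS final_score
    FROM (
      SELECT id,
             content,
             -- Semantic: pgvector's <=> is cosine distance, so 1 - distance ~ similarity
             1 - (embedding <=> $1::vector) AS vector_score,
             -- Keyword: ts_rank is unbounded, so squash it into [0, 1)
             ts_rank(content_tsv, plainto_tsquery('english', $2))
               / (1 + ts_rank(content_tsv, plainto_tsquery('english', $2))) AS keyword_score,
             -- Recency: exponential decay with a 7-day time constant (illustrative choice)
             exp(-extract(epoch FROM (now() - created_at)) / (7 * 86400.0)) AS recency_score
      FROM memories
    ) scored
    ORDER BY final_score DESC
    LIMIT $3;
  `;
  const { rows } = await pool.query(sql, [JSON.stringify(queryEmbedding), queryText, limit]);
  return rows;
}
```

The point is that everything stays inside one Postgres round trip: a query containing a literal token like a user ID pushes the keyword term up without any second retrieval system.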
Going 100% Offline (Ollama Support)
The other major request was removing the dependency on OpenAI.
I refactored the backend to use a Provider Pattern for embeddings. By changing a single environment variable, you can now swap out OpenAI for a local Ollama instance running nomic-embed-text.
```typescript
// src/services/embeddings/index.ts
export function getEmbeddingProvider() {
  switch (process.env.EMBEDDING_PROVIDER) {
    case 'openai': return new OpenAIEmbeddingProvider();
    case 'ollama': return new OllamaProvider(); // 100% Local & Free
    default: throw new Error(`Unknown EMBEDDING_PROVIDER: ${process.env.EMBEDDING_PROVIDER}`);
  }
}
```
This means you can now deploy the entire stack—Database, API, and Inference—on your own hardware, completely air-gapped.
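For reference, here is a rough sketch of what the Ollama side of that provider pattern can look like. The `EmbeddingProvider` interface, class internals, and env var names are assumptions for illustration; the HTTP call targets Ollama's standard `/api/embeddings` endpoint:

```typescript
// Illustrative sketch: interface and internals are assumptions, not the actual repo code.
interface EmbeddingProvider {
  embed(text: string): Promise<number[]>;
}

export class OllamaProvider implements EmbeddingProvider {
  constructor(
    private baseUrl = process.env.OLLAMA_URL ?? 'http://localhost:11434',
    private model = process.env.OLLAMA_MODEL ?? 'nomic-embed-text',
  ) {}

  async embed(text: string): Promise<number[]> {
    // POST /api/embeddings { model, prompt } -> { embedding: number[] }
    const res = await fetch(`${this.baseUrl}/api/embeddings`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ model: this.model, prompt: text }),
    });
    if (!res.ok) throw new Error(`Ollama embedding request failed: ${res.status}`);
    const { embedding } = await res.json();
    return embedding;
  }
}
```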
Visualizing the Upgrade
To verify that the new scoring logic actually works, I upgraded the Visualizer Dashboard. It now renders a real-time graph where you can see how the Hybrid Score connects your query to specific memory nodes.
(It also looks pretty cool in dark mode).
Try the Update
The code is open source and available on GitHub. The repository now includes a docker-compose file that spins up the API, Postgres (with vector/text extensions), and the frontend in one go.
GitHub: [jakops88-hub/Long-Term-Memory-API](https://github.com/jakops88-hub/Long-Term-Memory-API), a production-grade API to give your AI agents long-term memory without the boilerplate.
MemVault
A Memory Server for AI Agents. Runs on Postgres + pgvector. Now supporting 100% Local/Offline execution via Ollama.
I got tired of setting up Pinecone/Weaviate and writing the same embedding boilerplate for every small AI agent I built.
I wanted something that:
- Just runs on PostgreSQL (which I already use).
- Handles the chunking & embedding automatically.
- Lets me visualize the retrieval process (because debugging vector similarity in JSON logs is difficult).
- Can run offline without API bills.
So I built MemVault. It is a Node.js wrapper around pgvector with a generic Hybrid Search engine.
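From the caller's side, the goal is that storing and recalling a memory is one HTTP call each, with chunking and embedding handled server-side. The endpoints and payload shapes below are purely hypothetical, just to illustrate the flow (check the repo for the real routes):

```typescript
// Hypothetical endpoints and payloads, for illustration only.
const BASE = process.env.MEMVAULT_URL ?? 'http://localhost:3000';

// Store raw text; chunking and embedding happen inside MemVault.
await fetch(`${BASE}/memories`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    agentId: 'support-bot',
    content: 'User reported Error Code 503 during checkout.',
  }),
});

// Later, recall via hybrid search.
const res = await fetch(
  `${BASE}/memories/search?agentId=support-bot&q=${encodeURIComponent('Error Code 503')}`,
);
const memories = await res.json();
```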
Quick Start: Choose your setup
You can run this entirely on your own machine (Docker), or use the managed API to skip the server maintenance.
| Feature | Self-Hosted (Docker) | Managed API (RapidAPI) |
|---|---|---|
| Price | Free (Open Source) | Free Tier available |
| Embeddings | Ollama (Local) or OpenAI | OpenAI (Managed) |
| Setup Time | ~15 mins | 30 seconds |
| Data Privacy | 100% on your own hardware | |
Links
- Live Demo: https://memvault-demo-g38n.vercel.app/
- NPM Package: `memvault-sdk-jakops88`
I'd love to hear your thoughts on the BM25 implementation. Combining ts_rank normalization with Cosine Similarity was a fun challenge!
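For anyone curious what that challenge actually is: `ts_rank` is unbounded above, while cosine similarity is already bounded, so the keyword score has to be squashed onto a comparable scale before the weights mean anything. A common trick (one option, not necessarily what MemVault ships) is `x / (1 + x)`:

```typescript
// Generic normalization technique, not necessarily MemVault's exact implementation.
// ts_rank can grow without bound; map it into [0, 1) so the 0.3 weight stays meaningful.
function normalizeRank(tsRank: number): number {
  return tsRank / (1 + tsRank); // 0 stays 0, large ranks approach 1
}

function finalScore(vectorScore: number, tsRank: number, recencyScore: number): number {
  return vectorScore * 0.5 + normalizeRank(tsRank) * 0.3 + recencyScore * 0.2;
}

// Example: a strong keyword hit can outrank a slightly better semantic match.
console.log(finalScore(0.62, 4.0, 0.9)); // ≈ 0.73
console.log(finalScore(0.80, 0.0, 0.9)); // ≈ 0.58
```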
