Building AI agents is fun. Building the long-term memory (RAG) infrastructure for them? Not so much.
Every time I started a new side project, I hit the same "Boilerplate Wall":
- Spin up a Vector DB (Pinecone/Weaviate).
- Write the embedding pipeline logic.
- Figure out chunking strategies.
- Debug why the agent retrieves context from 3 months ago instead of yesterday.
I realized I was spending 80% of my time on plumbing and only 20% on the actual agent.
So, I decided to abstract it all away. I built MemVault, a "Memory-as-a-Service" API wrapper around PostgreSQL + pgvector.
Here is why I chose this architecture, how the hybrid search algorithm works, and why I built a visualizer to debug the "black box".
🏗️ The Architecture: Pragmatism > Complexity
I didn't want another expensive SaaS subscription, and I didn't want to manage a complex Kubernetes cluster.
Why PostgreSQL + pgvector?
Specialized vector databases are cool, but for 99% of indie projects, PostgreSQL is enough.
- ACID Compliance: I need to know my data is actually saved.
- Relational Data: RAG isn't just vectors. It's metadata (userId, sessionId, source). Being able to join vector searches with standard SQL filters is a superpower (see the sketch below).
- Cost: I can run this on a $5 VPS or a free Railway tier.
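Here's roughly what that join looks like in practice. This is only a sketch: the memories table, its columns, and the raw Prisma query are illustrative assumptions, not MemVault's actual schema. It relies on pgvector's `vector` type and its `<=>` cosine-distance operator.

```typescript
// Sketch: combine a vector search with a plain SQL metadata filter in one query.
// Table/column names here are assumptions, not MemVault's real schema.
import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient();

async function searchMemories(userId: string, queryEmbedding: number[], limit = 5) {
  const vectorLiteral = `[${queryEmbedding.join(",")}]`;
  return prisma.$queryRaw`
    SELECT id, content, 1 - (embedding <=> ${vectorLiteral}::vector) AS similarity
    FROM memories
    WHERE user_id = ${userId}          -- standard SQL filter
    ORDER BY embedding <=> ${vectorLiteral}::vector  -- vector ranking
    LIMIT ${limit}
  `;
}
```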
Why Node.js & TypeScript?
The backend handles the orchestration: receiving text, chunking it, calling OpenAI (or Ollama) for embeddings, and storing it via Prisma.
Using TypeScript was non-negotiable here. When you are dealing with 1536-dimensional float arrays, you want strict typing. One wrong data type and your cosine similarity calculation breaks silently.
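As a rough sketch of that orchestration step (the type and field names are illustrative, not MemVault's actual schema; it assumes the official `openai` Node SDK and a 1536-dimension model):

```typescript
// Minimal sketch of the chunk -> embedding -> typed record step.
// MemoryChunk is an illustrative shape, not MemVault's real model.
import OpenAI from "openai";

interface MemoryChunk {
  content: string;
  embedding: number[]; // 1536 floats for text-embedding-3-small
  userId: string;
  sessionId?: string;
  source?: string;
}

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function embedChunk(content: string, userId: string): Promise<MemoryChunk> {
  const res = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: content,
  });
  return { content, embedding: res.data[0].embedding, userId };
}
```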
🧠 The Algorithm: It's not just Cosine Similarity
A common mistake in RAG is relying only on semantic similarity.
If I ask: "What is my current task?"
- Vector Search might return a task from 6 months ago because it's semantically identical.
- Human Memory prioritizes recency.
To fix this, I implemented a Hybrid Scoring algorithm directly in the retrieval logic:
```typescript
// Simplified logic for the hybrid score
const finalScore = (vectorSimilarity * 0.8) + (recencyScore * 0.2);
```
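The recency score itself can be any monotonic decay. Here is a minimal sketch using an exponential half-life; the one-week half-life is my own illustrative assumption, only the 80/20 weights come from the formula above:

```typescript
// Illustrative sketch of the hybrid score, not MemVault's exact implementation.
const HALF_LIFE_HOURS = 24 * 7; // assumption: memories lose half their recency weight per week

function recencyScore(createdAt: Date, now = new Date()): number {
  const ageHours = (now.getTime() - createdAt.getTime()) / 36e5; // ms -> hours
  return Math.pow(0.5, ageHours / HALF_LIFE_HOURS); // 1.0 = brand new, tends to 0 with age
}

function hybridScore(vectorSimilarity: number, createdAt: Date): number {
  return vectorSimilarity * 0.8 + recencyScore(createdAt) * 0.2;
}
```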
By decaying the score of older memories, the agent feels much more "present" and context-aware.
👁️ Visualizing the "Black Box"
The hardest part of building RAG is debugging. When your bot hallucinates, how do you know why?
- Did it fetch the wrong chunk?
- Was the embedding distance too far?
Console logging JSON objects wasn't cutting it. So I built a Real-time Visualizer Dashboard.
(Yes, seeing the nodes connect in real-time is incredibly satisfying)
It helps you verify exactly which chunks are being pulled from the DB and why.
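Concretely, that means surfacing the scoring breakdown for every candidate chunk. A hypothetical trace shape (for illustration only, not MemVault's actual payload) might look like:

```typescript
// Hypothetical retrieval trace, for illustration only.
interface RetrievalTrace {
  chunkId: string;
  content: string;          // the text that was actually retrieved
  vectorSimilarity: number; // cosine similarity to the query
  recencyScore: number;     // decayed score based on age
  finalScore: number;       // hybrid score used for ranking
  retrieved: boolean;       // did it make the top-k cut?
}
```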
🚀 Try it yourself (Open Source)
I built this to scratch my own itch, but I’ve open-sourced it for the community. It includes a docker-compose file so you can spin up the API + Postgres Database with a single command.
If you are tired of setting up RAG pipelines from scratch, give it a spin.
🧠 MemVault: Long-Term Memory Server
Production-grade API to give your AI agents long-term memory without the boilerplate.
Stop setting up Pinecone, embedding pipelines, and chunking logic for every side project. MemVault abstracts the entire RAG pipeline into a single API endpoint that runs on your own infrastructure (PostgreSQL + pgvector).
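To give a feel for that, here is a hypothetical client-side call. The routes and request bodies below are illustrative assumptions, not MemVault's documented API; check the repository for the real endpoints.

```typescript
// Hypothetical usage sketch: endpoint paths and request shapes are assumptions,
// not MemVault's documented API.
const BASE_URL = "http://localhost:3000"; // assumed default for a local Docker setup

async function demo() {
  // Store a memory
  await fetch(`${BASE_URL}/memories`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      userId: "user-1",
      content: "Finished the pgvector migration today.",
    }),
  });

  // Retrieve relevant context via hybrid search
  const res = await fetch(`${BASE_URL}/memories/search`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ userId: "user-1", query: "What is my current task?" }),
  });
  console.log(await res.json());
}
```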
✨ Features
- Hybrid Search: Retrieves memories based on a weighted score of Semantic Similarity, Recency, and Importance.
- Auto-Embedding: Handles text chunking and embedding generation (OpenAI supported, local models coming soon).
- Self-Hostable: Runs on standard PostgreSQL. No vendor lock-in.
- Visualizer Dashboard: Includes a frontend tool to debug retrieval and see exactly why a specific memory was recalled.
- Prisma ORM: Type-safe database access.
👁️ Visualizer (The "Debugger" for RAG)
Debugging invisible vectors is a nightmare. MemVault includes a visualizer to verify your retrieval pipeline in real-time.
🚀 Quick Start (Docker)
The easiest way to run MemVault is…
Links
- Live Visualizer Demo: https://memvault-demo-g38n.vercel.app/
- NPM Package: https://www.npmjs.com/package/memvault-sdk-jakops88
Let me know what you think! I'm currently working on adding support for local embeddings (Ollama) to make the stack 100% offline-capable.

