Building AI agents is fun. Building the long-term memory (RAG) infrastructure for them? Not so much.
Every time I started a new side project, I hit the same "Boilerplate Wall":
- Spin up a Vector DB (Pinecone/Weaviate).
- Write the embedding pipeline logic.
- Figure out chunking strategies.
- Debug why the agent retrieves context from 3 months ago instead of yesterday.
I realized I was spending 80% of my time on plumbing and only 20% on the actual agent.
So, I decided to abstract it all away. I built MemVault, a "Memory-as-a-Service" API wrapper around PostgreSQL + pgvector.
Here is why I chose this architecture, how the hybrid search algorithm works, and why I built a visualizer to debug the "black box".
🏗️ The Architecture: Pragmatism > Complexity
I didn't want another expensive SaaS subscription, and I didn't want to manage a complex Kubernetes cluster.
Why PostgreSQL + pgvector?
Specialized vector databases are cool, but for 99% of indie projects, PostgreSQL is enough.
- ACID Compliance: I need to know my data is actually saved.
- Relational Data: RAG isn't just vectors. It's metadata (`userId`, `sessionId`, `source`). Being able to join vector searches with standard SQL filters is a superpower.
- Cost: I can run this on a $5 VPS or a free Railway tier.
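As a sketch of that superpower, here is roughly what a filtered vector search looks like. The table and column names (`memories`, `embedding`, `user_id`) are illustrative assumptions, not MemVault's actual schema, and the query text would be handed to a driver such as Prisma's raw query API:

```typescript
// Hypothetical sketch: a pgvector similarity search combined with an ordinary
// SQL metadata filter in one statement. Schema names are assumptions.
// pgvector's `<=>` operator is cosine distance (lower = more similar).

interface HybridQuery {
  sql: string;
  params: unknown[];
}

// pgvector accepts vectors as a string literal like '[0.1,0.2,0.3]'.
function toVectorLiteral(embedding: number[]): string {
  return `[${embedding.join(",")}]`;
}

function buildFilteredSearch(
  embedding: number[],
  userId: string,
  limit = 5
): HybridQuery {
  return {
    sql: `SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity
          FROM memories
          WHERE user_id = $2
          ORDER BY embedding <=> $1::vector
          LIMIT $3`,
    params: [toVectorLiteral(embedding), userId, limit],
  };
}
```

In a real backend you would pass `sql` and `params` to your driver (e.g. `prisma.$queryRawUnsafe(sql, ...params)`), so PostgreSQL applies the metadata filter and the vector ordering in a single query plan instead of filtering in application code.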
Why Node.js & TypeScript?
The backend handles the orchestration: receiving text, chunking it, calling OpenAI (or Ollama) for embeddings, and storing it via Prisma.
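For the chunking step, a minimal fixed-size chunker with overlap looks something like this. The 500-character size and 50-character overlap are arbitrary illustration values, not MemVault's defaults:

```typescript
// Illustrative sketch: split text into fixed-size chunks with a small overlap
// so sentences cut at a boundary still appear intact in the next chunk.
function chunkText(text: string, chunkSize = 500, overlap = 50): string[] {
  if (overlap >= chunkSize) {
    throw new Error("overlap must be smaller than chunkSize");
  }
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // final chunk reached the end
  }
  return chunks;
}
```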
Using TypeScript was non-negotiable here. When you are dealing with 1536-dimensional float arrays, you want strict typing. One wrong data type and your cosine similarity calculation breaks silently.
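To make that concrete, here is a minimal sketch of the kind of guard a typed signature plus a runtime dimension check gives you. None of this is MemVault's literal code, just the shape of the idea:

```typescript
// Illustrative sketch: typed embeddings with an explicit dimension check,
// so a mismatched vector throws loudly instead of silently skewing scores.
type Embedding = number[];

function cosineSimilarity(a: Embedding, b: Embedding): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```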
🧠 The Algorithm: It's not just Cosine Similarity
A common mistake in RAG is relying only on semantic similarity.
If I ask: "What is my current task?"
- Vector Search might return a task from 6 months ago because it's semantically identical.
- Human Memory prioritizes recency.
To fix this, I implemented a Hybrid Scoring algorithm directly in the retrieval logic:
```typescript
// Simplified logic for the hybrid score
const finalScore = (vectorSimilarity * 0.8) + (recencyScore * 0.2);
```
By decaying the score of older memories, the agent feels much more "present" and context-aware.
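Concretely, the recency score can come from an exponential decay on the memory's age. The 30-day half-life below is an assumed tuning value for illustration, not necessarily what MemVault ships:

```typescript
// Sketch: exponential recency decay plugged into the 80/20 weighting above.
// The 30-day half-life is an illustrative assumption.
const HALF_LIFE_DAYS = 30;

function recencyScore(ageDays: number): number {
  // 1.0 for a brand-new memory, 0.5 after one half-life, and so on.
  return Math.pow(0.5, ageDays / HALF_LIFE_DAYS);
}

function hybridScore(vectorSimilarity: number, ageDays: number): number {
  return vectorSimilarity * 0.8 + recencyScore(ageDays) * 0.2;
}
```

With these weights, a six-month-old memory at perfect similarity scores about 0.80, while a fresh memory at 0.9 similarity scores 0.92, which is exactly the "current task" behavior described above.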
👁️ Visualizing the "Black Box"
The hardest part of building RAG is debugging. When your bot hallucinates, how do you know why?
- Did it fetch the wrong chunk?
- Was the embedding distance too far?
Console logging JSON objects wasn't cutting it. So I built a Real-time Visualizer Dashboard.
(Yes, seeing the nodes connect in real-time is incredibly satisfying)
It helps you verify exactly which chunks are being pulled from the DB and why.
🚀 Try it yourself (Open Source)
I built this to scratch my own itch, but I’ve open-sourced it for the community. It includes a docker-compose file so you can spin up the API + Postgres Database with a single command.
If you are tired of setting up RAG pipelines from scratch, give it a spin.
GitHub: jakops88-hub/Long-Term-Memory-API: Production-grade API to give your AI agents long-term memory without the boilerplate.
MemVault
A Memory Server for AI Agents. Runs on Postgres + pgvector. Now supporting 100% Local/Offline execution via Ollama.
I got tired of setting up Pinecone/Weaviate and writing the same embedding boilerplate for every small AI agent I built.
I wanted something that:
- Just runs on PostgreSQL (which I already use).
- Handles the chunking & embedding automatically.
- Lets me visualize the retrieval process (because debugging vector similarity in JSON logs is difficult).
- Can run offline without API bills.
So I built MemVault. It is a Node.js wrapper around pgvector with a generic Hybrid Search engine.
Quick Start: Choose your setup
You can run this entirely on your own machine (Docker), or use the managed API to skip the server maintenance.
| Feature | Self-Hosted (Docker) | Managed API (RapidAPI) |
|---|---|---|
| Price | Free (Open Source) | Free Tier available |
| Embeddings | Ollama (Local) or OpenAI | OpenAI (Managed) |
| Setup Time | ~15 mins | 30 seconds |
| Data Privacy | 100% on your machine | Processed on managed servers |
Links
- Live Visualizer Demo: https://memvault-demo-g38n.vercel.app/
- NPM Package: https://www.npmjs.com/package/memvault-sdk-jakops88
Let me know what you think! Support for local embeddings via Ollama has landed, making the stack 100% offline-capable.