Building AI agents is fun. Building the long-term memory (RAG) infrastructure for them? Not so much.
Every time I started a new side project, I hit the same "Boilerplate Wall":
- Spin up a Vector DB (Pinecone/Weaviate).
- Write the embedding pipeline logic.
- Figure out chunking strategies.
- Debug why the agent retrieves context from 3 months ago instead of yesterday.
I realized I was spending 80% of my time on plumbing and only 20% on the actual agent.
So, I decided to abstract it all away. I built MemVault, a "Memory-as-a-Service" API wrapper around PostgreSQL + pgvector.
Here is why I chose this architecture, how the hybrid search algorithm works, and why I built a visualizer to debug the "black box".
🏗️ The Architecture: Pragmatism > Complexity
I didn't want another expensive SaaS subscription, and I didn't want to manage a complex Kubernetes cluster.
Why PostgreSQL + pgvector?
Specialized vector databases are cool, but for 99% of indie projects, PostgreSQL is enough.
- ACID Compliance: I need to know my data is actually saved.
- Relational Data: RAG isn't just vectors. It's metadata (userId, sessionId, source). Being able to join vector searches with standard SQL filters is a superpower.
- Cost: I can run this on a $5 VPS or a free Railway tier.
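To make the "vector search plus SQL filters" point concrete, here is a minimal sketch of a parameterized query builder. The table name `memories` and its columns are hypothetical (the post doesn't show MemVault's schema); the only pgvector-specific piece is the `<=>` operator, which returns cosine distance.

```typescript
// Hypothetical schema: memories(id, content, source, created_at, user_id, session_id, embedding vector).
interface MemoryQuery {
  text: string;
  values: unknown[];
}

function buildMemoryQuery(
  queryEmbedding: number[],
  userId: string,
  sessionId: string,
  limit = 5,
): MemoryQuery {
  // pgvector accepts a text literal such as '[0.1,0.2,0.3]' cast to vector.
  const vectorLiteral = `[${queryEmbedding.join(",")}]`;
  const text = `
    SELECT id, content, source, created_at,
           1 - (embedding <=> $1::vector) AS similarity
    FROM memories
    WHERE user_id = $2 AND session_id = $3
    ORDER BY embedding <=> $1::vector
    LIMIT $4`;
  // Metadata filters and the similarity ranking live in the same statement.
  return { text, values: [vectorLiteral, userId, sessionId, limit] };
}
```

The resulting `text` and `values` could be passed to a client such as node-postgres via `client.query(text, values)`. Whether MemVault issues its queries this way through Prisma is not something the post confirms.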
Why Node.js & TypeScript?
The backend handles the orchestration: receiving text, chunking it, calling OpenAI (or Ollama) for embeddings, and storing it via Prisma.
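As an illustration of the chunking step, here is a minimal sketch of fixed-size chunking with overlap. The post doesn't say which strategy MemVault uses, so the function name, the character-based windows, and the default sizes are all assumptions.

```typescript
// Split text into fixed-size character windows that overlap,
// so a sentence cut at a boundary still appears whole in one chunk.
function chunkText(text: string, chunkSize = 800, overlap = 100): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("Require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last window reached the end
  }
  return chunks;
}
```

Each chunk would then be sent to the embedding provider and stored alongside its metadata. Token-based or sentence-aware splitting is a common alternative when character windows cut too bluntly.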
Using TypeScript was non-negotiable here. When you are dealing with 1536-dimensional float arrays, you want strict typing. One wrong data type (a string where a number should be, for example) and your cosine similarity calculation can silently turn into NaN instead of throwing.
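Static types can't see the length of an array or the shape of a runtime API response, so a small runtime guard pairs well with them. The sketch below is one way to do it. The helper names are hypothetical, and 1536 is simply the dimension mentioned above.

```typescript
const EMBEDDING_DIM = 1536; // dimension mentioned in the post

// Runtime guard: reject vectors of the wrong length or with non-numeric entries.
function assertEmbedding(value: unknown, dim: number = EMBEDDING_DIM): number[] {
  if (!Array.isArray(value) || value.length !== dim) {
    throw new TypeError(`Expected an array of ${dim} numbers`);
  }
  for (const x of value) {
    if (typeof x !== "number" || !Number.isFinite(x)) {
      throw new TypeError("Embedding contains a non-numeric or non-finite value");
    }
  }
  return value as number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new RangeError("Dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i]!;
    const y = b[i]!;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}
```

Calling `assertEmbedding` on whatever the embedding provider returns, before it reaches the database, turns a silent NaN into a loud error at the boundary.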
🧠 The Algorithm: It's not just Cosine Similarity
A common mistake in RAG is relying only on semantic similarity.
If I ask: "What is my current task?"
- Vector Search might return a task from 6 months ago because it's semantically identical.
- Human Memory prioritizes recency.
To fix this, I implemented a Hybrid Scoring algorithm directly in the retrieval logic:
```typescript
// Simplified logic for the hybrid score
const finalScore = (vectorSimilarity * 0.8) + (recencyScore * 0.2);
```
By decaying the score of older memories, the agent feels much more "present" and context-aware.
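The post doesn't spell out how `recencyScore` is computed, so here is a minimal sketch under one assumption: exponential decay with a 30-day half-life. Only the 0.8 / 0.2 weights come from the snippet above. The decay curve and the half-life are illustrative choices.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Assumed decay: 1.0 for a brand-new memory, 0.5 after one half-life, 0.25 after two.
function recencyScore(createdAt: Date, now: Date, halfLifeDays = 30): number {
  const ageDays = Math.max(0, now.getTime() - createdAt.getTime()) / DAY_MS;
  return Math.pow(0.5, ageDays / halfLifeDays);
}

// Weighted blend of semantic similarity and recency (weights from the post).
function hybridScore(vectorSimilarity: number, recency: number): number {
  return vectorSimilarity * 0.8 + recency * 0.2;
}

// Two memories with the same semantic similarity but different ages.
const now = new Date("2025-01-01T00:00:00Z");
const yesterday = new Date(now.getTime() - 1 * DAY_MS);
const sixMonthsAgo = new Date(now.getTime() - 180 * DAY_MS);

const freshTask = hybridScore(0.9, recencyScore(yesterday, now));
const staleTask = hybridScore(0.9, recencyScore(sixMonthsAgo, now));
```

With these numbers the fresh task scores roughly 0.92 and the stale one roughly 0.72, so "What is my current task?" surfaces yesterday's entry even though both are equally similar to the query.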
👁️ Visualizing the "Black Box"
The hardest part of building RAG is debugging. When your bot hallucinates, how do you know why?
- Did it fetch the wrong chunk?
- Was the embedding distance too far?
Console logging JSON objects wasn't cutting it. So I built a Real-time Visualizer Dashboard.
(Yes, seeing the nodes connect in real time is incredibly satisfying.)
It helps you verify exactly which chunks are being pulled from the DB and why.
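If you want the same "which chunk and why" answer without a dashboard, a per-chunk score breakdown gets you part of the way. This is a minimal sketch, not the visualizer's actual data model: the `RetrievedChunk` and `TraceRow` shapes are hypothetical, and the weights reuse the 0.8 / 0.2 split from the hybrid score.

```typescript
interface RetrievedChunk {
  id: string;
  content: string;
  vectorSimilarity: number; // 0..1, higher is closer
  recencyScore: number; // 0..1, higher is newer
}

interface TraceRow {
  rank: number;
  id: string;
  vectorSimilarity: number;
  recencyScore: number;
  finalScore: number;
  preview: string;
}

// Rank chunks by hybrid score and keep each component visible for debugging.
function buildRetrievalTrace(chunks: RetrievedChunk[]): TraceRow[] {
  return chunks
    .map((c) => ({
      ...c,
      finalScore: c.vectorSimilarity * 0.8 + c.recencyScore * 0.2,
    }))
    .sort((a, b) => b.finalScore - a.finalScore)
    .map((c, i) => ({
      rank: i + 1,
      id: c.id,
      vectorSimilarity: c.vectorSimilarity,
      recencyScore: c.recencyScore,
      finalScore: c.finalScore,
      preview: c.content.slice(0, 60),
    }));
}
```

Printing the result with `console.table(buildRetrievalTrace(chunks))` shows at a glance whether a bad answer came from a low similarity score or from recency pushing the wrong chunk to the top.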
🚀 Try it yourself (Open Source)
I built this to scratch my own itch, but I’ve open-sourced it for the community. It includes a docker-compose file so you can spin up the API + Postgres Database with a single command.
If you are tired of setting up RAG pipelines from scratch, give it a spin.
jakops88-hub / Long-Term-Memory-API
Production-grade API to give your AI agents long-term memory without the boilerplate.
MemVault: The Intelligent Memory Layer for AI Agents
Give your LLMs long-term memory, semantic understanding, and evolving context—with one line of code.
MemVault is a production-grade GraphRAG (Graph Retrieval-Augmented Generation) platform. Unlike simple vector databases that only find "similar words", MemVault builds a dynamic knowledge graph of entities and relationships, allowing your AI to understand context, not just keywords.
Why MemVault?
Building persistent memory is hard. Managing vector databases, embedding pipelines, graph databases, and context windows is even harder. MemVault solves this with a managed API that acts as the hippocampus for your AI agents.
The "Sleep Cycle" Engine (Unique Feature)
Just like the biological brain, MemVault consolidates information asynchronously.
- Ingest Now, Process Later: We accept data instantly, but deep processing happens in the background.
- Auto-Consolidation: Every 6 hours, our Sleep Cycle Engine wakes up to merge duplicate entities…
Links
- Live Visualizer Demo: https://memvault-demo-g38n.vercel.app/
- NPM Package: https://www.npmjs.com/package/memvault-sdk-jakops88
Let me know what you think! I'm currently working on adding support for local embeddings (Ollama) to make the stack 100% offline-capable.