Jakob Sandström

Stop Overcomplicating RAG: Why I Built a "Memory Server" on Postgres (and Open Sourced It)

Building AI agents is fun. Building the long-term memory (RAG) infrastructure for them? Not so much.

Every time I started a new side project, I hit the same "Boilerplate Wall":

  1. Spin up a Vector DB (Pinecone/Weaviate).
  2. Write the embedding pipeline logic.
  3. Figure out chunking strategies.
  4. Debug why the agent retrieves context from 3 months ago instead of yesterday.

I realized I was spending 80% of my time on plumbing and only 20% on the actual agent.

So, I decided to abstract it all away. I built MemVault, a "Memory-as-a-Service" API wrapper around PostgreSQL + pgvector.

Here is why I chose this architecture, how the hybrid search algorithm works, and why I built a visualizer to debug the "black box".

🏗️ The Architecture: Pragmatism > Complexity

I didn't want another expensive SaaS subscription, and I didn't want to manage a complex Kubernetes cluster.

Why PostgreSQL + pgvector?

Specialized vector databases are cool, but for 99% of indie projects, PostgreSQL is enough.

  • ACID Compliance: I need to know my data is actually saved.
  • Relational Data: RAG isn't just vectors. It's also metadata (userId, sessionId, source). Being able to combine vector searches with standard SQL filters and joins is a superpower.
  • Cost: I can run this on a $5 VPS or a free Railway tier.
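
To make the "vector search plus SQL filters" point concrete, here is a minimal sketch of a query builder. The table name `memories`, its columns (`user_id`, `content`, `embedding`, `created_at`), and the function name are assumptions for illustration, not MemVault's actual schema. The `<=>` operator is pgvector's cosine distance.

```typescript
// Hypothetical schema: memories(id, user_id, content, embedding vector, created_at).
// These names are illustrative assumptions, not MemVault's real tables.
interface SqlQuery {
  text: string;
  values: (string | number)[];
}

// Builds a parameterized query that filters by user with plain SQL
// and orders the remaining rows by cosine distance to the query embedding.
function buildMemorySearch(userId: string, queryEmbedding: number[], limit = 5): SqlQuery {
  if (queryEmbedding.length === 0 || queryEmbedding.some((x) => !Number.isFinite(x))) {
    throw new Error("queryEmbedding must be a non-empty array of finite numbers");
  }
  // pgvector accepts a vector literal of the form '[0.1,0.2,...]'.
  const vectorLiteral = `[${queryEmbedding.join(",")}]`;
  const text = [
    "SELECT id, content, created_at,",
    "       1 - (embedding <=> $2::vector) AS similarity", // <=> is cosine distance
    "FROM memories",
    "WHERE user_id = $1",
    "ORDER BY embedding <=> $2::vector",
    "LIMIT $3",
  ].join("\n");
  return { text, values: [userId, vectorLiteral, limit] };
}

const q = buildMemorySearch("user-42", [0.1, 0.2, 0.3], 3);
console.log(q.text);
console.log(q.values);
```

The `WHERE` clause is ordinary SQL, so adding a `session_id` or `source` filter is one more condition rather than a second system to keep in sync. The `{ text, values }` shape is what a parameterized client such as node-postgres accepts.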

Why Node.js & TypeScript?

The backend handles the orchestration: receiving text, chunking it, calling OpenAI for embeddings (Ollama support is in progress), and storing the result via Prisma.
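
As an illustration of the chunking step, here is a minimal fixed-size sliding window with overlap. The function name, the default sizes, and splitting on characters rather than tokens are all assumptions; this is not necessarily how MemVault chunks text.

```typescript
// Splits text into fixed-size character windows that overlap, so text near
// a boundary appears in both neighbouring chunks.
// chunkSize and overlap are illustrative defaults, not tuned values.
function chunkText(text: string, chunkSize = 500, overlap = 50): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error("require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last window reached the end
  }
  return chunks;
}

console.log(chunkText("abcdefghij", 4, 1)); // [ 'abcd', 'defg', 'ghij' ]
```

Each chunk would then be sent to the embedding model and stored alongside its metadata.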

Using TypeScript was non-negotiable here. When you are dealing with 1536-dimensional float arrays, you want strict typing. One wrong data type and your cosine similarity calculation breaks silently.
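
One caveat worth spelling out: a `number[]` type does not encode the vector's length, so a runtime check is still useful alongside the compile-time types. A minimal sketch, with hypothetical function names and the 1536 figure taken from the paragraph above:

```typescript
const EMBEDDING_DIM = 1536; // dimension mentioned above; depends on the embedding model

// Throws if the array is not a well-formed embedding of the expected length.
function assertEmbedding(v: number[], dim: number = EMBEDDING_DIM): void {
  if (v.length !== dim) {
    throw new Error(`expected ${dim} dimensions, got ${v.length}`);
  }
  if (v.some((x) => !Number.isFinite(x))) {
    throw new Error("embedding contains NaN or Infinity");
  }
}

// Cosine similarity: dot(a, b) / (|a| * |b|), in the range [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error("cosine similarity is undefined for a zero vector");
  }
  return dot / Math.sqrt(normA * normB);
}

console.log(cosineSimilarity([1, 0], [0, 1])); // prints 0 (orthogonal vectors)
```

Failing loudly on a length mismatch or a `NaN` turns the "breaks silently" case into an error you see immediately.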

🧠 The Algorithm: It's not just Cosine Similarity

A common mistake in RAG is relying only on semantic similarity.

If I ask: "What is my current task?"

  • Vector Search might return a task from 6 months ago because it's semantically almost identical to the query.
  • Human Memory prioritizes recency.

To fix this, I implemented a Hybrid Scoring algorithm directly in the retrieval logic:

// Simplified logic for the hybrid score
const finalScore = (vectorSimilarity * 0.8) + (recencyScore * 0.2);

By decaying the score of older memories, the agent feels much more "present" and context-aware.
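
One way to compute `recencyScore` is exponential decay with a half-life. The sketch below is an assumption: the 7-day half-life and the function names are hypothetical, and MemVault's actual decay function may differ. The 0.8 / 0.2 weights come from the simplified logic above.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Exponential decay: 1.0 for a brand-new memory, 0.5 after one half-life,
// 0.25 after two, and so on. The 7-day half-life is an illustrative assumption.
function recencyScore(createdAt: Date, now: Date, halfLifeDays = 7): number {
  const ageDays = Math.max(0, now.getTime() - createdAt.getTime()) / MS_PER_DAY;
  return Math.pow(0.5, ageDays / halfLifeDays);
}

// Weighted sum using the 0.8 / 0.2 split from the simplified logic above.
function hybridScore(vectorSimilarity: number, recency: number): number {
  return vectorSimilarity * 0.8 + recency * 0.2;
}

const now = new Date("2024-06-01T00:00:00Z");
const yesterday = new Date("2024-05-31T00:00:00Z");
const sixMonthsAgo = new Date("2023-12-01T00:00:00Z");

// Same semantic similarity, different ages: the newer memory ranks higher.
console.log(hybridScore(0.9, recencyScore(yesterday, now)));
console.log(hybridScore(0.9, recencyScore(sixMonthsAgo, now)));
```

With these weights, recency can contribute at most 0.2 to the final score, so a similarity gap larger than 0.25 still outweighs any difference in age. Recency acts as a tie-breaker between near-duplicates rather than overriding relevance.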

👁️ Visualizing the "Black Box"

The hardest part of building RAG is debugging. When your bot hallucinates, how do you know why?

  • Did it fetch the wrong chunk?
  • Was the embedding distance too far?

Console logging JSON objects wasn't cutting it. So I built a Real-time Visualizer Dashboard.

[Image: Visualizer Dashboard showing RAG nodes]

(Yes, seeing the nodes connect in real-time is incredibly satisfying)

It helps you verify exactly which chunks are being pulled from the DB and why.
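
Short of a full dashboard, the same "which chunk, and why" question can be answered with a per-result score breakdown. A minimal sketch, assuming a hypothetical `Retrieved` record shape and the 0.8 / 0.2 weights from the hybrid score; it is not the visualizer's actual data model:

```typescript
// Hypothetical shape of one retrieved chunk; field names are assumptions.
interface Retrieved {
  id: string;
  similarity: number; // cosine similarity to the query, higher is closer
  recency: number;    // 0..1, higher is newer
}

interface Explained extends Retrieved {
  fromSimilarity: number; // contribution of the similarity term
  fromRecency: number;    // contribution of the recency term
  finalScore: number;
}

// Ranks chunks by final score and keeps each term's contribution,
// so it is visible whether a chunk won on meaning or on age.
function explainRanking(results: Retrieved[], wSim = 0.8, wRec = 0.2): Explained[] {
  return results
    .map((r) => {
      const fromSimilarity = r.similarity * wSim;
      const fromRecency = r.recency * wRec;
      return { ...r, fromSimilarity, fromRecency, finalScore: fromSimilarity + fromRecency };
    })
    .sort((a, b) => b.finalScore - a.finalScore);
}

const ranked = explainRanking([
  { id: "old-task", similarity: 0.95, recency: 0.0 },
  { id: "new-task", similarity: 0.9, recency: 1.0 },
]);
console.table(ranked);
```

Here the slightly less similar but much newer chunk ends up first, and the breakdown shows that the recency term is what tipped it.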

🚀 Try it yourself (Open Source)

I built this to scratch my own itch, but I’ve open-sourced it for the community. It includes a docker-compose file so you can spin up the API + Postgres Database with a single command.

If you are tired of setting up RAG pipelines from scratch, give it a spin.

🧠 MemVault: Long-Term Memory Server

License: MIT | TypeScript | Docker

Production-grade API to give your AI agents long-term memory without the boilerplate.

Stop setting up Pinecone, embedding pipelines, and chunking logic for every side project. MemVault abstracts the entire RAG pipeline into a single API endpoint that runs on your own infrastructure (PostgreSQL + pgvector).


✨ Features

  • Hybrid Search: Retrieves memories based on a weighted score of Semantic Similarity, Recency, and Importance.
  • Auto-Embedding: Handles text chunking and embedding generation (OpenAI supported, local models coming soon).
  • Self-Hostable: Runs on standard PostgreSQL. No vendor lock-in.
  • Visualizer Dashboard: Includes a frontend tool to debug retrieval and see exactly why a specific memory was recalled.
  • Prisma ORM: Type-safe database access.

👁️ Visualizer (The "Debugger" for RAG)

Debugging invisible vectors is a nightmare. MemVault includes a visualizer to verify your retrieval pipeline in real-time.

[Image: Visualizer Dashboard]

Live Demo


🚀 Quick Start (Docker)

The easiest way to run MemVault is…

Let me know what you think! I'm currently working on adding support for local embeddings (Ollama) to make the stack 100% offline-capable.