Jakob Sandström
Stop Overcomplicating RAG: Why I Built a "Memory Server" on Postgres (and Open Sourced It)

Building AI agents is fun. Building the long-term memory (RAG) infrastructure for them? Not so much.

Every time I started a new side project, I hit the same "Boilerplate Wall":

  1. Spin up a Vector DB (Pinecone/Weaviate).
  2. Write the embedding pipeline logic.
  3. Figure out chunking strategies.
  4. Debug why the agent retrieves context from 3 months ago instead of yesterday.

I realized I was spending 80% of my time on plumbing and only 20% on the actual agent.

So, I decided to abstract it all away. I built MemVault, a "Memory-as-a-Service" API wrapper around PostgreSQL + pgvector.

Here is why I chose this architecture, how the hybrid search algorithm works, and why I built a visualizer to debug the "black box".

🏗️ The Architecture: Pragmatism > Complexity

I didn't want another expensive SaaS subscription, and I didn't want to manage a complex Kubernetes cluster.

Why PostgreSQL + pgvector?

Specialized vector databases are cool, but for 99% of indie projects, PostgreSQL is enough.

  • ACID Compliance: I need to know my data is actually saved.
  • Relational Data: RAG isn't just vectors. It's metadata (userId, sessionId, source). Being able to join vector searches with standard SQL filters is a superpower (see the sketch after this list).
  • Cost: I can run this on a $5 VPS or a free Railway tier.
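
To make that "superpower" concrete, here is a minimal sketch of a filtered vector search through Prisma's `$queryRaw`. The table and column names are hypothetical, not MemVault's actual schema:

```typescript
import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient();

// Hypothetical schema: a "memories" table with a pgvector "embedding"
// column next to ordinary relational metadata (user_id, session_id).
async function searchMemories(queryEmbedding: number[], userId: string) {
  const vector = `[${queryEmbedding.join(",")}]`;
  // "<=>" is pgvector's cosine-distance operator; the WHERE clause is
  // just standard SQL filtering in the same query.
  return prisma.$queryRaw`
    SELECT id, content, 1 - (embedding <=> ${vector}::vector) AS similarity
    FROM memories
    WHERE user_id = ${userId}
    ORDER BY embedding <=> ${vector}::vector
    LIMIT 5
  `;
}
```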

Why Node.js & TypeScript?

The backend handles the orchestration: receiving text, chunking it, calling OpenAI (or Ollama) for embeddings, and storing it via Prisma.
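
As an illustration, the chunking step can be as simple as fixed-size windows with overlap — a sketch of the idea, not MemVault's exact strategy:

```typescript
// Fixed-size chunking with overlap, so sentences straddling a chunk
// boundary still appear intact in at least one chunk.
function chunkText(text: string, size = 800, overlap = 100): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
  }
  return chunks;
}
```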

Using TypeScript was non-negotiable here. When you are dealing with 1536-dimensional float arrays, you want strict typing. One wrong data type and your cosine similarity calculation breaks silently.
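
To show what that defensive typing looks like in practice, here is an illustrative sketch (the names are mine, not MemVault's internals):

```typescript
// 1536 dimensions matches OpenAI's text-embedding-ada-002 and
// text-embedding-3-small models.
const EMBEDDING_DIM = 1536;

type Embedding = number[];

// Fail loudly instead of letting a malformed vector corrupt the math.
function assertEmbedding(value: unknown): Embedding {
  if (
    !Array.isArray(value) ||
    value.length !== EMBEDDING_DIM ||
    !value.every((x) => typeof x === "number" && Number.isFinite(x))
  ) {
    throw new Error(`Expected a ${EMBEDDING_DIM}-dimensional float array`);
  }
  return value;
}

function cosineSimilarity(a: Embedding, b: Embedding): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < EMBEDDING_DIM; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```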

🧠 The Algorithm: It's not just Cosine Similarity

A common mistake in RAG is relying only on semantic similarity.

If I ask: "What is my current task?"

  • Vector Search might return a task from 6 months ago because it's semantically identical.
  • Human Memory prioritizes recency.

To fix this, I implemented a Hybrid Scoring algorithm directly in the retrieval logic:

```typescript
// Simplified logic for the hybrid score
const finalScore = (vectorSimilarity * 0.8) + (recencyScore * 0.2);
```

By decaying the score of older memories, the agent feels much more "present" and context-aware.
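
For the recency score itself, a simple exponential half-life decay works well. Here is a minimal sketch — the half-life constant is my illustrative choice, not a MemVault default:

```typescript
// Exponential decay: a memory loses half its recency score every
// HALF_LIFE_DAYS days.
const HALF_LIFE_DAYS = 7;

function recencyScore(createdAt: Date, now: Date = new Date()): number {
  const ageDays = (now.getTime() - createdAt.getTime()) / 86_400_000; // ms per day
  return Math.pow(0.5, ageDays / HALF_LIFE_DAYS); // 1.0 = just now, → 0 as it ages
}
```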

👁️ Visualizing the "Black Box"

The hardest part of building RAG is debugging. When your bot hallucinates, how do you know why?

  • Did it fetch the wrong chunk?
  • Was the embedding distance too far?

Console logging JSON objects wasn't cutting it. So I built a Real-time Visualizer Dashboard.

Visualizer Dashboard showing RAG nodes

(Yes, seeing the nodes connect in real-time is incredibly satisfying)

It helps you verify exactly which chunks are being pulled from the DB and why.

🚀 Try it yourself (Open Source)

I built this to scratch my own itch, but I’ve open-sourced it for the community. It includes a docker-compose file so you can spin up the API + Postgres Database with a single command.

If you are tired of setting up RAG pipelines from scratch, give it a spin.

GitHub: jakops88-hub / Long-Term-Memory-API

Production-grade API to give your AI agents long-term memory without the boilerplate.

MemVault

A Memory Server for AI Agents. Runs on Postgres + pgvector. Now supporting 100% Local/Offline execution via Ollama.


I got tired of setting up Pinecone/Weaviate and writing the same embedding boilerplate for every small AI agent I built.

I wanted something that:

  1. Just runs on PostgreSQL (which I already use).
  2. Handles the chunking & embedding automatically.
  3. Lets me visualize the retrieval process (because debugging vector similarity in JSON logs is difficult).
  4. Can run offline without API bills.

So I built MemVault. It is a Node.js wrapper around pgvector with a generic Hybrid Search engine.
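
To give a feel for the developer experience, here is a hypothetical client call — the endpoint paths and payload fields are illustrative, so check the repo's README for the real API surface:

```typescript
// Hypothetical usage — endpoint paths and fields are illustrative.
const BASE_URL = "http://localhost:3000";

// Store a memory: the server handles chunking + embedding.
await fetch(`${BASE_URL}/memories`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    userId: "user-123",
    content: "Current task: migrate the billing service.",
  }),
});

// Recall: hybrid (semantic + recency) search over stored memories.
const res = await fetch(`${BASE_URL}/search`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ userId: "user-123", query: "What is my current task?" }),
});
const matches = await res.json();
```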


Quick Start: Choose your setup

You can run this entirely on your own machine (Docker), or use the managed API to skip the server maintenance.

| Feature | Self-Hosted (Docker) | Managed API (RapidAPI) |
| --- | --- | --- |
| Price | Free (Open Source) | Free Tier available |
| Embeddings | Ollama (Local) or OpenAI | OpenAI (Managed) |
| Setup Time | ~15 mins | 30 seconds |
| Data Privacy | 100% on your machine | — |

Let me know what you think! Local embedding support via Ollama recently landed, so the whole stack can now run 100% offline.
