Stop Overcomplicating RAG: Why I Built a "Memory Server" on Postgres (and Open Sourced It)

Building AI agents is fun. Building the long-term memory (RAG) infrastructure for them? Not so much.

Every time I started a new side project, I hit the same "Boilerplate Wall":

  1. Spin up a Vector DB (Pinecone/Weaviate).
  2. Write the embedding pipeline logic.
  3. Figure out chunking strategies.
  4. Debug why the agent retrieves context from 3 months ago instead of yesterday.

I realized I was spending 80% of my time on plumbing and only 20% on the actual agent.

So, I decided to abstract it all away. I built MemVault, a "Memory-as-a-Service" API wrapper around PostgreSQL + pgvector.

Here is why I chose this architecture, how the hybrid search algorithm works, and why I built a visualizer to debug the "black box".

🏗️ The Architecture: Pragmatism > Complexity

I didn't want another expensive SaaS subscription, and I didn't want to manage a complex Kubernetes cluster.

Why PostgreSQL + pgvector?

Specialized vector databases are cool, but for 99% of indie projects, PostgreSQL is enough.

  • ACID Compliance: I need to know my data is actually saved.
  • Relational Data: RAG isn't just vectors. It's metadata (userId, sessionId, source). Being able to join vector searches with standard SQL filters is a superpower (see the sketch after this list).
  • Cost: I can run this on a $5 VPS or a free Railway tier.
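
To make that concrete, here's a rough sketch of what "vectors + SQL filters in one query" can look like with Prisma's raw-query escape hatch and pgvector's cosine-distance operator. The table and column names (memories, embedding, user_id) are illustrative, not MemVault's actual schema.

```typescript
// A rough sketch of "vectors + SQL filters in one query" using Prisma's raw-query
// escape hatch and pgvector's cosine-distance operator (<=>).
// The table/column names (memories, embedding, user_id) are illustrative, not MemVault's schema.
import { PrismaClient, Prisma } from "@prisma/client";

const prisma = new PrismaClient();

async function searchMemories(userId: string, queryEmbedding: number[], limit = 5) {
  // pgvector accepts vectors as string literals like '[0.1,0.2,...]'
  const vector = `[${queryEmbedding.join(",")}]`;

  // One statement: the WHERE clause filters by plain relational metadata,
  // the ORDER BY ranks the survivors by semantic distance.
  return prisma.$queryRaw(Prisma.sql`
    SELECT id, content, 1 - (embedding <=> ${vector}::vector) AS similarity
    FROM memories
    WHERE user_id = ${userId}
    ORDER BY embedding <=> ${vector}::vector
    LIMIT ${limit}
  `);
}
```

One query, one database: semantic ranking and relational filtering without a second system to keep in sync.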

Why Node.js & TypeScript?

The backend handles the orchestration: receiving text, chunking it, calling OpenAI (or Ollama) for embeddings, and storing it via Prisma.

Using TypeScript was non-negotiable here. When you are dealing with 1536-dimensional float arrays, you want strict typing. One wrong data type and your cosine similarity calculation breaks silently.
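
To show what that orchestration path (chunk → embed → store) roughly looks like, here's a minimal sketch. The naive fixed-size chunker, the embedding model, and the memory_chunks table are illustrative assumptions on my part, not MemVault's internals.

```typescript
// Minimal sketch of the ingest path: chunk -> embed -> store.
// The fixed-size chunker, the embedding model, and the memory_chunks table
// are illustrative assumptions here, not MemVault's internals.
import OpenAI from "openai";
import { PrismaClient } from "@prisma/client";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const prisma = new PrismaClient();

// Naive fixed-size chunking with a bit of overlap so context isn't cut mid-sentence.
function chunkText(text: string, size = 800, overlap = 100): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
  }
  return chunks;
}

async function ingest(userId: string, text: string): Promise<void> {
  for (const chunk of chunkText(text)) {
    // text-embedding-3-small returns a 1536-dimensional number[]
    const { data } = await openai.embeddings.create({
      model: "text-embedding-3-small",
      input: chunk,
    });
    const embedding: number[] = data[0].embedding;

    // Prisma doesn't natively type pgvector columns yet, so a raw insert is the usual workaround.
    await prisma.$executeRaw`
      INSERT INTO memory_chunks (user_id, content, embedding)
      VALUES (${userId}, ${chunk}, ${`[${embedding.join(",")}]`}::vector)
    `;
  }
}
```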

🧠 The Algorithm: It's not just Cosine Similarity

A common mistake in RAG is relying only on semantic similarity.

If I ask: "What is my current task?"

  • Vector Search might return a task from 6 months ago because it's semantically identical.
  • Human Memory prioritizes recency.

To fix this, I implemented a Hybrid Scoring algorithm directly in the retrieval logic:

```typescript
// Simplified logic for the hybrid score
const finalScore = (vectorSimilarity * 0.8) + (recencyScore * 0.2);
```

By decaying the score of older memories, the agent feels much more "present" and context-aware.
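
If you're curious what that decay can look like, here's a minimal sketch using an exponential half-life. The 7-day half-life is an illustrative assumption; only the 0.8/0.2 weighting comes from the snippet above.

```typescript
// One way to compute the recency term: exponential decay with a half-life,
// so yesterday's memory scores near 1.0 and a 6-month-old one trends toward 0.
// The 7-day half-life is an illustrative assumption, not MemVault's tuned value.
const HALF_LIFE_DAYS = 7;

function recencyScore(createdAt: Date, now: Date = new Date()): number {
  const ageDays = (now.getTime() - createdAt.getTime()) / (1000 * 60 * 60 * 24);
  return Math.pow(0.5, ageDays / HALF_LIFE_DAYS);
}

function hybridScore(vectorSimilarity: number, createdAt: Date): number {
  // Same weighting as above: semantics dominate, recency nudges.
  return vectorSimilarity * 0.8 + recencyScore(createdAt) * 0.2;
}
```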

👁️ Visualizing the "Black Box"

The hardest part of building RAG is debugging. When your bot hallucinates, how do you know why?

  • Did it fetch the wrong chunk?
  • Was the embedding distance too far?

Console logging JSON objects wasn't cutting it. So I built a Real-time Visualizer Dashboard.

[Image: Visualizer Dashboard showing RAG nodes]

(Yes, seeing the nodes connect in real-time is incredibly satisfying)

It helps you verify exactly which chunks are being pulled from the DB and why.

🚀 Try it yourself (Open Source)

I built this to scratch my own itch, but I’ve open-sourced it for the community. It includes a docker-compose file so you can spin up the API + Postgres Database with a single command.

If you are tired of setting up RAG pipelines from scratch, give it a spin.

GitHub: jakops88-hub / Long-Term-Memory-API

Production-grade API to give your AI agents long-term memory without the boilerplate.

MemVault: The Intelligent Memory Layer for AI Agents


Give your LLMs long-term memory, semantic understanding, and evolving context—with one line of code.

MemVault is a production-grade GraphRAG (Graph Retrieval-Augmented Generation) platform. Unlike simple vector databases that only find "similar words", MemVault builds a dynamic knowledge graph of entities and relationships, allowing your AI to understand context, not just keywords.

Start 7-Day Free Trial | Read Documentation | NPM SDK


Why MemVault?

Building persistent memory is hard. Managing vector databases, embedding pipelines, graph databases, and context windows is even harder. MemVault solves this with a managed API that acts as the hippocampus for your AI agents.

The "Sleep Cycle" Engine (Unique Feature)

Just like the biological brain, MemVault consolidates information asynchronously.

  • Ingest Now, Process Later: We accept data instantly, but deep processing happens in the background.
  • Auto-Consolidation: Every 6 hours, our Sleep Cycle Engine wakes up to merge duplicate entities…


Let me know what you think! I'm currently working on adding support for local embeddings (Ollama) to make the stack 100% offline-capable.
