Under the Hood: Building a Hybrid Search Engine for AI Memory (Node.js + pgvector)

When building RAG (Retrieval-Augmented Generation) for AI agents, most developers stop at "Cosine Similarity". They verify that Vector A is close to Vector B, and call it a day.

But human memory doesn't work like that. If I ask you "What did I eat?", the answer from 5 minutes ago is infinitely more relevant than the answer from 5 years ago, even if the semantic context is identical.

I recently built MemVault, an open-source memory server, to solve this.

Here is a technical deep dive into the architecture and the Hybrid Scoring Algorithm that powers it.

1. The Core Philosophy: Pragmatism

The architecture was designed with one goal: Reduce Infrastructure Cognitive Load.

Running a dedicated vector database (Pinecone/Milvus) alongside a primary database creates sync issues and doubles the maintenance burden.

The Solution:

  • Runtime: Node.js (Event-driven I/O is perfect for orchestrating DB/LLM calls).
  • Language: TypeScript (Strict typing is essential when handling 1536-dimensional float arrays).
  • Storage: PostgreSQL + pgvector.

By keeping vectors and metadata (session_id, user_id) in the same engine, we maintain ACID compliance and simplify the stack.
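
To make that concrete, here is a minimal sketch of what such a table can look like. The names (`memories`, `importance`, and so on) are illustrative assumptions on my part, not necessarily MemVault's actual schema:

```typescript
import { Pool } from "pg";

// Illustrative schema only; MemVault's real table/column names may differ.
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

async function initSchema(): Promise<void> {
  await pool.query(`
    CREATE EXTENSION IF NOT EXISTS vector;

    CREATE TABLE IF NOT EXISTS memories (
      id          bigserial    PRIMARY KEY,
      user_id     text         NOT NULL,
      session_id  text         NOT NULL,
      content     text         NOT NULL,
      embedding   vector(1536) NOT NULL,  -- matches common embedding model output
      importance  real         NOT NULL DEFAULT 0.5,
      created_at  timestamptz  NOT NULL DEFAULT now()
    );

    -- Approximate nearest-neighbour index for cosine distance.
    CREATE INDEX IF NOT EXISTS memories_embedding_idx
      ON memories USING hnsw (embedding vector_cosine_ops);
  `);
}
```

Because the embedding lives next to `user_id` and `session_id`, an insert is a single transactional write instead of a write to Postgres plus a sync to an external vector store.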

2. The Hybrid Search Algorithm

This is where the magic happens. A naive RAG implementation suffers from "Contextual Drift". To fix this, MemVault calculates a weighted score in real-time.

The formula looks roughly like this:

Score = (SemanticSimilarity * α) + (RecencyScore * β) + (Importance * γ)

The Components:

  1. Semantic Similarity (α):
    We use pgvector to calculate the Cosine Distance. This tells us how much the topics match.

  2. Recency Decay (β):
    We apply a decay function to the timestamp. Memories "fade" over time unless they are reinforced. This ensures the agent prioritizes the current conversation context.

  3. Importance (γ):
    An explicit weight. Some facts (e.g., "User is allergic to nuts") should never decay.
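
Here is a minimal TypeScript sketch of how these components can be combined. The exponential half-life decay and the parameter names are my assumptions, not necessarily MemVault's exact implementation:

```typescript
// Minimal sketch of the weighted score. The half-life decay curve is an
// assumed shape; the real implementation may use a different function.
const HALF_LIFE_HOURS = 24;

function recencyScore(createdAt: Date, now: Date = new Date()): number {
  const ageHours = (now.getTime() - createdAt.getTime()) / 3_600_000;
  return Math.pow(0.5, ageHours / HALF_LIFE_HOURS); // 1.0 now, 0.5 after a day
}

interface ScoredMemory {
  similarity: number; // 1 - cosine distance, from pgvector
  createdAt: Date;
  importance: number; // explicit per-memory weight in [0, 1]
}

function hybridScore(
  m: ScoredMemory,
  alpha = 0.8, // semantic weight
  beta = 0.2,  // recency weight
  gamma = 0.0  // importance boost, opt-in per use case
): number {
  return alpha * m.similarity + beta * recencyScore(m.createdAt) + gamma * m.importance;
}
```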

By tuning these weights (the defaults are 80% semantic, 20% recency), the agent behaves much more naturally.
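
Pushing the whole calculation into Postgres keeps ranking to a single round trip. The query below is a sketch against the illustrative `memories` table from section 1 (with an assumed one-day decay constant), not MemVault's exact query; `<=>` is pgvector's cosine-distance operator:

```typescript
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Sketch: hybrid ranking in one query. `queryEmbedding` is the 1536-dim
// embedding of the user's question, serialized in pgvector's "[...]" format.
async function searchMemories(userId: string, queryEmbedding: number[], limit = 5) {
  const { rows } = await pool.query(
    `
    SELECT
      content,
      1 - (embedding <=> $2::vector)                                    AS similarity,
      exp(-extract(epoch FROM now() - created_at) / 86400.0)            AS recency,
      importance,
      0.8 * (1 - (embedding <=> $2::vector))
        + 0.2 * exp(-extract(epoch FROM now() - created_at) / 86400.0)  AS score
    FROM memories
    WHERE user_id = $1
    ORDER BY score DESC
    LIMIT $3
    `,
    [userId, `[${queryEmbedding.join(",")}]`, limit]
  );
  return rows;
}
```

In practice you would likely pre-filter to the nearest N rows by pure vector distance first, so the HNSW index can do its job before the weighted re-ranking runs over a small candidate set.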

3. Observability: The "Black Box" Problem

The biggest issue with Vector Search is that it is opaque. You cannot "read" a vector.

If an agent hallucinates, how do you debug it?

  • Was the embedding bad?
  • Was the threshold too low?

To solve this, I built a Real-time Visualizer.

(Screenshot: the real-time Visualizer dashboard)

It projects the high-dimensional vector space into a 2D graph, allowing you to visually inspect clusters. If "Cat" and "Car" nodes are overlapping, you know your embedding model is broken.
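
A cheap sanity check that complements the visualizer: pull two embeddings you expect to be far apart and compare them directly. This is a generic helper, not part of MemVault's API:

```typescript
// Quick debugging helper: compare two stored embeddings directly.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// If the "cat" and "car" memories come back with similarity > 0.95,
// the embedding model (or your preprocessing) is probably at fault.
```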

4. Open Source & Roadmap

The project is fully open source. The next step on the roadmap (as highlighted by a recent architectural audit) is to implement BM25 (Keyword Search) to better handle unique identifiers like Product IDs, where semantic search often fails.
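
For what it's worth, Postgres's built-in full-text ranking (`ts_rank`) is not true BM25, but it hints at how a keyword score could be blended into the same query once that roadmap item lands. Purely speculative sketch, reusing the illustrative schema from above:

```typescript
// Speculative sketch only: blend a keyword score with the semantic score so
// exact tokens (product IDs, ticket numbers) can still surface when the
// embedding misses them. ts_rank is Postgres full-text ranking, not true BM25.
const keywordAwareSearchSql = `
  SELECT
    content,
    1 - (embedding <=> $2::vector) AS semantic,
    ts_rank(
      to_tsvector('english', content),
      plainto_tsquery('english', $3)
    ) AS keyword,
    0.6 * (1 - (embedding <=> $2::vector)) +
    0.4 * ts_rank(
      to_tsvector('english', content),
      plainto_tsquery('english', $3)
    ) AS score
  FROM memories
  WHERE user_id = $1
  ORDER BY score DESC
  LIMIT 5
`;
// Executed the same way as the hybrid query earlier, with the raw user text
// passed as $3 alongside its embedding as $2.
```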

If you are interested in the code or want to try the algorithm:

GitHub: jakops88-hub / Long-Term-Memory-API

Production-grade API to give your AI agents long-term memory without the boilerplate.

MemVault: The Intelligent Memory Layer for AI Agents


Give your LLMs long-term memory, semantic understanding, and evolving context—with one line of code.

MemVault is a production-grade GraphRAG (Graph Retrieval-Augmented Generation) platform. Unlike simple vector databases that only find "similar words", MemVault builds a dynamic knowledge graph of entities and relationships, allowing your AI to understand context, not just keywords.

Start 7-Day Free Trial | Read Documentation | NPM SDK


Why MemVault?

Building persistent memory is hard. Managing vector databases, embedding pipelines, graph databases, and context windows is even harder. MemVault solves this with a managed API that acts as the hippocampus for your AI agents.

The "Sleep Cycle" Engine (Unique Feature)

Just like the biological brain, MemVault consolidates information asynchronously.

  • Ingest Now, Process Later: We accept data instantly, but deep processing happens in the background.
  • Auto-Consolidation: Every 6 hours, our Sleep Cycle Engine wakes up to merge duplicate entities…


Let me know if you have questions about the pgvector implementation!

Top comments (2)

Frank Smith III

This is absolutely mind blowing!

NorthernDev

Thanks! I love hearing feedback!