Jakob Sandström

Vector Search is not enough: Why I added BM25 (Hybrid Search) to my AI Memory Server

Last week, I launched MemVault, an open-source memory layer for AI agents built on PostgreSQL and pgvector.

The response was amazing (thanks for the stars and forks!), but as more developers started testing it, a critical architectural flaw became apparent.

The Problem: Vectors are "Fuzzy"

In my initial design, I relied 100% on Cosine Similarity.

  • Conceptually: It works great. "Apple" matches "Fruit".
  • Practically: It fails on specifics. If an agent needs to recall "Error Code 503", vector search might retrieve "Error Code 404" instead, because the two are semantically very close (both HTTP error codes), even though I needed the exact match.

I realized that for a production-grade memory system, fuzzy matching isn't enough. You need exact precision for IDs, names, and technical terms.

The Solution: Hybrid Search 2.0

Based on this feedback (and a deep architectural review), I spent the weekend refactoring the retrieval engine.

I moved from a pure vector approach to a Hybrid Search model that runs entirely inside PostgreSQL. It now calculates a weighted score based on three factors:

  1. Semantic (Vector): Uses pgvector to understand the meaning.
  2. Exact Match (Keyword): Uses PostgreSQL's native tsvector and ts_rank for BM25-style keyword matching on exact terms.
  3. Recency (Time): Uses a decay function to prioritize fresh memories.

The Scoring Formula

The SQL query now computes a composite score on the fly:

FinalScore = (VectorScore * 0.5) + (KeywordScore * 0.3) + (RecencyScore * 0.2)

This ensures that if you search for a specific User ID, the Keyword Score spikes and outweighs any "fuzzy" semantic matches.
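
To make that concrete, here is a minimal sketch of the kind of query this describes. The schema (a memories table with embedding, content_tsv, and created_at columns), the one-day decay constant, and the ts_rank squashing are my illustrative assumptions, not necessarily the exact query MemVault ships:

```typescript
// A sketch of a composite-score query, assuming a hypothetical schema:
// memories(id, content, embedding vector, content_tsv tsvector, created_at timestamptz)
const HYBRID_SEARCH_SQL = `
  SELECT id, content,
         -- Semantic: pgvector's <=> is cosine distance, so 1 - distance = similarity
         (1 - (embedding <=> $1::vector)) * 0.5
         -- Keyword: ts_rank has no fixed upper bound, so squash it into [0, 1)
       + (ts_rank(content_tsv, plainto_tsquery($2))
          / (ts_rank(content_tsv, plainto_tsquery($2)) + 1)) * 0.3
         -- Recency: exponential decay with an assumed one-day time constant
       + exp(-extract(epoch FROM (now() - created_at)) / 86400.0) * 0.2
         AS final_score
  FROM memories
  ORDER BY final_score DESC
  LIMIT 10;
`;
```

Squashing ts_rank is one simple way to put it on the same 0-to-1 scale as cosine similarity, so the 0.5/0.3/0.2 weights mean what they say.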

Going 100% Offline (Ollama Support)

The other major request was removing the dependency on OpenAI.

I refactored the backend to use a Provider Pattern for embeddings. By changing a single environment variable, you can now swap out OpenAI for a local Ollama instance running nomic-embed-text.

```typescript
// src/services/embeddings/index.ts
// (import paths assumed from the file layout)
import { OpenAIEmbeddingProvider } from './openai';
import { OllamaProvider } from './ollama';

export function getEmbeddingProvider() {
  switch (process.env.EMBEDDING_PROVIDER) {
    case 'openai': return new OpenAIEmbeddingProvider();
    case 'ollama': return new OllamaProvider(); // 100% Local & Free
    default: throw new Error(`Unknown EMBEDDING_PROVIDER: "${process.env.EMBEDDING_PROVIDER}"`);
  }
}
```
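
For completeness, here is a rough sketch of what the Ollama side of that pattern can look like. The OllamaProvider name comes from the snippet above, but the embed() method, the OLLAMA_URL/OLLAMA_MODEL variables, and the wiring are my assumptions rather than the repo's actual code; the endpoint itself is Ollama's standard /api/embeddings API:

```typescript
// src/services/embeddings/ollama.ts (hypothetical sketch, not the repo's actual code)
export class OllamaProvider {
  private baseUrl = process.env.OLLAMA_URL ?? 'http://localhost:11434';
  private model = process.env.OLLAMA_MODEL ?? 'nomic-embed-text';

  // Embeds a single string via Ollama's local /api/embeddings endpoint
  async embed(text: string): Promise<number[]> {
    const res = await fetch(`${this.baseUrl}/api/embeddings`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ model: this.model, prompt: text }),
    });
    if (!res.ok) throw new Error(`Ollama embedding failed: HTTP ${res.status}`);
    const { embedding } = (await res.json()) as { embedding: number[] };
    return embedding;
  }
}
```

Set EMBEDDING_PROVIDER=ollama and every embedding call stays on your own machine.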

This means you can now deploy the entire stack—Database, API, and Inference—on your own hardware, completely air-gapped.

Visualizing the Upgrade

To verify that the new scoring logic actually works, I upgraded the Visualizer Dashboard. It now renders a real-time graph where you can see how the Hybrid Score connects your query to specific memory nodes.

(Image: the cyberpunk-style Visualizer Dashboard)

(It also looks pretty cool in dark mode).

Try the Update

The code is open source and available on GitHub. The repository now includes a docker-compose file that spins up the API, Postgres (with vector/text extensions), and the frontend in one go.

GitHub: jakops88-hub / Long-Term-Memory-API

Production-grade API to give your AI agents long-term memory without the boilerplate.

MemVault

A Memory Server for AI Agents. Runs on Postgres + pgvector. Now supporting 100% Local/Offline execution via Ollama.


I got tired of setting up Pinecone/Weaviate and writing the same embedding boilerplate for every small AI agent I built.

I wanted something that:

  1. Just runs on PostgreSQL (which I already use).
  2. Handles the chunking & embedding automatically.
  3. Lets me visualize the retrieval process (because debugging vector similarity in JSON logs is difficult).
  4. Can run offline without API bills.

So I built MemVault. It is a Node.js wrapper around pgvector with a generic Hybrid Search engine.


Quick Start: Choose your setup

You can run this entirely on your own machine (Docker), or use the managed API to skip the server maintenance.

| Feature | Self-Hosted (Docker) | Managed API (RapidAPI) |
| --- | --- | --- |
| Price | Free (Open Source) | Free Tier available |
| Embeddings | Ollama (Local) or OpenAI | OpenAI (Managed) |
| Setup Time | ~15 mins | 30 seconds |
| Data Privacy | 100% on your hardware | |


I'd love to hear your thoughts on the BM25 implementation. Combining ts_rank normalization with Cosine Similarity was a fun challenge!
