NorthernDev

Posted on

Vector Search is not enough: Why I added BM25 (Hybrid Search) to my AI Memory Server

Last week, I launched MemVault, an open-source memory layer for AI agents built on PostgreSQL and pgvector.

The response was amazing (thanks for the stars and forks!), but as more developers started testing it, a critical architectural flaw became apparent.

The Problem: Vectors are "Fuzzy"

In my initial design, I relied 100% on Cosine Similarity.

  • Conceptually: It works great. "Apple" matches "Fruit".
  • Practically: It fails on specifics. If an agent needs to recall "Error Code 503", vector search may happily return "Error Code 404" instead: the two strings are semantically close (both HTTP error codes), even though I needed the exact token.

I realized that for a production-grade memory system, fuzzy matching isn't enough. You need exact precision for IDs, names, and technical terms.

The Solution: Hybrid Search 2.0

Based on this feedback (and a deep architectural review), I spent the weekend refactoring the retrieval engine.

I moved from a pure vector approach to a Hybrid Search model that runs entirely inside PostgreSQL. It now calculates a weighted score based on three factors:

  1. Semantic (Vector): Uses pgvector to understand the meaning.
  2. Exact Match (Keyword): Uses PostgreSQL's native full-text search (tsvector, ranked BM25-style via ts_rank) to reward exact keyword hits.
  3. Recency (Time): Uses a decay function to prioritize fresh memories.
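The recency factor is the simplest of the three to sketch. A common choice is exponential decay by age; the half-life below is an assumed tuning knob for illustration, not MemVault's actual setting:

```typescript
// Hypothetical recency score: exponential decay by memory age.
// HALF_LIFE_HOURS is an assumption for illustration, not MemVault's real value.
const HALF_LIFE_HOURS = 24;

export function recencyScore(createdAt: Date, now: Date = new Date()): number {
  const ageHours = (now.getTime() - createdAt.getTime()) / 3_600_000;
  // Decays from 1.0 (brand new) toward 0 as the memory ages.
  return Math.pow(0.5, ageHours / HALF_LIFE_HOURS);
}
```

With a 24-hour half-life, a memory written yesterday scores 0.5 and one written a week ago is nearly invisible, which is exactly the "prefer fresh context" behavior you want from an agent.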

The Scoring Formula

The SQL query now computes a composite score on the fly:

FinalScore = (VectorScore * 0.5) + (KeywordScore * 0.3) + (RecencyScore * 0.2)

This ensures that if you search for a specific User ID, the Keyword Score spikes and overrides any "fuzzy" semantic matches.
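Reduced to code, the weighted combination is a one-liner. This is a sketch with the weights from the formula above; the function name and signature are illustrative, not MemVault's API:

```typescript
// Illustrative combination of the three normalized scores (each in [0, 1]).
// Weights mirror the formula above; the function itself is a sketch, not MemVault's API.
export function hybridScore(vector: number, keyword: number, recency: number): number {
  return vector * 0.5 + keyword * 0.3 + recency * 0.2;
}

// A perfect keyword hit with mediocre semantic similarity...
hybridScore(0.4, 1.0, 0.5); // 0.6
// ...still beats a strong semantic match with no keyword overlap.
hybridScore(0.8, 0.0, 0.5); // 0.5
```

The keyword weight of 0.3 is enough to flip the ranking whenever an exact token matches, without letting keyword noise drown out semantics on ordinary queries.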

Going 100% Offline (Ollama Support)

The other major request was removing the dependency on OpenAI.

I refactored the backend to use a Provider Pattern for embeddings. By changing a single environment variable, you can now swap out OpenAI for a local Ollama instance running nomic-embed-text.

```typescript
// src/services/embeddings/index.ts
export function getEmbeddingProvider() {
  switch (process.env.EMBEDDING_PROVIDER) {
    case 'openai': return new OpenAIEmbeddingProvider();
    case 'ollama': return new OllamaProvider(); // 100% Local & Free
    default:
      throw new Error(`Unsupported EMBEDDING_PROVIDER: ${process.env.EMBEDDING_PROVIDER}`);
  }
}
```

This means you can now deploy the entire stack—Database, API, and Inference—on your own hardware, completely air-gapped.
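For illustration, here is what calling code against such a provider might look like. The `EmbeddingProvider` interface, the `embed()` method name, and the stub below are assumptions for the sketch, not the repository's actual contract:

```typescript
// Hypothetical provider contract; check the repo for the real interface.
interface EmbeddingProvider {
  embed(text: string): Promise<number[]>;
}

// Stub provider for illustration; a real implementation would call
// OpenAI's or Ollama's embeddings endpoint.
class StubProvider implements EmbeddingProvider {
  async embed(text: string): Promise<number[]> {
    return [text.length, 0, 1]; // placeholder vector, not a real embedding
  }
}

async function search(query: string, provider: EmbeddingProvider): Promise<number[]> {
  const queryVector = await provider.embed(query);
  // ...hand queryVector to the hybrid SQL query for scoring...
  return queryVector;
}
```

Because the caller only sees the interface, swapping OpenAI for Ollama is invisible to the rest of the codebase, which is the whole point of the provider pattern.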

Visualizing the Upgrade

To verify that the new scoring logic actually works, I upgraded the Visualizer Dashboard. It now renders a real-time graph where you can see how the Hybrid Score connects your query to specific memory nodes.

(Image: Cyberpunk Visualizer)

(It also looks pretty cool in dark mode).

Try the Update

The code is open source and available on GitHub. The repository now includes a docker-compose file that spins up the API, Postgres (with vector/text extensions), and the frontend in one go.
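For orientation, a three-service compose file of that shape might look like the sketch below. The image tags and service names are assumptions for illustration; check the repository's actual docker-compose.yml:

```yaml
# Hedged sketch of the stack's shape — not the repo's actual compose file.
services:
  db:
    image: pgvector/pgvector:pg16   # Postgres with the vector extension baked in
    environment:
      POSTGRES_PASSWORD: example
  api:
    build: .
    environment:
      EMBEDDING_PROVIDER: ollama    # or 'openai'
    depends_on:
      - db
  frontend:
    build: ./frontend
    depends_on:
      - api
```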

GitHub: jakops88-hub / Long-Term-Memory-API

Production-grade API to give your AI agents long-term memory without the boilerplate.

MemVault: The Intelligent Memory Layer for AI Agents


Give your LLMs long-term memory, semantic understanding, and evolving context—with one line of code.

MemVault is a production-grade GraphRAG (Graph Retrieval-Augmented Generation) platform. Unlike simple vector databases that only find "similar words", MemVault builds a dynamic knowledge graph of entities and relationships, allowing your AI to understand context, not just keywords.



Why MemVault?

Building persistent memory is hard. Managing vector databases, embedding pipelines, graph databases, and context windows is even harder. MemVault solves this with a managed API that acts as the hippocampus for your AI agents.

The "Sleep Cycle" Engine (Unique Feature)

Just like the biological brain, MemVault consolidates information asynchronously.

  • Ingest Now, Process Later: We accept data instantly, but deep processing happens in the background.
  • Auto-Consolidation: Every 6 hours, our Sleep Cycle Engine wakes up to merge duplicate entities…


I'd love to hear your thoughts on the BM25 implementation. Combining ts_rank normalization with Cosine Similarity was a fun challenge!
