NorthernDev

Posted on

Vector Search is not enough: Why I added BM25 (Hybrid Search) to my AI Memory Server

Last week, I launched MemVault, an open-source memory layer for AI agents built on PostgreSQL and pgvector.

The response was amazing (thanks for the stars and forks!), but as more developers started testing it, a critical architectural flaw became apparent.

The Problem: Vectors are "Fuzzy"

In my initial design, I relied 100% on Cosine Similarity.

  • Conceptually: It works great. "Apple" matches "Fruit".
  • Practically: It fails on specifics. If an agent needs to recall "Error Code 503", vector search may happily return "Error Code 404" instead: the two strings are semantically close (both HTTP error codes), even though I needed the exact token.

I realized that for a production-grade memory system, fuzzy matching isn't enough. You need exact precision for IDs, names, and technical terms.

The Solution: Hybrid Search 2.0

Based on this feedback (and a deep architectural review), I spent the weekend refactoring the retrieval engine.

I moved from a pure vector approach to a Hybrid Search model that runs entirely inside PostgreSQL. It now calculates a weighted score based on three factors:

  1. Semantic (Vector): Uses pgvector to understand the meaning.
  2. Exact Match (Keyword): Uses PostgreSQL's native full-text search (tsvector, ranked BM25-style via ts_rank) to reward exact keyword hits.
  3. Recency (Time): Uses a decay function to prioritize fresh memories.
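The recency factor is the simplest of the three to sketch. A common choice is exponential decay by age; the half-life below is an assumed tuning knob for illustration, not MemVault's actual setting:

```typescript
// Hypothetical recency score: exponential decay by memory age.
// HALF_LIFE_HOURS is an assumption for illustration, not MemVault's real value.
const HALF_LIFE_HOURS = 24;

export function recencyScore(createdAt: Date, now: Date = new Date()): number {
  const ageHours = (now.getTime() - createdAt.getTime()) / 3_600_000;
  // Decays from 1.0 (brand new) toward 0 as the memory ages.
  return Math.pow(0.5, ageHours / HALF_LIFE_HOURS);
}
```

With a 24-hour half-life, a memory written yesterday scores 0.5 and one written a week ago is nearly invisible, which is exactly the "prefer fresh context" behavior you want from an agent.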

The Scoring Formula

The SQL query now computes a composite score on the fly:

FinalScore = (VectorScore * 0.5) + (KeywordScore * 0.3) + (RecencyScore * 0.2)

This ensures that if you search for a specific User ID, the Keyword Score spikes and overrides any "fuzzy" semantic matches.
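Reduced to code, the weighted combination is a one-liner. This is a sketch with the weights from the formula above; the function name and signature are illustrative, not MemVault's API:

```typescript
// Illustrative combination of the three normalized scores (each in [0, 1]).
// Weights mirror the formula above; the function itself is a sketch, not MemVault's API.
export function hybridScore(vector: number, keyword: number, recency: number): number {
  return vector * 0.5 + keyword * 0.3 + recency * 0.2;
}

// A perfect keyword hit with mediocre semantic similarity...
hybridScore(0.4, 1.0, 0.5); // 0.6
// ...still beats a strong semantic match with no keyword overlap.
hybridScore(0.8, 0.0, 0.5); // 0.5
```

The keyword weight of 0.3 is enough to flip the ranking whenever an exact token matches, without letting keyword noise drown out semantics on ordinary queries.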

Going 100% Offline (Ollama Support)

The other major request was removing the dependency on OpenAI.

I refactored the backend to use a Provider Pattern for embeddings. By changing a single environment variable, you can now swap out OpenAI for a local Ollama instance running nomic-embed-text.

```typescript
// src/services/embeddings/index.ts
export function getEmbeddingProvider() {
  switch (process.env.EMBEDDING_PROVIDER) {
    case 'openai': return new OpenAIEmbeddingProvider();
    case 'ollama': return new OllamaProvider(); // 100% Local & Free
    default:
      throw new Error(`Unsupported EMBEDDING_PROVIDER: ${process.env.EMBEDDING_PROVIDER}`);
  }
}
```

This means you can now deploy the entire stack—Database, API, and Inference—on your own hardware, completely air-gapped.
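For illustration, here is what calling code against such a provider might look like. The `EmbeddingProvider` interface, the `embed()` method name, and the stub below are assumptions for the sketch, not the repository's actual contract:

```typescript
// Hypothetical provider contract; check the repo for the real interface.
interface EmbeddingProvider {
  embed(text: string): Promise<number[]>;
}

// Stub provider for illustration; a real implementation would call
// OpenAI's or Ollama's embeddings endpoint.
class StubProvider implements EmbeddingProvider {
  async embed(text: string): Promise<number[]> {
    return [text.length, 0, 1]; // placeholder vector, not a real embedding
  }
}

async function search(query: string, provider: EmbeddingProvider): Promise<number[]> {
  const queryVector = await provider.embed(query);
  // ...hand queryVector to the hybrid SQL query for scoring...
  return queryVector;
}
```

Because the caller only sees the interface, swapping OpenAI for Ollama is invisible to the rest of the codebase, which is the whole point of the provider pattern.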

Visualizing the Upgrade

To verify that the new scoring logic actually works, I upgraded the Visualizer Dashboard. It now renders a real-time graph where you can see how the Hybrid Score connects your query to specific memory nodes.

(Image: Cyberpunk Visualizer)

(It also looks pretty cool in dark mode).

Try the Update

The code is open source and available on GitHub. The repository now includes a docker-compose file that spins up the API, Postgres (with vector/text extensions), and the frontend in one go.
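For orientation, a three-service compose file of that shape might look like the sketch below. The image tags and service names are assumptions for illustration; check the repository's actual docker-compose.yml:

```yaml
# Hedged sketch of the stack's shape — not the repo's actual compose file.
services:
  db:
    image: pgvector/pgvector:pg16   # Postgres with the vector extension baked in
    environment:
      POSTGRES_PASSWORD: example
  api:
    build: .
    environment:
      EMBEDDING_PROVIDER: ollama    # or 'openai'
    depends_on:
      - db
  frontend:
    build: ./frontend
    depends_on:
      - api
```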

GitHub: jakops88-hub / Long-Term-Memory-API

Production-grade API to give your AI agents long-term memory without the boilerplate.

MemVault: The Intelligent Memory Layer for AI Agents


Give your LLMs long-term memory, semantic understanding, and evolving context—with one line of code.

MemVault is a production-grade GraphRAG (Graph Retrieval-Augmented Generation) platform. Unlike simple vector databases that only find "similar words", MemVault builds a dynamic knowledge graph of entities and relationships, allowing your AI to understand context, not just keywords.



Why MemVault?

Building persistent memory is hard. Managing vector databases, embedding pipelines, graph databases, and context windows is even harder. MemVault solves this with a managed API that acts as the hippocampus for your AI agents.

The "Sleep Cycle" Engine (Unique Feature)

Just like the biological brain, MemVault consolidates information asynchronously.

  • Ingest Now, Process Later: We accept data instantly, but deep processing happens in the background.
  • Auto-Consolidation: Every 6 hours, our Sleep Cycle Engine wakes up to merge duplicate entities…


I'd love to hear your thoughts on the BM25 implementation. Combining ts_rank normalization with Cosine Similarity was a fun challenge!
