Daniel Nwaneri

My $5/month RAG System Now Beats $200 Solutions - Hybrid Search, Reranking & Dashboard

Last month, I shared how I built a production RAG system for $5/month. The response was incredible, but the feedback was clear: vector search alone isn't enough.

So I rebuilt it. Here's what changed.

The Problem With V1

My original system used pure vector similarity search. It worked great for semantic queries like "how does edge computing work?" but failed miserably when users searched for:

  • Exact IDs: FPL-2026-X
  • Specific names: John Smith contract
  • Technical terms: bge-small-en-v1.5

Vector search finds meaning, not exact matches. I needed both.

The Upgrade: 5 Features That Changed Everything

1. Hybrid Search (Vector + BM25)

Instead of choosing between semantic and keyword search, I combined them using Reciprocal Rank Fusion (RRF).

Query
  │
  ├──► Vector Search (Vectorize) ──┐
  │    finds: meaning, concepts    │
  │                                ├──► RRF Fusion ──► Results
  └──► Keyword Search (D1 BM25) ───┘
       finds: exact terms, IDs

The magic is in the fusion. RRF doesn't care about raw scores - it cares about rank position. If a document ranks #1 in both searches, it's definitely relevant.

function reciprocalRankFusion(vectorResults, keywordResults) {
  const rrfK = 60; // Standard constant from the original RRF paper

  // Fused score per document ID: sum of 1 / (k + rank) across both lists
  const scores = new Map();

  vectorResults.forEach((result, rank) => {
    scores.set(result.id, 1 / (rrfK + rank + 1));
  });

  keywordResults.forEach((result, rank) => {
    const existing = scores.get(result.id) || 0;
    scores.set(result.id, existing + 1 / (rrfK + rank + 1));
  });

  // Highest fused score first
  return [...scores.entries()].sort((a, b) => b[1] - a[1]);
}
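
The keyword leg runs on D1, which is SQLite under the hood, so FTS5's built-in bm25() ranking works natively. Here's a minimal sketch, assuming a hypothetical FTS5 table named chunks_fts with a content column (the real schema lives in schema.sql, and the DB binding name is whatever your wrangler.toml declares):

// Keyword leg: FTS5 BM25 over D1. Note bm25() returns lower-is-better
// scores, so ascending order puts the best matches first.
const { results: keywordResults } = await env.DB.prepare(
  `SELECT rowid AS id, content, bm25(chunks_fts) AS score
     FROM chunks_fts
    WHERE chunks_fts MATCH ?
    ORDER BY score
    LIMIT 20`
).bind(query).all();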

2. Cross-Encoder Reranking

Vector search finds candidates. But are they actually correct?

I added @cf/baai/bge-reranker-base as a "judge" that evaluates the top 10 results against the original query:

const reranked = await env.AI.run('@cf/baai/bge-reranker-base', {
  query: "cloudflare workers performance",
  contexts: top10Results.map(r => ({ text: r.content }))
});
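
The call returns a relevance score per context; you then reorder the candidates with it. A small sketch, assuming the Workers AI reranker response shape of { id, score } pairs, where id indexes into the contexts array:

// Sort candidates by reranker score, highest relevance first
const ordered = [...reranked.response]
  .sort((a, b) => b.score - a.score)
  .map(({ id }) => top10Results[id]);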

The reranker doesn't just check similarity - it checks relevance. Game changer.

3. Smart Chunking (15% Overlap)

My V1 chunking was naive - split every 500 characters. This created chunks like:

Bad chunk:

"...the performance was excellent. The system hand"

Now I use recursive chunking that respects semantic boundaries:

Good chunk:

"The performance was excellent. The system handled 
500K requests daily with 99.9% uptime."

Plus 15% overlap ensures context isn't lost between chunks.
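
The repo uses a recursive splitter, but the core idea fits in a few lines. A simplified sketch that splits on sentence boundaries and carries a 15% tail into the next chunk:

// Simplified boundary-aware chunking with 15% overlap (a sketch, not
// the exact recursive splitter from the repo)
function chunkWithOverlap(text, chunkSize = 500, overlapRatio = 0.15) {
  const sentences = text.split(/(?<=[.!?])\s+/);
  const chunks = [];
  let current = '';

  for (const sentence of sentences) {
    if (current && (current + ' ' + sentence).length > chunkSize) {
      chunks.push(current.trim());
      // Start the next chunk with the tail of the previous one
      const overlap = current.slice(-Math.floor(chunkSize * overlapRatio));
      current = overlap + ' ' + sentence;
    } else {
      current = current ? current + ' ' + sentence : sentence;
    }
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}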

4. Interactive Dashboard

No more explaining curl commands to clients.

Dashboard screenshot: hybrid search visualization showing vector embeddings and keyword results merging through RRF fusion into unified search results.

Built as a single HTML file embedded in the Worker - no separate frontend deployment needed:

  • 📊 Real-time stats
  • 📥 Document ingestion form
  • 🔎 Search with latency monitor
  • 🔑 API key management

Access it at /dashboard on any deployment.
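
The pattern is worth stealing: the entire UI lives in a template string and gets served from one route. A stripped-down sketch (the real Worker inlines the full HTML):

// Serve the dashboard straight from the Worker - no separate frontend
const DASHBOARD_HTML = `<!doctype html><html><!-- full UI here --></html>`;

export default {
  async fetch(request, env) {
    const { pathname } = new URL(request.url);
    if (pathname === '/dashboard') {
      return new Response(DASHBOARD_HTML, {
        headers: { 'content-type': 'text/html;charset=UTF-8' },
      });
    }
    // ...API and MCP routes...
    return new Response('Not found', { status: 404 });
  },
};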

5. MCP Integration (AI Agent Ready)

The Model Context Protocol lets AI assistants use your API as a native tool. I added:

GET  /mcp/tools  # List available tools
POST /mcp/call   # Execute a tool

Now Claude Desktop or any MCP-compatible agent can search my knowledge base directly.
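
For example, an agent (or plain fetch) can invoke the search tool like this; the tool name and argument shape here are illustrative, so check /mcp/tools for the actual schema:

// Illustrative tool call - verify names against GET /mcp/tools
const res = await fetch('https://your-worker.workers.dev/mcp/call', {
  method: 'POST',
  headers: { 'content-type': 'application/json' },
  body: JSON.stringify({
    tool: 'search',
    arguments: { query: 'cloudflare workers performance', topK: 5 },
  }),
});
const data = await res.json();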

Performance: Before vs After

| Metric | V1 | V2 |
| --- | --- | --- |
| Search Type | Vector only | Hybrid (Vector + BM25) |
| Reranking | ❌ None | ✅ bge-reranker-base |
| Chunking | Fixed 500 char | Semantic + 15% overlap |
| Dashboard | ❌ None | ✅ Built-in |
| MCP Support | ❌ None | ✅ Native |
| Latency | ~360ms | ~900ms |
| Accuracy | Good | Significantly better |
| Cost | $5/month | $5/month |

Yes, latency increased - but that's the reranker adding precision. You can disable it for speed-critical queries:

{ "query": "fast search", "rerank": false }
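
Inside the Worker, that flag just gates the reranking pass. A sketch of the handler logic, with hybridSearch and rerankResults as hypothetical stand-ins for the real helpers:

// Rerank only when the client hasn't opted out
const { query, rerank = true, topK = 5 } = await request.json();
let results = await hybridSearch(env, query); // vector + BM25 + RRF
if (rerank) {
  results = await rerankResults(env, query, results.slice(0, 10));
}
return Response.json({ results: results.slice(0, topK) });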

How This Compares to Pinecone

| Feature | Pinecone | This Project |
| --- | --- | --- |
| Monthly Cost | $50+ minimum | ~$5/month |
| Edge Deployment | ❌ Cloud-only | ✅ Cloudflare Edge |
| Hybrid Search | Requires workarounds | ✅ Native Vector + BM25 |
| Cross-Encoder Reranking | Basic | ✅ bge-reranker-base |
| MCP Integration | ❌ None | ✅ Native |

Pinecone costs 10-30x more at scale. And according to VentureBeat, they're "struggling with customer churn largely driven by cost concerns."

The accuracy difference? Hybrid search with reranking achieves 66.43% MRR vs 56.72% for semantic-only - a 9.7 percentage point improvement.

Real-World Test: "The Firm Brain" Proof of Concept

I ran four tests simulating high-stakes research scenarios:

1. Semantic Intent

Query: "Can I take a vacation in July?"

Data: HR policy mentioning "paid leave" (not "vacation")

Result: ✅ Matched concept, not just keywords

2. Needle in a Haystack

Query: "What is the limit for the large BGE model?"

Data: Dense technical doc with multiple similar numbers

Result: ✅ Found correct value (1500) among distractors

3. Contextual Logic

Query: "Which team has the unbeaten home record?"

Data: Paragraph mentioning Arsenal (49-game run) and Chelsea (86-game home run)

Result: ✅ Reranker correctly prioritized Chelsea based on "home" qualifier

4. Security

Action: Search with invalid API key

Result: ✅ Hard stop - "Invalid API key"
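
That hard stop comes from a guard at the top of the request handler. A minimal sketch - the header name and isValidKey helper here are hypothetical:

// Reject unauthenticated requests before touching the index
const apiKey = request.headers.get('x-api-key');
if (!apiKey || !(await isValidKey(env, apiKey))) {
  return Response.json({ error: 'Invalid API key' }, { status: 401 });
}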

Performance Summary

| Metric | Result |
| --- | --- |
| Search Time | 341-642ms |
| Cost | ~$5/month |
| Reranking | Adds precision |

The Stack

All on Cloudflare's edge:

  • Workers - Runtime
  • Vectorize - Vector database
  • D1 - SQL for BM25 keywords
  • Workers AI - Embeddings + Reranking

No external services. No data leaving your account.
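
For reference, the embedding leg: bge-small-en-v1.5 produces 384-dimensional vectors, which is why the Vectorize index below is created with --dimensions=384. A sketch, with the VECTORIZE binding name and the chunk object standing in for whatever your wrangler.toml and ingestion code actually use:

// Embed a chunk and upsert it into the Vectorize index
const { data } = await env.AI.run('@cf/baai/bge-small-en-v1.5', {
  text: [chunk.content],
});
await env.VECTORIZE.upsert([
  { id: chunk.id, values: data[0], metadata: { content: chunk.content } },
]);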

Try It Yourself

Live Dashboard: vectorize-mcp-worker.fpl-test.workers.dev/dashboard

GitHub: github.com/dannwaneri/vectorize-mcp-worker

Deploy your own in 10 minutes:

git clone https://github.com/dannwaneri/vectorize-mcp-worker.git
cd vectorize-mcp-worker
npm install
wrangler vectorize create mcp-knowledge-base --dimensions=384 --metric=cosine
wrangler d1 create mcp-knowledge-db
# Update wrangler.toml with database_id
wrangler d1 execute mcp-knowledge-db --remote --file=./schema.sql
wrangler deploy

Works with mcp-cli

This server is compatible with mcp-cli for efficient tool discovery:

# Add to mcp_servers.json
{
  "mcpServers": {
    "vectorize": {
      "url": "https://your-worker.workers.dev/mcp"
    }
  }
}

# Discover tools
mcp-cli vectorize

# Search your knowledge base
mcp-cli vectorize/search '{"query": "cloudflare workers", "topK": 5}'

What's Next?

Considering:

  • PDF ingestion (client-side parsing)
  • Usage analytics dashboard
  • Batch document upload

But honestly? This covers 90% of RAG use cases. Sometimes "done" is a feature.


Need help deploying this for your team? I offer full-service setup - hire me on Upwork.

⭐ Star the repo if this helped!

Questions? Drop them in the comments.

Top comments (2)

Daniel Nwaneri

The real test was finding specific IDs vs general docs. V1 vector search kept mixing up results with similar keywords. Adding the reranker finally fixed it. It actually catches the nuance between a 'similar' topic and the 'right' answer.

Daniel Nwaneri

@leob checking back in.

Not sure if you started your MCP journey yet, but I just dropped V2 of that RAG stack we talked about.

Since you were looking at the TypeScript path, the repo for this one is way more modular and still runs natively on Workers. It adds Hybrid Search to handle some of the retrieval gaps I found in V1. Might be a good reference if you're still exploring the TS side.