Last month, I shared how I built a production RAG system for $5/month. The response was incredible, but the feedback was clear: vector search alone isn't enough.
So I rebuilt it. Here's what changed.
The Problem With V1
My original system used pure vector similarity search. It worked great for semantic queries like "how does edge computing work?" but failed miserably when users searched for:
- Exact IDs: FPL-2026-X
- Specific names: John Smith contract
- Technical terms: bge-small-en-v1.5
Vector search finds meaning, not exact matches. I needed both.
The Upgrade: 5 Features That Changed Everything
1. Hybrid Search (Vector + BM25)
Instead of choosing between semantic and keyword search, I combined them using Reciprocal Rank Fusion (RRF).
Query
  │
  ├──► Vector Search (Vectorize) ──┐
  │    finds: meaning, concepts    │
  │                                ├──► RRF Fusion ──► Results
  └──► Keyword Search (D1 BM25) ───┘
       finds: exact terms, IDs
The magic is in the fusion. RRF doesn't care about raw scores - it cares about rank position. If a document ranks #1 in both searches, it's definitely relevant.
function reciprocalRankFusion(vectorResults, keywordResults) {
  const rrfK = 60; // Standard constant from the RRF literature

  const scores = new Map();

  vectorResults.forEach((result, rank) => {
    scores.set(result.id, 1 / (rrfK + rank + 1));
  });

  keywordResults.forEach((result, rank) => {
    const existing = scores.get(result.id) || 0;
    scores.set(result.id, existing + 1 / (rrfK + rank + 1));
  });

  // Highest fused score first
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id, score]) => ({ id, score }));
}
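The keywordResults fed into that function come from D1. Here's a rough sketch of that query, assuming the documents are mirrored into an FTS5 virtual table (the table name documents_fts and the DB binding name are illustrative, not necessarily what the repo uses):

// D1 is SQLite under the hood, so an FTS5 virtual table gives BM25 ranking for free.
// Note: bm25() returns more-negative values for better matches, so we sort
// ascending and take the first 10.
const { results: keywordResults } = await env.DB.prepare(
  `SELECT rowid AS id, bm25(documents_fts) AS score
   FROM documents_fts
   WHERE documents_fts MATCH ?
   ORDER BY score
   LIMIT 10`
).bind(query).all();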
2. Cross-Encoder Reranking
Vector search finds candidates. But are they actually correct?
I added @cf/baai/bge-reranker-base as a "judge" that evaluates the top 10 results against the original query:
const reranked = await env.AI.run('@cf/baai/bge-reranker-base', {
  query: "cloudflare workers performance",
  contexts: top10Results.map(r => ({ text: r.content }))
});
The reranker doesn't just check similarity - it checks relevance. Game changer.
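Re-sorting the candidates by the reranker's output takes only a few lines. The response shape below (a response array of { id, score } pairs, where id is the index into contexts) is an assumption about the Workers AI output, so treat this as a sketch:

// `reranked` is the env.AI.run() result from above.
// Assumption: reranked.response is an array of { id, score }, where id is
// the position of the context in the order it was sent.
const top5 = reranked.response
  .sort((a, b) => b.score - a.score)
  .slice(0, 5)
  .map(({ id, score }) => ({ ...top10Results[id], rerankScore: score }));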
3. Smart Chunking (15% Overlap)
My V1 chunking was naive - split every 500 characters. This created chunks like:
❌ Bad chunk:
"...the performance was excellent. The system hand"
Now I use recursive chunking that respects semantic boundaries:
✅ Good chunk:
"The performance was excellent. The system handled
500K requests daily with 99.9% uptime."
Plus 15% overlap ensures context isn't lost between chunks.
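Here's a minimal sketch of the idea (sentence-boundary splitting plus a ~15% carry-over). The actual implementation in the repo is more thorough, but this captures the behavior:

// Split on sentence boundaries, then carry whole trailing sentences forward
// into the next chunk until we have roughly 15% overlap.
function chunkText(text, maxLen = 500, overlapRatio = 0.15) {
  const sentences = text.split(/(?<=[.!?])\s+/);
  const chunks = [];
  let current = [];
  let currentLen = 0;

  for (const sentence of sentences) {
    if (currentLen + sentence.length > maxLen && current.length > 0) {
      chunks.push(current.join(' '));
      // Build the overlap from the tail of the previous chunk
      const overlap = [];
      let overlapLen = 0;
      for (let i = current.length - 1; i >= 0 && overlapLen < maxLen * overlapRatio; i--) {
        overlap.unshift(current[i]);
        overlapLen += current[i].length;
      }
      current = overlap;
      currentLen = overlapLen;
    }
    current.push(sentence);
    currentLen += sentence.length;
  }
  if (current.length > 0) chunks.push(current.join(' '));
  return chunks;
}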
4. Interactive Dashboard
No more explaining curl commands to clients.
Built as a single HTML file embedded in the Worker - no separate frontend deployment needed:
- 📊 Real-time stats
- 📥 Document ingestion form
- 🔎 Search with latency monitor
- 🔑 API key management
Access it at /dashboard on any deployment.
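If you're wondering how "a single HTML file embedded in the Worker" works in practice, it's roughly this pattern (names and routes here are illustrative, not the repo's exact code):

// The dashboard markup ships inside the Worker bundle as a plain string.
const DASHBOARD_HTML = `<!doctype html><html><!-- stats, ingest form, search UI --></html>`;

export default {
  async fetch(request, env) {
    const url = new URL(request.url);
    if (url.pathname === '/dashboard') {
      return new Response(DASHBOARD_HTML, {
        headers: { 'Content-Type': 'text/html; charset=utf-8' },
      });
    }
    // ...API routes (search, ingest, MCP) handled here...
    return new Response('Not found', { status: 404 });
  },
};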
5. MCP Integration (AI Agent Ready)
The Model Context Protocol lets AI assistants use your API as a native tool. I added:
GET /mcp/tools # List available tools
POST /mcp/call # Execute a tool
Now Claude Desktop or any MCP-compatible agent can search my knowledge base directly.
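If you want to hit the endpoint directly, a raw tool call looks roughly like this (the payload fields and the API-key header name depend on the Worker's MCP handler, so treat them as assumptions):

const res = await fetch('https://your-worker.workers.dev/mcp/call', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-API-Key': '<your key>',        // auth header name is an assumption
  },
  body: JSON.stringify({
    name: 'search',                   // tool discovered via GET /mcp/tools
    arguments: { query: 'cloudflare workers performance', topK: 5 },
  }),
});
const result = await res.json();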
Performance: Before vs After
| Metric | V1 | V2 |
|---|---|---|
| Search Type | Vector only | Hybrid (Vector + BM25) |
| Reranking | ❌ | ✅ bge-reranker-base |
| Chunking | Fixed 500 char | Semantic + 15% overlap |
| Dashboard | ❌ | ✅ Built-in |
| MCP Support | ❌ | ✅ |
| Latency | ~360ms | ~900ms |
| Accuracy | Good | Significantly better |
| Cost | $5/month | $5/month |
Yes, latency increased - but that's the reranker adding precision. You can disable it for speed-critical queries:
{ "query": "fast search", "rerank": false }
How This Compares to Pinecone
| Feature | Pinecone | This Project |
|---|---|---|
| Monthly Cost | $50+ minimum | ~$5/month |
| Edge Deployment | ❌ Cloud-only | ✅ Cloudflare Edge |
| Hybrid Search | Requires workarounds | ✅ Native Vector + BM25 |
| Cross-Encoder Reranking | Basic | ✅ bge-reranker-base |
| MCP Integration | ❌ None | ✅ Native |
Pinecone costs 10-30x more at scale. And according to VentureBeat, they're "struggling with customer churn largely driven by cost concerns."
The accuracy difference? Hybrid search with reranking achieves 66.43% MRR vs 56.72% for semantic-only, an improvement of 9.7 percentage points.
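If MRR is unfamiliar: it scores each query by 1 / rank of the first relevant result, then averages across queries, so higher means the right document sits closer to the top.

// Mean Reciprocal Rank: average of 1 / (rank of the first relevant result).
// Example: first-relevant ranks [1, 2, 4] → (1 + 0.5 + 0.25) / 3 ≈ 0.583.
function meanReciprocalRank(firstRelevantRanks) {
  return (
    firstRelevantRanks.reduce((sum, rank) => sum + 1 / rank, 0) /
    firstRelevantRanks.length
  );
}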
Real-World Test: "The Firm Brain" Proof of Concept
I ran four tests simulating high-stakes research scenarios:
1. Semantic Intent
Query: "Can I take a vacation in July?"
Data: HR policy mentioning "paid leave" (not "vacation")
Result: ✅ Matched concept, not just keywords
2. Needle in a Haystack
Query: "What is the limit for the large BGE model?"
Data: Dense technical doc with multiple similar numbers
Result: ✅ Found correct value (1500) among distractors
3. Contextual Logic
Query: "Which team has the unbeaten home record?"
Data: Paragraph mentioning Arsenal (49-game run) and Chelsea (86-game home run)
Result: ✅ Reranker correctly prioritized Chelsea based on "home" qualifier
4. Security
Action: Search with invalid API key
Result: ✅ Hard stop - "Invalid API key"
Performance Summary
| Metric | Result |
|---|---|
| Search Time | 341-642ms |
| Cost | ~$5/month |
| Reranking | Adds precision |
The Stack
All on Cloudflare's edge:
- Workers - Runtime
- Vectorize - Vector database
- D1 - SQL for BM25 keywords
- Workers AI - Embeddings + Reranking
No external services. No data leaving your account.
Try It Yourself
Live Dashboard: vectorize-mcp-worker.fpl-test.workers.dev/dashboard
GitHub: github.com/dannwaneri/vectorize-mcp-worker
Deploy your own in 10 minutes:
git clone https://github.com/dannwaneri/vectorize-mcp-worker.git
cd vectorize-mcp-worker
npm install
wrangler vectorize create mcp-knowledge-base --dimensions=384 --metric=cosine
wrangler d1 create mcp-knowledge-db
# Update wrangler.toml with database_id
wrangler d1 execute mcp-knowledge-db --remote --file=./schema.sql
wrangler deploy
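For reference, the bindings in wrangler.toml end up looking roughly like this (the binding names, entry point, and compatibility date are illustrative; the database_id is whatever wrangler d1 create printed):

name = "vectorize-mcp-worker"
main = "src/index.js"
compatibility_date = "2024-01-01"

[ai]
binding = "AI"

[[vectorize]]
binding = "VECTORIZE"
index_name = "mcp-knowledge-base"

[[d1_databases]]
binding = "DB"
database_name = "mcp-knowledge-db"
database_id = "<id from wrangler d1 create>"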
Works with mcp-cli
This server is compatible with mcp-cli for efficient tool discovery:
# Add to mcp_servers.json
{
  "mcpServers": {
    "vectorize": {
      "url": "https://your-worker.workers.dev/mcp"
    }
  }
}
# Discover tools
mcp-cli vectorize
# Search your knowledge base
mcp-cli vectorize/search '{"query": "cloudflare workers", "topK": 5}'
What's Next?
Considering:
- PDF ingestion (client-side parsing)
- Usage analytics dashboard
- Batch document upload
But honestly? This covers 90% of RAG use cases. Sometimes "done" is a feature.
Need help deploying this for your team? I offer full-service setup - hire me on Upwork.
⭐ Star the repo if this helped!
Questions? Drop them in the comments.

Top comments (2)
The real test was finding specific IDs vs general docs. V1 vector search kept mixing up stuff with similar keywords. Adding the reranker finally fixed it. It actually catches the nuance between a 'similar' topic and the 'right' answer.
@leob checking back in.
Not sure if you started your MCP journey yet, but I just dropped V2 of that RAG stack we talked about.
Since you were looking at the TypeScript path, the repo for this one is way more modular and still runs natively on Workers. It adds Hybrid Search to handle some of the retrieval gaps I found in V1. Might be a good reference if you're still exploring the TS side.