I Reverse-Engineered Cursor's Codebase Search and Built an Open-Source Alternative
TL;DR: I built CodeContext, an open-source Python MCP server that replicates the hybrid search architecture Cursor IDE uses internally — combining FAISS vector search, BM25 keyword search, Merkle tree sync, and AST-aware chunking. It works with VS Code Copilot, Claude Desktop, or any MCP client. No paid subscription required.
Let's Be Honest About What You're Paying For
First, let me be clear: Cursor's $20/month isn't just for codebase search. You're paying for an entire IDE experience — AI chat, tab completion, agent mode, cloud agents, frontier model access, and the search indexing that powers all of it. The search is one piece of a much larger product.
Similarly, GitHub Copilot's value isn't just indexing — it's code completion, agent mode, PR reviews, and deep GitHub integration.
So why did I build just the search piece? Because search quality is the foundation that determines how well everything else works. When the AI can't find the right code, every other feature suffers — completions are wrong, chat hallucinates, agent mode edits the wrong file.
And for many developers, the search is the missing piece. You might already have a Copilot Free or Pro plan that works great — except when you ask about code in a 20K-file codebase and it falls back to grep because the local index caps at ~2,500 files.
The Real Cost Picture
Here's what the AI coding landscape actually costs in 2026:
Individual Plans
| Tool | Plan | Price | What You Get |
|---|---|---|---|
| GitHub Copilot | Free | $0/mo | 50 agent/chat requests, 2K completions, GPT-5 mini |
| GitHub Copilot | Pro | $10/mo | Unlimited completions, 300 premium requests, Claude/Codex |
| GitHub Copilot | Pro+ | $39/mo | 5× premium requests, all models including Opus 4.6 |
| Cursor | Hobby | $0/mo | Limited agent requests, limited tab completions |
| Cursor | Pro | $20/mo | Extended agent limits, frontier models, MCPs, cloud agents |
| Cursor | Pro+ | $60/mo | 3× usage on all models |
| Cursor | Ultra | $200/mo | 20× usage, priority features |
Team/Enterprise Plans
| Tool | Plan | Price |
|---|---|---|
| Cursor | Teams | $40/user/mo (+ SSO, analytics, privacy controls) |
| Cursor | Enterprise | Custom pricing |
| Copilot | Business | Contact sales |
The Cost Gap
For a team of 10 developers:
- Copilot Pro: $100/month ($10 × 10)
- Cursor Pro: $200/month ($20 × 10)
- Cursor Teams: $400/month ($40 × 10)
Annual difference: $1,200–$3,600 — just between Copilot Pro and Cursor Pro/Teams. And neither price includes the LLM API costs you might incur on top.
Where CodeContext Fits
CodeContext doesn't replace Cursor or Copilot. It fills a specific gap: bringing Cursor-quality codebase search to tools that don't have it.
Scenario 1: You use Copilot Free/Pro but have a large codebase
You love Copilot's VS Code integration, but your 20K-file enterprise repo exceeds the ~2,500-file local index limit. Remote indexing only works with GitHub.com repos — and yours is on GitLab/Bitbucket/self-hosted.
Before CodeContext: Copilot falls back to grep and file search. Agent mode can't find relevant code across the project.
After CodeContext: Add one MCP server config. Copilot now has hybrid semantic + keyword search across all 20K files via @workspace. No change to your Copilot plan needed.
Scenario 2: You're evaluating Cursor vs Copilot
Cursor's search is genuinely better (12.5% accuracy improvement from hybrid retrieval). But you'd need to switch IDEs, retrain muscle memory, and pay $20/mo minimum.
With CodeContext: Stay in VS Code, keep your Copilot plan, and get the same hybrid search architecture. The $10–$30/month you save per developer can go toward more premium model requests.
Scenario 3: Enterprise team on a budget
10 devs × $40/user/month = $4,800/year just for Cursor Teams. CodeContext is free, runs locally, and gives you the search piece.
Cost reduction: Keep everyone on Copilot Pro ($1,200/year for 10 devs) + CodeContext (free) instead of Cursor Teams ($4,800/year). Save $3,600/year while getting comparable search quality.
Scenario 4: Privacy-sensitive projects
Both Cursor and Copilot send code to external servers for indexing and inference. Copilot's remote index uploads your entire codebase to GitHub's cloud (api.github.com) for processing — community reports indicate ~500MB uploads during indexing, even for private repos. Cursor similarly processes code on their proprietary servers.
For teams working on proprietary code, regulated industries (healthcare, finance, defense), or codebases under NDA, this is a non-starter.
CodeContext runs 100% locally with Ollama — your code never leaves your machine. The embedding model runs on your GPU/CPU, the FAISS index is stored on local disk (~/.context/), and zero bytes are transmitted over the network. The HMAC path obfuscation adds an extra privacy layer.
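To make the path-obfuscation idea concrete, here is a minimal sketch of per-segment HMAC hashing — this is an illustrative approximation, not CodeContext's actual scheme; the key handling and digest truncation are assumptions:

```python
import hashlib
import hmac

# Illustrative only: a real deployment would generate and store
# this key securely per machine, never hard-code it.
SECRET_KEY = b"per-machine-secret"

def obfuscate_path(path: str) -> str:
    """Replace each path segment with a keyed hash.

    The mapping is deterministic (same input -> same token), so the
    index can still match and sync paths, but the readable file names
    cannot be recovered without the local key.
    """
    return "/".join(
        hmac.new(SECRET_KEY, seg.encode(), hashlib.sha256).hexdigest()[:16]
        for seg in path.split("/")
    )
```

Because the hash is keyed, even an attacker with a dictionary of common file names can't precompute the tokens, unlike a plain SHA-256 of the path.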
What CodeContext Does NOT Replace
Let me be honest about the limitations:
| Feature | Cursor/Copilot | CodeContext |
|---|---|---|
| Code completion (tab) | ✅ Real-time, context-aware | ❌ Not a completion tool |
| AI chat | ✅ Multi-model, streaming | ❌ Not a chat interface |
| Agent mode | ✅ Multi-file editing | ❌ Search only — feeds agents via MCP |
| Cloud agents | ✅ (Cursor Pro, Copilot Pro) | ❌ Local only |
| PR code review | ✅ (Cursor Bugbot, Copilot) | ❌ Not in scope |
| IDE integration | ✅ Deep, native | ⚠️ Via MCP protocol (works but less seamless) |
| Codebase search | ✅ Proprietary, optimized | ✅ Open-source, comparable architecture |
| Custom embedding model | ✅ (Cursor trains their own) | ⚠️ Uses open models (nomic-embed-text) |
CodeContext is a search engine, not an IDE. It makes your existing IDE + AI assistant better at finding code. That's it.
The Problems (Being Honest)
- First-index time on huge codebases — Indexing 20K files with Ollama takes time (minutes, not seconds). Cursor has optimized proprietary infrastructure. We use pipelining and caching to mitigate this, but the first run is still slow.
- Embedding quality — Cursor trains a custom embedding model on real coding sessions. We use `nomic-embed-text` (137M params, open-source). It's good, but not fine-tuned for code search specifically. Voyage AI's `voyage-code-3` would be better but costs money.
- No tab completion or chat — If search quality is your only pain point, CodeContext solves it. If you want the full AI IDE experience, you need Cursor or Copilot.
- MCP overhead — Communication via the MCP protocol adds latency compared to Cursor's native in-process search. Typically ~50–100 ms per query.
- No cloud infrastructure — Cursor can share indexes between team members (SimHash, 92% overlap). CodeContext is local-only right now.
- Mac-first optimization — The pipelined engine is optimized for Apple Silicon (M4 Pro). It works on Linux/Windows but hasn't been tuned for those platforms.
The Bottom Line
| If you... | Recommendation |
|---|---|
| Have budget + want the best experience | Cursor Pro ($20/mo) — best-in-class AI IDE |
| Want great completions + affordable | Copilot Pro ($10/mo) — best value |
| Need search on large codebases with Copilot | Copilot + CodeContext — fills the gap for free |
| Have privacy requirements (no cloud) | CodeContext + Ollama — 100% local |
| Team on budget, need enterprise search | Copilot Pro + CodeContext — save $3,600+/year vs Cursor Teams |
The Problem Nobody Talks About
Every AI coding assistant has the same bottleneck: retrieval.
When you ask Copilot "where do we handle authentication?" in a 20,000-file codebase, it needs to find the right files in milliseconds. Get this wrong and the AI hallucinates, suggests fixes in the wrong file, or just says "I don't have enough context."
Cursor's engineering blog revealed their approach: hybrid retrieval with Reciprocal Rank Fusion produces 12.5% better results than either semantic or keyword search alone, and up to 23.5% on large codebases.
So I built it.
Architecture: How Cursor Does It (And How I Replicated It)
Cursor's codebase search isn't magic — it's a well-engineered pipeline. Here's what I reverse-engineered and implemented:
```
Codebase (on disk)
        │
        ▼
Merkle Tree Sync ─── O(changes) not O(files)
        │
        ▼
Tree-sitter AST Splitter ─── functions, classes, methods
        │
        ▼
┌──────────┐   ┌──────────┐
│  FAISS   │   │   BM25   │
│ (dense)  │   │ (sparse) │
│  cosine  │   │ inverted │
└────┬─────┘   └────┬─────┘
     │              │
     ▼              ▼
RRF Fusion (k=60) ─── merges both rankings
        │
        ▼
Cross-Encoder Reranker (optional)
        │
        ▼
MCP Server (stdio / HTTP)
```
Why Hybrid > Pure Semantic
Consider three different queries against the same codebase:
| Query | FAISS (semantic) | BM25 (keyword) | Hybrid (RRF) |
|---|---|---|---|
| "where do we handle authentication?" | ✅ Finds session.ts by meaning | ❌ Word "authentication" absent | ✅ |
| "find all imports of PaymentService" | ⚠️ Returns similar but wrong | ✅ Exact keyword match | ✅ |
| "how does the tax calculation work?" | ✅ Good conceptual match | ✅ Matches "tax" + "calculation" | ✅ Best |
Neither approach alone covers all query types. RRF fusion combines them without needing score normalization — FAISS cosine scores and BM25 IDF scores are on completely different scales, but RRF only uses rank positions.
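Because RRF only looks at rank positions, it fits in a few lines. A minimal sketch using Cursor's published constant k=60 (the function name and document IDs here are illustrative):

```python
def rrf_fuse(dense_ranking, sparse_ranking, k=60):
    """Merge two rankings with Reciprocal Rank Fusion.

    Each input is a list of document IDs ordered best-first.
    Only rank positions matter, so FAISS cosine scores and BM25
    scores never need to be normalized against each other.
    """
    scores = {}
    for ranking in (dense_ranking, sparse_ranking):
        for rank, doc_id in enumerate(ranking):
            # Standard RRF term: 1 / (k + rank), with rank starting at 1
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# A document ranked well by BOTH lists beats one that tops only one list:
fused = rrf_fuse(
    ["session.ts", "auth.ts", "db.ts"],      # semantic ranking
    ["session.ts", "util.ts", "auth.ts"],    # keyword ranking
)
# session.ts comes out first: it scores in both rankings
```

The large k dampens the influence of any single top-ranked hit, which is why RRF is robust to one retriever being confidently wrong.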
The Performance Problem (And How I Solved It)
The naive pipeline — scan files → split → embed → insert — is painfully slow on large codebases. On a 20,000-file enterprise codebase (Zoho CRM), the first version took forever.
The bottleneck analysis:
- File I/O + AST splitting is CPU-bound (Tree-sitter parsing)
- Embedding is GPU/API-bound (waiting for Ollama or OpenAI)
- FAISS persistence is disk I/O-bound (writing after every batch)
These three are mostly independent — the classic producer/consumer problem.
Solution: Pipelined Indexing Engine
```
Producer (thread pool, 14 workers)          Consumer (async)
┌──────────────────────────┐             ┌───────────────────────┐
│ Read files in parallel   │             │ Check embedding cache │
│ AST split via Tree-sitter│    Queue    │ Embed ~4 sub-batches  │
│ Push chunk batches       │──────────▶  │ concurrently          │
│                          │  maxsize=4  │ Insert into FAISS     │
└──────────────────────────┘             │ Add to BM25 index     │
                                         └───────────────────────┘
                                                    │
                                       FAISS persist (once at end)
```
Key optimizations:
- asyncio.Queue pipeline — While batch N is embedding, batch N+1 is being split. CPU and GPU work overlap.
- Concurrent embedding sub-batches — Each flush splits into ~4 sub-batches, sent to Ollama in parallel threads. Set `OLLAMA_NUM_PARALLEL=4` to saturate your GPU.
- Deferred FAISS persistence — One disk write at the end instead of hundreds during indexing.
- Embedding cache — SHA-256 content hash → embedding vector. Re-indexing unchanged code costs zero API calls.
- Adaptive thread pool — Scales to your CPU cores (14 on M4 Pro, 8 on older machines).
The reported overlap value directly measures pipeline efficiency — it's the time saved by running splitting and embedding concurrently.
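The producer/consumer pipeline can be sketched with a bounded `asyncio.Queue` — the function names and signatures below are illustrative, not CodeContext's actual API:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

async def index_pipeline(files, split_fn, embed_fn, workers=4, queue_size=4):
    """Overlap CPU-bound splitting with GPU/API-bound embedding.

    split_fn(path) -> list of chunks   (blocking, runs in threads)
    embed_fn(chunks) -> embeddings     (awaitable)
    """
    queue = asyncio.Queue(maxsize=queue_size)  # backpressure: producer waits
    loop = asyncio.get_running_loop()

    async def producer():
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = [loop.run_in_executor(pool, split_fn, f) for f in files]
            # Push each batch as soon as its split finishes
            for fut in asyncio.as_completed(futures):
                await queue.put(await fut)
        await queue.put(None)  # sentinel: no more batches

    async def consumer():
        results = []
        # Embed batch N while the producer is still splitting batch N+1
        while (batch := await queue.get()) is not None:
            results.extend(await embed_fn(batch))
        return results

    _, embeddings = await asyncio.gather(producer(), consumer())
    return embeddings
```

The `maxsize=4` queue is the key design choice: it caps memory while letting the split side run ahead of the embed side by a few batches.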
The AST Chunking Approach
Most embedding-based search tools split code at arbitrary character boundaries. This produces chunks that start mid-function and end mid-class — meaningless to both humans and embedding models.
CodeContext uses Tree-sitter to parse code into an AST, then splits at logical boundaries:
```python
# Python:     splits at function_definition, class_definition, decorated_definition
# JavaScript: function_declaration, class_declaration, arrow functions
# Go:         function_declaration, method_declaration, type_declaration
# ... 9 languages supported
```
Control flow (if, for, while, try) stays inside its parent function — it's never split into a separate chunk. Gap text (imports, comments between functions) is handled separately. This matches Cursor's documented approach.
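To show the boundary idea without a Tree-sitter dependency, here is the same logic using Python's stdlib `ast` module (CodeContext itself uses Tree-sitter; this sketch simplifies to top-level definitions only):

```python
import ast

def chunk_python_source(source: str):
    """Split Python source at function/class boundaries.

    Top-level defs and classes become chunks; everything between
    them (imports, module-level code) is collected as "gap" text.
    Control flow inside a function stays inside its chunk.
    """
    lines = source.splitlines()
    tree = ast.parse(source)
    chunks, covered = [], set()
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef,
                             ast.ClassDef)):
            start, end = node.lineno - 1, node.end_lineno
            chunks.append("\n".join(lines[start:end]))
            covered.update(range(start, end))
    gap = "\n".join(l for i, l in enumerate(lines)
                    if i not in covered and l.strip())
    return chunks, gap
```

Each chunk is a complete, parseable unit, which is exactly what embedding models need to produce meaningful vectors.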
Merkle Tree Sync: O(changes) Not O(files)
For a 50,000-file repo where 3 files changed:
- Flat scan: Hash all 50K files → compare → O(50K)
- Merkle tree: Compare root hash → walk only divergent branches → O(log N + changes)
The Merkle tree is SHA-256 based and directory-aware. Unchanged subtrees are skipped entirely. On a 20K-file codebase, re-indexing after a few file changes takes seconds instead of minutes.
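A toy version of the directory-aware diff — a real implementation would cache subtree hashes instead of recomputing them, but the walk-only-divergent-branches idea is the same:

```python
import hashlib

def merkle_hash(tree):
    """Directory-aware Merkle hash.

    `tree` maps names to file contents (str) or nested dicts (dirs).
    A directory's hash covers its children's names and hashes, so any
    change anywhere below it changes the directory hash too.
    """
    h = hashlib.sha256()
    for name in sorted(tree):
        node = tree[name]
        child = (merkle_hash(node) if isinstance(node, dict)
                 else hashlib.sha256(node.encode()).hexdigest())
        h.update(f"{name}:{child}".encode())
    return h.hexdigest()

def changed_paths(old, new, prefix=""):
    """Walk only divergent branches — O(log N + changes), not O(files)."""
    if merkle_hash(old) == merkle_hash(new):
        return []  # identical subtree: skip entirely
    changes = []
    for name in sorted(set(old) | set(new)):
        path = f"{prefix}{name}"
        a, b = old.get(name), new.get(name)
        if isinstance(a, dict) and isinstance(b, dict):
            changes += changed_paths(a, b, path + "/")
        elif a != b:  # added, removed, or modified file
            changes.append(path)
    return changes
```

When only one file under `src/` changed, the `docs/` subtree is pruned at the root-hash comparison and never walked at all.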
How to Use It
Install
```bash
git clone https://github.com/amithuuysen/codebase-context.git
cd codebase-context
uv sync
```
Run (with local Ollama — no API key needed)
```bash
# Start Ollama with parallel embedding
OLLAMA_NUM_PARALLEL=4 ollama serve

# Pull the embedding model (137M params, fast)
ollama pull nomic-embed-text

# Start CodeContext
EMBEDDING_PROVIDER=ollama OLLAMA_MODEL=nomic-embed-text uv run codecontext
```
Connect to VS Code Copilot
Add to .vscode/mcp.json:
```json
{
  "servers": {
    "codecontext": {
      "command": "uv",
      "args": ["run", "codecontext"],
      "cwd": "/path/to/codebase-context",
      "env": {
        "MCP_TRANSPORT": "stdio",
        "EMBEDDING_PROVIDER": "ollama"
      }
    }
  }
}
```
Now Copilot's @workspace agent uses CodeContext for semantic search across your entire codebase — no file limit.
Connect to Claude Desktop
```json
{
  "mcpServers": {
    "codecontext": {
      "command": "uv",
      "args": ["run", "codecontext"],
      "cwd": "/path/to/codebase-context",
      "env": {
        "MCP_TRANSPORT": "stdio",
        "EMBEDDING_PROVIDER": "ollama"
      }
    }
  }
}
```
Embedding Provider Options
| Provider | Model | Speed | Quality | Cost |
|---|---|---|---|---|
| Ollama (recommended) | `nomic-embed-text` (137M) | Fast | Good | Free, local |
| OpenAI | `text-embedding-3-small` | Fast | Best | ~$0.02/1M tokens |
| HuggingFace | `all-MiniLM-L6-v2` | Moderate | Good | Free, local |
For enterprise codebases (10K+ files), Ollama with nomic-embed-text hits the sweet spot — fast enough for batch indexing, good enough for accurate retrieval, and completely local (no data leaves your machine).
Try It
GitHub: github.com/amithuuysen/codebase-context
If you're hitting Copilot's 2,500-file limit or don't want to pay for Cursor, give it a try. It's open source, runs locally, and works with any MCP-compatible client.
Built with Python, FAISS, Tree-sitter, LlamaIndex, and the MCP protocol. Inspired by Cursor IDE's engineering blog on hybrid search architecture.