I Reverse-Engineered Cursor's Codebase Search and Built an Open-Source Alternative
TL;DR: I built CodeContext, an open-source Python MCP server that replicates the hybrid search architecture Cursor IDE uses internally — combining FAISS vector search, BM25 keyword search, Merkle tree sync, and AST-aware chunking. It works with VS Code Copilot, Claude Desktop, or any MCP client. No paid subscription required.
Let's Be Honest About What You're Paying For
First, let me be clear: Cursor's $20/month isn't just for codebase search. You're paying for an entire IDE experience — AI chat, tab completion, agent mode, cloud agents, frontier model access, and the search indexing that powers all of it. The search is one piece of a much larger product.
Similarly, GitHub Copilot's value isn't just indexing — it's code completion, agent mode, PR reviews, and deep GitHub integration.
So why did I build just the search piece? Because search quality is the foundation that determines how well everything else works. When the AI can't find the right code, every other feature suffers — completions are wrong, chat hallucinates, agent mode edits the wrong file.
And for many developers, the search is the missing piece. You might already have a Copilot Free or Pro plan that works great — except when you ask about code in a 20K-file codebase and it falls back to grep because the local index caps at ~2,500 files.
The Real Cost Picture
Here's what the AI coding landscape actually costs in 2026:
Individual Plans
| Tool | Plan | Price | What You Get |
|---|---|---|---|
| GitHub Copilot | Free | $0/mo | 50 agent/chat requests, 2K completions, GPT-5 mini |
| GitHub Copilot | Pro | $10/mo | Unlimited completions, 300 premium requests, Claude/Codex |
| GitHub Copilot | Pro+ | $39/mo | 5× premium requests, all models including Opus 4.6 |
| Cursor | Hobby | $0/mo | Limited agent requests, limited tab completions |
| Cursor | Pro | $20/mo | Extended agent limits, frontier models, MCPs, cloud agents |
| Cursor | Pro+ | $60/mo | 3× usage on all models |
| Cursor | Ultra | $200/mo | 20× usage, priority features |
Team/Enterprise Plans
| Tool | Plan | Price |
|---|---|---|
| Cursor | Teams | $40/user/mo (+ SSO, analytics, privacy controls) |
| Cursor | Enterprise | Custom pricing |
| Copilot | Business | Contact sales |
The Cost Gap
For a team of 10 developers:
- Copilot Pro: $100/month ($10 × 10)
- Cursor Pro: $200/month ($20 × 10)
- Cursor Teams: $400/month ($40 × 10)
Annual difference: $1,200–$3,600 — just between Copilot Pro and Cursor Pro/Teams. And neither price includes the LLM API costs you might incur on top.
Where CodeContext Fits
CodeContext doesn't replace Cursor or Copilot. It fills a specific gap: bringing Cursor-quality codebase search to tools that don't have it.
Scenario 1: You use Copilot Free/Pro but have a large codebase
You love Copilot's VS Code integration, but your 20K-file enterprise repo exceeds the ~2,500-file local index limit. Remote indexing only works with GitHub.com repos — and yours is on GitLab/Bitbucket/self-hosted.
Before CodeContext: Copilot falls back to grep and file search. Agent mode can't find relevant code across the project.
After CodeContext: Add one MCP server config. Copilot now has hybrid semantic + keyword search across all 20K files via @workspace. No change to your Copilot plan needed.
Scenario 2: You're evaluating Cursor vs Copilot
Cursor's search is genuinely better (12.5% accuracy improvement from hybrid retrieval). But you'd need to switch IDEs, retrain muscle memory, and pay $20/mo minimum.
With CodeContext: Stay in VS Code, keep your Copilot plan, and get the same hybrid search architecture. The $10–$30/month you save per developer can go toward more premium model requests.
Scenario 3: Enterprise team on a budget
10 devs × $40/user/month = $4,800/year just for Cursor Teams. CodeContext is free, runs locally, and gives you the search piece.
Cost reduction: Keep everyone on Copilot Pro ($1,200/year for 10 devs) + CodeContext (free) instead of Cursor Teams ($4,800/year). Save $3,600/year while getting comparable search quality.
Scenario 4: Privacy-sensitive projects
Both Cursor and Copilot send code to external servers for indexing and inference. Copilot's remote index uploads your entire codebase to GitHub's cloud (api.github.com) for processing — community reports indicate ~500MB uploads during indexing, even for private repos. Cursor similarly processes code on their proprietary servers.
For teams working on proprietary code, regulated industries (healthcare, finance, defense), or codebases under NDA, this is a non-starter.
CodeContext runs 100% locally with Ollama — your code never leaves your machine. The embedding model runs on your GPU/CPU, the FAISS index is stored on local disk (~/.context/), and zero bytes are transmitted over the network. The HMAC path obfuscation adds an extra privacy layer.
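To make the path-obfuscation idea concrete, here is a minimal sketch of per-segment HMAC hashing — this is an illustrative approximation, not CodeContext's actual scheme; the key handling and digest truncation are assumptions:

```python
import hashlib
import hmac

# Illustrative only: a real deployment would generate and store
# this key securely per machine, never hard-code it.
SECRET_KEY = b"per-machine-secret"

def obfuscate_path(path: str) -> str:
    """Replace each path segment with a keyed hash.

    The mapping is deterministic (same input -> same token), so the
    index can still match and sync paths, but the readable file names
    cannot be recovered without the local key.
    """
    return "/".join(
        hmac.new(SECRET_KEY, seg.encode(), hashlib.sha256).hexdigest()[:16]
        for seg in path.split("/")
    )
```

Because the hash is keyed, even an attacker with a dictionary of common file names can't precompute the tokens, unlike a plain SHA-256 of the path.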
What CodeContext Does NOT Replace
Let me be honest about the limitations:
| Feature | Cursor/Copilot | CodeContext |
|---|---|---|
| Code completion (tab) | ✅ Real-time, context-aware | ❌ Not a completion tool |
| AI chat | ✅ Multi-model, streaming | ❌ Not a chat interface |
| Agent mode | ✅ Multi-file editing | ❌ Search only — feeds agents via MCP |
| Cloud agents | ✅ (Cursor Pro, Copilot Pro) | ❌ Local only |
| PR code review | ✅ (Cursor Bugbot, Copilot) | ❌ Not in scope |
| IDE integration | ✅ Deep, native | ⚠️ Via MCP protocol (works but less seamless) |
| Codebase search | ✅ Proprietary, optimized | ✅ Open-source, comparable architecture |
| Custom embedding model | ✅ (Cursor trains their own) | ⚠️ Uses open models (nomic-embed-text) |
CodeContext is a search engine, not an IDE. It makes your existing IDE + AI assistant better at finding code. That's it.
The Problems (Being Honest)
- First-index time on huge codebases — Indexing 20K files with Ollama takes time (minutes, not seconds). Cursor has optimized proprietary infrastructure. We use pipelining and caching to mitigate this, but the first run is still slow.
- Embedding quality — Cursor trains a custom embedding model on real coding sessions. We use `nomic-embed-text` (137M params, open-source). It's good, but not fine-tuned for code search specifically. Voyage AI's `voyage-code-3` would be better but costs money.
- No tab completion or chat — If search quality is your only pain point, CodeContext solves it. If you want the full AI IDE experience, you need Cursor or Copilot.
- MCP overhead — Communication via the MCP protocol adds latency compared to Cursor's native in-process search. Typically ~50–100 ms per query.
- No cloud infrastructure — Cursor can share indexes between team members (SimHash, 92% overlap). CodeContext is local-only right now.
- Mac-first optimization — The pipelined engine is optimized for Apple Silicon (M4 Pro). It works on Linux/Windows but hasn't been tuned for those platforms.
The Bottom Line
| If you... | Recommendation |
|---|---|
| Have budget + want the best experience | Cursor Pro ($20/mo) — best-in-class AI IDE |
| Want great completions + affordable | Copilot Pro ($10/mo) — best value |
| Need search on large codebases with Copilot | Copilot + CodeContext — fills the gap for free |
| Have privacy requirements (no cloud) | CodeContext + Ollama — 100% local |
| Team on budget, need enterprise search | Copilot Pro + CodeContext — save $3,600+/year vs Cursor Teams |
The Problem Nobody Talks About
Every AI coding assistant has the same bottleneck: retrieval.
When you ask Copilot "where do we handle authentication?" in a 20,000-file codebase, it needs to find the right files in milliseconds. Get this wrong and the AI hallucinates, suggests fixes in the wrong file, or just says "I don't have enough context."
Cursor's engineering blog revealed their approach: hybrid retrieval with Reciprocal Rank Fusion produces 12.5% better results than either semantic or keyword search alone, and up to 23.5% on large codebases.
So I built it.
Architecture: How Cursor Does It (And How I Replicated It)
Cursor's codebase search isn't magic — it's a well-engineered pipeline. Here's what I reverse-engineered and implemented:
```
Codebase (on disk)
        │
        ▼
Merkle Tree Sync ─── O(changes) not O(files)
        │
        ▼
Tree-sitter AST Splitter ─── functions, classes, methods
        │
        ▼
┌──────────┐   ┌──────────┐
│  FAISS   │   │   BM25   │
│ (dense)  │   │ (sparse) │
│  cosine  │   │ inverted │
└────┬─────┘   └────┬─────┘
     │              │
     ▼              ▼
RRF Fusion (k=60) ─── merges both rankings
        │
        ▼
Cross-Encoder Reranker (optional)
        │
        ▼
MCP Server (stdio / HTTP)
```
Why Hybrid > Pure Semantic
Consider three different queries against the same codebase:
| Query | FAISS (semantic) | BM25 (keyword) | Hybrid (RRF) |
|---|---|---|---|
| "where do we handle authentication?" | ✅ Finds session.ts by meaning | ❌ Word "authentication" absent | ✅ |
| "find all imports of PaymentService" | ⚠️ Returns similar but wrong | ✅ Exact keyword match | ✅ |
| "how does the tax calculation work?" | ✅ Good conceptual match | ✅ Matches "tax" + "calculation" | ✅ Best |
Neither approach alone covers all query types. RRF fusion combines them without needing score normalization — FAISS cosine scores and BM25 IDF scores are on completely different scales, but RRF only uses rank positions.
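Because RRF only looks at rank positions, it fits in a few lines. A minimal sketch using Cursor's published constant k=60 (the function name and document IDs here are illustrative):

```python
def rrf_fuse(dense_ranking, sparse_ranking, k=60):
    """Merge two rankings with Reciprocal Rank Fusion.

    Each input is a list of document IDs ordered best-first.
    Only rank positions matter, so FAISS cosine scores and BM25
    scores never need to be normalized against each other.
    """
    scores = {}
    for ranking in (dense_ranking, sparse_ranking):
        for rank, doc_id in enumerate(ranking):
            # Standard RRF term: 1 / (k + rank), with rank starting at 1
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# A document ranked well by BOTH lists beats one that tops only one list:
fused = rrf_fuse(
    ["session.ts", "auth.ts", "db.ts"],      # semantic ranking
    ["session.ts", "util.ts", "auth.ts"],    # keyword ranking
)
# session.ts comes out first: it scores in both rankings
```

The large k dampens the influence of any single top-ranked hit, which is why RRF is robust to one retriever being confidently wrong.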
The Performance Problem (And How I Solved It)
The naive pipeline — scan files → split → embed → insert — is painfully slow on large codebases. On a 20,000-file enterprise codebase (Zoho CRM), the first version took forever.
The bottleneck analysis:
- File I/O + AST splitting is CPU-bound (Tree-sitter parsing)
- Embedding is GPU/API-bound (waiting for Ollama or OpenAI)
- FAISS persistence is disk I/O-bound (writing after every batch)
These three are mostly independent — the classic producer/consumer problem.
Solution: Pipelined Indexing Engine
```
Producer (thread pool, 14 workers)          Consumer (async)
┌──────────────────────────┐             ┌───────────────────────┐
│ Read files in parallel   │             │ Check embedding cache │
│ AST split via Tree-sitter│    Queue    │ Embed ~4 sub-batches  │
│ Push chunk batches       │──────────▶  │ concurrently          │
│                          │  maxsize=4  │ Insert into FAISS     │
└──────────────────────────┘             │ Add to BM25 index     │
                                         └───────────────────────┘
                                                    │
                                       FAISS persist (once at end)
```
Key optimizations:
- asyncio.Queue pipeline — While batch N is embedding, batch N+1 is being split. CPU and GPU work overlap.
- Concurrent embedding sub-batches — Each flush splits into ~4 sub-batches, sent to Ollama in parallel threads. Set `OLLAMA_NUM_PARALLEL=4` to saturate your GPU.
- Deferred FAISS persistence — One disk write at the end instead of hundreds during indexing.
- Embedding cache — SHA-256 content hash → embedding vector. Re-indexing unchanged code costs zero API calls.
- Adaptive thread pool — Scales to your CPU cores (14 on M4 Pro, 8 on older machines).
The reported overlap value directly measures pipeline efficiency — it's the time saved by running splitting and embedding concurrently.
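The producer/consumer pipeline can be sketched with a bounded `asyncio.Queue` — the function names and signatures below are illustrative, not CodeContext's actual API:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

async def index_pipeline(files, split_fn, embed_fn, workers=4, queue_size=4):
    """Overlap CPU-bound splitting with GPU/API-bound embedding.

    split_fn(path) -> list of chunks   (blocking, runs in threads)
    embed_fn(chunks) -> embeddings     (awaitable)
    """
    queue = asyncio.Queue(maxsize=queue_size)  # backpressure: producer waits
    loop = asyncio.get_running_loop()

    async def producer():
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = [loop.run_in_executor(pool, split_fn, f) for f in files]
            # Push each batch as soon as its split finishes
            for fut in asyncio.as_completed(futures):
                await queue.put(await fut)
        await queue.put(None)  # sentinel: no more batches

    async def consumer():
        results = []
        # Embed batch N while the producer is still splitting batch N+1
        while (batch := await queue.get()) is not None:
            results.extend(await embed_fn(batch))
        return results

    _, embeddings = await asyncio.gather(producer(), consumer())
    return embeddings
```

The `maxsize=4` queue is the key design choice: it caps memory while letting the split side run ahead of the embed side by a few batches.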
The AST Chunking Approach
Most embedding-based search tools split code at arbitrary character boundaries. This produces chunks that start mid-function and end mid-class — meaningless to both humans and embedding models.
CodeContext uses Tree-sitter to parse code into an AST, then splits at logical boundaries:
```python
# Python:     splits at function_definition, class_definition, decorated_definition
# JavaScript: function_declaration, class_declaration, arrow functions
# Go:         function_declaration, method_declaration, type_declaration
# ... 9 languages supported
```
Control flow (if, for, while, try) stays inside its parent function — it's never split into a separate chunk. Gap text (imports, comments between functions) is handled separately. This matches Cursor's documented approach.
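To show the boundary idea without a Tree-sitter dependency, here is the same logic using Python's stdlib `ast` module (CodeContext itself uses Tree-sitter; this sketch simplifies to top-level definitions only):

```python
import ast

def chunk_python_source(source: str):
    """Split Python source at function/class boundaries.

    Top-level defs and classes become chunks; everything between
    them (imports, module-level code) is collected as "gap" text.
    Control flow inside a function stays inside its chunk.
    """
    lines = source.splitlines()
    tree = ast.parse(source)
    chunks, covered = [], set()
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef,
                             ast.ClassDef)):
            start, end = node.lineno - 1, node.end_lineno
            chunks.append("\n".join(lines[start:end]))
            covered.update(range(start, end))
    gap = "\n".join(l for i, l in enumerate(lines)
                    if i not in covered and l.strip())
    return chunks, gap
```

Each chunk is a complete, parseable unit, which is exactly what embedding models need to produce meaningful vectors.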
Merkle Tree Sync: O(changes) Not O(files)
For a 50,000-file repo where 3 files changed:
- Flat scan: Hash all 50K files → compare → O(50K)
- Merkle tree: Compare root hash → walk only divergent branches → O(log N + changes)
The Merkle tree is SHA-256 based and directory-aware. Unchanged subtrees are skipped entirely. On a 20K-file codebase, re-indexing after a few file changes takes seconds instead of minutes.
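A toy version of the directory-aware diff — a real implementation would cache subtree hashes instead of recomputing them, but the walk-only-divergent-branches idea is the same:

```python
import hashlib

def merkle_hash(tree):
    """Directory-aware Merkle hash.

    `tree` maps names to file contents (str) or nested dicts (dirs).
    A directory's hash covers its children's names and hashes, so any
    change anywhere below it changes the directory hash too.
    """
    h = hashlib.sha256()
    for name in sorted(tree):
        node = tree[name]
        child = (merkle_hash(node) if isinstance(node, dict)
                 else hashlib.sha256(node.encode()).hexdigest())
        h.update(f"{name}:{child}".encode())
    return h.hexdigest()

def changed_paths(old, new, prefix=""):
    """Walk only divergent branches — O(log N + changes), not O(files)."""
    if merkle_hash(old) == merkle_hash(new):
        return []  # identical subtree: skip entirely
    changes = []
    for name in sorted(set(old) | set(new)):
        path = f"{prefix}{name}"
        a, b = old.get(name), new.get(name)
        if isinstance(a, dict) and isinstance(b, dict):
            changes += changed_paths(a, b, path + "/")
        elif a != b:  # added, removed, or modified file
            changes.append(path)
    return changes
```

When only one file under `src/` changed, the `docs/` subtree is pruned at the root-hash comparison and never walked at all.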
How to Use It
Install
```bash
git clone https://github.com/amithuuysen/codebase-context.git
cd codebase-context
uv sync
```
Run (with local Ollama — no API key needed)
```bash
# Start Ollama with parallel embedding
OLLAMA_NUM_PARALLEL=4 ollama serve

# Pull the embedding model (137M params, fast)
ollama pull nomic-embed-text

# Start CodeContext
EMBEDDING_PROVIDER=ollama OLLAMA_MODEL=nomic-embed-text uv run codecontext
```
Connect to VS Code Copilot
Add to .vscode/mcp.json:
```json
{
  "servers": {
    "codecontext": {
      "command": "uv",
      "args": ["run", "codecontext"],
      "cwd": "/path/to/codebase-context",
      "env": {
        "MCP_TRANSPORT": "stdio",
        "EMBEDDING_PROVIDER": "ollama"
      }
    }
  }
}
```
Now Copilot's @workspace agent uses CodeContext for semantic search across your entire codebase — no file limit.
Connect to Claude Desktop
```json
{
  "mcpServers": {
    "codecontext": {
      "command": "uv",
      "args": ["run", "codecontext"],
      "cwd": "/path/to/codebase-context",
      "env": {
        "MCP_TRANSPORT": "stdio",
        "EMBEDDING_PROVIDER": "ollama"
      }
    }
  }
}
```
Embedding Provider Options
| Provider | Model | Speed | Quality | Cost |
|---|---|---|---|---|
| Ollama (recommended) | `nomic-embed-text` (137M) | Fast | Good | Free, local |
| OpenAI | `text-embedding-3-small` | Fast | Best | ~$0.02/1M tokens |
| HuggingFace | `all-MiniLM-L6-v2` | Moderate | Good | Free, local |
For enterprise codebases (10K+ files), Ollama with nomic-embed-text hits the sweet spot — fast enough for batch indexing, good enough for accurate retrieval, and completely local (no data leaves your machine).
Try It
GitHub: github.com/amithuuysen/codebase-context
If you're hitting Copilot's 2,500-file limit or don't want to pay for Cursor, give it a try. It's open source, runs locally, and works with any MCP-compatible client.
Built with Python, FAISS, Tree-sitter, LlamaIndex, and the MCP protocol. Inspired by Cursor IDE's engineering blog on hybrid search architecture.