# How we built a hybrid FTS5 + embedding search for code — and why you need both

srclight is a deep code-indexing MCP server — it gives AI agents understanding of your codebase (symbol search, call graphs, git blame, semantic search) in a single `pip install`.
When you're building AI coding assistants, you need search that works two ways:
- Keyword search — I know the function name, find it now
- Semantic search — find code that "handles authentication" without knowing the exact term
Most tools pick one. We built both.
## The problem with pure keyword search
FTS5 is great for exact matches. But code has naming conventions: `calculateTotalPrice`, `calculate_total_price`, `CalculateTotalPrice`. A single FTS5 index can't handle all of these well.
And sometimes you don't know the name at all. You want to find "code that validates user input" — that's a concept, not a keyword.
## The problem with pure embedding search
Embeddings are great for meaning. But they struggle with:
- Exact symbol names (searching for `handleAuth` should find `handleAuth`)
- Substring matches (searching for `parse` should find `parseJSON`)
- Short queries (embeddings need context)
- Naming conventions
## Our solution: 4 indexes + RRF fusion
We built three FTS5 indexes, each tuned differently, plus one embedding index:
### 1. Symbol names index (unicode61 tokenizer)
Splits on case changes and underscores:
```
calculateTotalPrice → calculate, Total, Price
handle_user_auth    → handle, user, auth
```
This catches CamelCase, snake_case, and any convention developers throw at it.
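The splitting step can be sketched in a few lines of Python. This is an illustrative helper, not srclight's actual implementation — the name `split_identifier` and the exact regex are assumptions:

```python
import re

def split_identifier(name: str) -> list[str]:
    """Split an identifier on underscores and case changes (illustrative sketch)."""
    parts = []
    for chunk in name.split("_"):
        # Uppercase runs, Capitalized/lowercase words, or digit runs.
        parts.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", chunk))
    return [p for p in parts if p]

print(split_identifier("calculateTotalPrice"))  # ['calculate', 'Total', 'Price']
print(split_identifier("handle_user_auth"))     # ['handle', 'user', 'auth']
```

Feeding these sub-tokens into the symbol-name index is what lets a query for `total` hit `calculateTotalPrice`.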
### 2. Source content index (trigram tokenizer)
Indexes every 3-character substring. This catches substring matches even inside words.
### 3. Docstrings index (porter stemmer)
Stems words to their roots: "running, ran, runner → run". This makes docstring search actually useful.
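The three FTS5 tables can be set up with nothing but the `sqlite3` module from the standard library. Table and column names here are illustrative, not srclight's actual schema, and the trigram tokenizer needs SQLite 3.34+:

```python
import sqlite3

db = sqlite3.connect(":memory:")

# 1. Symbol names: unicode61 splits on underscores and punctuation by default;
#    CamelCase splitting happens as a preprocessing step before insertion.
db.execute("CREATE VIRTUAL TABLE symbols USING fts5(name, tokenize = 'unicode61')")

# 2. Source content: trigram tokenizer enables substring matching (SQLite >= 3.34).
db.execute("CREATE VIRTUAL TABLE source USING fts5(body, tokenize = 'trigram')")

# 3. Docstrings: porter stemmer so "running" matches a docstring saying "run".
db.execute("CREATE VIRTUAL TABLE docstrings USING fts5(text, tokenize = 'porter unicode61')")

db.execute("INSERT INTO source(body) VALUES ('def parseJSON(data): ...')")
# The trigram index matches substrings inside identifiers:
row = db.execute("SELECT body FROM source WHERE source MATCH 'parse'").fetchone()
```

Three small virtual tables instead of one means each tokenizer can be tuned for its content type without compromising the others.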
### 4. Embeddings (via Ollama)
Semantic vectors for meaning-based matching. We use qwen3-embedding (4096 dims) or nomic-embed-text (768 dims).
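A minimal sketch of the embedding side, assuming a local Ollama server on its default port — the endpoint and payload shape follow Ollama's embeddings API, but check your Ollama version (newer releases also expose `/api/embed`). The cosine helper is plain Python, no numpy required:

```python
import json
import math
import urllib.request

def embed(text: str, model: str = "nomic-embed-text") -> list[float]:
    """Fetch an embedding from a local Ollama server (sketch, not srclight's code)."""
    req = urllib.request.Request(
        "http://localhost:11434/api/embeddings",
        data=json.dumps({"model": model, "prompt": text}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors, used to rank semantic matches."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)
```

Ranking a query then reduces to embedding it once and sorting stored vectors by `cosine` score.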
## The secret sauce: Reciprocal Rank Fusion
Here's how we combine them. We run each query against all 4 indexes, get ranked results, then merge using RRF:
```
RRF_score(d) = Σᵢ 1 / (k + rankᵢ(d))
```

where `k = 60` (the standard constant) and `rankᵢ(d)` is document `d`'s rank in index `i`.
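The fusion step is only a few lines. This is a sketch of the general RRF technique rather than srclight's exact code; document IDs and ranking lists here are made up:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> dict[str, float]:
    """Merge several ranked result lists (best first) via Reciprocal Rank Fusion."""
    scores: dict[str, float] = {}
    for ranked in rankings:
        for rank, doc in enumerate(ranked, start=1):
            # Each appearance contributes 1 / (k + rank); absences contribute nothing.
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return dict(sorted(scores.items(), key=lambda kv: -kv[1]))

fts_results = ["auth.py::login", "auth.py::verify"]
emb_results = ["auth.py::check", "auth.py::login"]
fused = rrf_fuse([fts_results, emb_results])
# auth.py::login (rank 1 + rank 2) scores 1/61 + 1/62 ≈ 0.0325 and wins.
```

Because scores only depend on ranks, RRF needs no per-index score normalization — which is exactly what makes it safe to mix BM25-style FTS5 scores with cosine similarities.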
A result appearing at rank 1 in FTS5 and rank 2 in embeddings gets:
- FTS5: 1 / (60 + 1) = 0.0164
- Embeddings: 1 / (60 + 2) = 0.0161
- Total: 0.0325
A result at rank 10 in embeddings only gets: 1 / (60 + 10) = 0.0143
This means exact matches can still win even if embeddings also match — and vice versa. You get the best of both worlds.
## But wait, there's more
We also built:
- GPU vector cache: Embeddings loaded to VRAM once (~300ms cold), then ~3ms per query via CuPy
- Incremental indexing: Only re-index changed symbols (tracked via content hash)
- Git intelligence: Query "what changed recently?" → git blame, hotspots, uncommitted WIP
- Multi-repo workspaces: SQLite ATTACH+UNION across 10+ repos
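The multi-repo pattern is plain SQLite. A minimal sketch, using in-memory stand-ins for the per-repo index files and an illustrative schema that is not srclight's actual one:

```python
import sqlite3

main = sqlite3.connect(":memory:")
main.execute("CREATE TABLE symbols (name TEXT, repo TEXT)")
main.execute("INSERT INTO symbols VALUES ('parse_config', 'repo_a')")

# In practice this would be ATTACH DATABASE 'repo_b/index.db' AS repo_b;
# ':memory:' just keeps the sketch self-contained.
main.execute("ATTACH DATABASE ':memory:' AS repo_b")
main.execute("CREATE TABLE repo_b.symbols (name TEXT, repo TEXT)")
main.execute("INSERT INTO repo_b.symbols VALUES ('parse_args', 'repo_b')")

# One query fans out across every attached repo index:
rows = main.execute("""
    SELECT name, repo FROM symbols
    UNION ALL
    SELECT name, repo FROM repo_b.symbols
""").fetchall()
```

Each repo keeps its own index file, and a workspace query is just an `ATTACH` per repo plus a `UNION ALL` over the attached tables.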
## Why not just use Elasticsearch?
We wanted something that installs in one command:
```shell
pip install srclight
srclight index --embed qwen3-embedding
srclight serve
```
No JVM, no Docker, no Redis, no cloud. Your code never leaves your machine.
## Results
We index 13 repos (45K symbols) in a workspace. Claude Code goes from ~20 tool calls per task to about 6 — because it can just ask "who calls this?" instead of grepping 10 times.
The hybrid search is the key. Keyword matches for precision, embeddings for recall. RRF fusion brings them together.
What search challenges are you running into with AI coding assistants? Drop a comment — I'd love to hear what's blocking you.