Veles: Hybrid BM25 + Semantic Code Search in a Local Rust MCP Server


Your AI coding assistant is only as good as the code it can find. Ask Claude or Cursor to "tighten up the retry logic," and the model does not read your entire repository — it retrieves a handful of files or snippets and reasons from those. If that retrieval step surfaces the wrong code, you get a confident, well-written answer built on the wrong context, and you may not notice until something breaks.

Veles is an open-source Model Context Protocol (MCP) server, written in Rust, that targets exactly this step. Instead of committing to one search strategy, it runs two at once — BM25 keyword ranking and semantic vector search — over a local index of your codebase, then returns the merged result to whatever MCP-compatible assistant you have connected.

Why code retrieval is the weak link

Most AI coding tools fall back on one of two retrieval strategies, and each fails in a predictable way.

Exact-match search — grep, ripgrep, substring lookup — is precise when you already know the token you want. Search for parseAuthHeader and you get every hit. Ask for "the code that validates login tokens" and exact match returns nothing, because your phrasing shares no words with the identifier.
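The exact-match failure described above is easy to reproduce with a toy example. This is a minimal sketch, not how any particular tool searches: the two snippets in `CHUNKS` and the file names are made up for illustration.

```python
# Toy corpus: chunk id -> source text. Both snippets are hypothetical.
CHUNKS = {
    "auth.py": "def parseAuthHeader(header): return header.split(' ')[1]",
    "retry.py": "MAX_RETRIES = 5  # give up after five attempts",
}


def exact_match(query: str) -> list[str]:
    """Return ids of chunks that contain the query as a literal substring."""
    return [chunk_id for chunk_id, text in CHUNKS.items() if query in text]


# A known identifier is found; a paraphrase of the same idea is not.
print(exact_match("parseAuthHeader"))
print(exact_match("validates login tokens"))
```

The second query returns an empty list even though `auth.py` is the code the question is about, which is the gap the semantic side is meant to cover.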

Semantic search flips the problem. It splits your code into chunks, turns each chunk into an embedding vector, and ranks by vector similarity. That handles paraphrase well: "validate login tokens" can match a function named parseAuthHeader. But embedding models are trained to generalize, and they tend to wash out the specificity you often need most: an exact error string, a config key like MAX_RETRIES, a rarely used function name. Ask a pure-embedding index for a literal string and it can return several files that are about the right topic while missing the one line that actually contains it.
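Ranking by vector similarity usually means cosine similarity between a query embedding and each chunk embedding. The sketch below shows only that ranking step. The three-dimensional vectors are hand-written stand-ins, not output from a real embedding model, and `retryWithBackoff` and `logRequest` are hypothetical names added for contrast.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hand-made toy "embeddings" (assumption: axes loosely mean auth, retries, logging).
EMBEDDINGS = {
    "parseAuthHeader": [0.9, 0.1, 0.0],
    "retryWithBackoff": [0.1, 0.9, 0.1],
    "logRequest": [0.0, 0.2, 0.9],
}


def semantic_rank(query_vec: list[float], k: int = 2) -> list[str]:
    """Return the k chunk names whose vectors are closest to the query vector."""
    ranked = sorted(
        EMBEDDINGS.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Pretend this vector is the embedding of "validate login tokens".
print(semantic_rank([0.8, 0.2, 0.1]))
```

Nothing in this ranking looks at the literal text of the chunk, which is why a query for an exact string gets no special treatment.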

An assistant's context window is small next to a real repository. You are not feeding it the whole codebase; you are feeding it the top few retrieved chunks. Precision in that step is the difference between a correct refactor and a plausible-looking wrong one. Worse, the failure is usually silent: the index returns its best guess no matter what, so the assistant proceeds as if it found the right code.

What hybrid BM25 + semantic search buys you

BM25, the Okapi BM25 ranking function, is the keyword side of Veles. It is the default similarity function in Lucene and Elasticsearch. Its inverse-document-frequency term weights rare terms heavily, which is exactly the behavior you want for exact identifiers, string literals, and uncommon tokens. Semantic vector search is the conceptual side, catching matches where your wording and the code's wording diverge.
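To make the rare-term weighting concrete, here is a compact sketch of the Okapi BM25 formula over pre-tokenized chunks. It is an illustration of the scoring function, not Veles's implementation: the tokenization, the `k1` and `b` defaults, and the Lucene-style IDF variant are all assumptions chosen for the example.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    if not docs:
        return []
    n = len(docs)
    avg_len = sum(len(doc) for doc in docs) / n
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Lucene-style IDF; the "1 +" keeps the value non-negative.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            length_norm = 1 - b + b * len(doc) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


# Hypothetical tokenized chunks: "retry" is everywhere, "max_retries" is rare.
docs = [
    ["max_retries", "retry", "request"],
    ["retry", "request", "timeout"],
    ["retry", "log", "request"],
]
print(bm25_scores(["max_retries", "retry"], docs))
```

The first chunk wins by a wide margin because `max_retries` appears in only one document, while `retry` appears in all three and contributes almost nothing to any score.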

Run the two separately and you pick your failure mode. Run them together and each covers the other's blind spot: BM25 anchors the result list to exact hits, while the semantic scorer pulls in conceptually related code that shares no literal terms. Fusing two ranked lists into one is a well-documented pattern in search engineering, and it is the core idea Veles applies to code retrieval. The merge step has to normalize or rank-fuse the two score lists (reciprocal rank fusion is a common approach) so that neither ranker dominates simply because its raw scores sit on a different scale.
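Reciprocal rank fusion is short enough to show in full. The sketch below assumes each ranker hands back an ordered list of chunk ids; whether Veles fuses results this way is not confirmed by the project description, and the file names and the conventional constant `k=60` are illustrative choices.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of ids into one list, best first.

    Each id earns 1 / (k + rank) from every list it appears in, so only
    positions matter and the rankers' raw score scales never meet.
    """
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score, breaking ties by id so the output is deterministic.
    return sorted(fused, key=lambda doc_id: (-fused[doc_id], doc_id))


# Hypothetical outputs of the two rankers for the same query.
bm25_ranking = ["rate_limit.rs", "config.rs", "throttle.rs"]
semantic_ranking = ["throttle.rs", "rate_limit.rs", "middleware.rs"]

print(reciprocal_rank_fusion([bm25_ranking, semantic_ranking]))
```

Chunks that both rankers return rise to the top, while a chunk found by only one ranker still makes the list instead of being dropped.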

Hybrid search works because the two rankers disagree in useful ways. BM25 ranks a chunk on the words it literally contains; the semantic ranker ranks it on what the chunk appears to mean. A result that scores high on both is almost certainly relevant — and a query like "rate limiting" can reach both the function literally named rateLimit and the middleware that throttles requests without ever using that phrase.

Because it speaks MCP, Veles is not tied to one assistant. The Model Context Protocol is an open standard for connecting tools and data sources to LLM clients, so any MCP-compatible assistant — Claude, Cursor, and a growing list of others — can call Veles the same way. You configure the search backend once and keep it when you switch editors.

Running it locally, and what that costs

Two design choices separate Veles from a hosted code-search service: it is local, and it is Rust.

Local means your code is indexed and searched on your own machine. For proprietary codebases, client work under NDA, or anything in a regulated industry, that removes a real blocker — you get embedding-quality retrieval without shipping source to a third-party API. The project describes Veles as keeping code off the cloud, which is the entire point of running the index yourself.

Rust means Veles ships as a native binary with no Python or Node runtime to install and manage. For a process that sits between your editor and every query you make, low overhead and fast indexing are not cosmetic: they decide whether the tool feels instant or laggy. A compiled binary with no garbage collector also avoids collection pauses and helps keep memory use predictable on a large monorepo.

Semantic search carries a cost the keyword side mostly avoids: an embedding pass over every chunk. A keyword index is comparatively cheap to build, but the first semantic index of a large repository takes time and disk space, and it drifts out of date as you commit. Plan for re-indexing, ideally incrementally, and confirm which embedding backend the build uses. If embeddings are generated by a remote API rather than a local model, snippets of your code leave the machine for that step even though search itself is local. Verify this before pointing Veles at a sensitive repo.
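One common way to keep re-indexing incremental is to store a content hash per file and re-embed only the files whose hash changed. The sketch below illustrates that general idea and makes no claim about how Veles tracks changes; `plan_reindex`, the snapshot format, and the suffix filter are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents, used as a cheap change detector."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def plan_reindex(root: Path, previous: dict, suffixes=(".rs", ".py")):
    """Compare the tree under root with a previous {path: digest} snapshot.

    Returns (changed, removed, current): files to (re)index, files to drop
    from the index, and the new snapshot to store for the next run.
    """
    current = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            current[str(path.relative_to(root))] = file_digest(path)
    changed = [p for p, digest in current.items() if previous.get(p) != digest]
    removed = [p for p in previous if p not in current]
    return changed, removed, current
```

On the first run `previous` is empty, so every matching file is reported as changed. Later runs only pay the embedding cost for files that were edited or added.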

Setup follows the standard MCP pattern: add Veles to your client's MCP configuration — the mcpServers block in Claude Desktop or Cursor — point it at the directory you want indexed, and let it build the index. Most clients let you scope the server to specific folders, which is worth doing on a monorepo so the index stays focused. After that, the assistant gains a code-search tool it can call on its own, with no change to how you prompt.
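For stdio servers, an entry in the mcpServers block is typically a server name mapped to a `command` and an `args` list. The sketch below generates that shape so it can be pasted into a client config. The binary path and the argument are placeholders, not Veles's documented CLI, so take the real values from the project's README.

```python
import json


def build_mcp_config(server_name: str, command: str, args: list[str]) -> dict:
    """Build an mcpServers config containing a single stdio server entry."""
    return {"mcpServers": {server_name: {"command": command, "args": args}}}


# Placeholder values (assumptions): replace with the command and arguments
# that the Veles documentation specifies.
config = build_mcp_config(
    server_name="veles",
    command="/path/to/veles-binary",
    args=["/path/to/your/repo"],
)

print(json.dumps(config, indent=2))
```

Generating the JSON rather than hand-editing it is optional. The point is the structure: one named entry per server, which the client launches as a subprocess.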

Veles is young and open-source, so adopt it deliberately. Read the configuration docs, check how indexing and refresh are handled, and test retrieval quality on your own repository before you depend on it. What the project gets right is the diagnosis: retrieval, not the model, is often the weak link in an AI coding workflow, and a hybrid local index is a sound way to strengthen it.

Originally published at pickuma.com.