<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Tim Uy</title>
    <description>The latest articles on DEV Community by Tim Uy (@tofutim).</description>
    <link>https://dev.to/tofutim</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3793237%2F5b03cb97-6d4c-44e9-8398-a1cf17d0c4c4.jpeg</url>
      <title>DEV Community: Tim Uy</title>
      <link>https://dev.to/tofutim</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tofutim"/>
    <language>en</language>
    <item>
      <title>How we built a hybrid FTS5 + embedding search for code — and why you need both</title>
      <dc:creator>Tim Uy</dc:creator>
      <pubDate>Thu, 26 Feb 2026 05:58:24 +0000</pubDate>
      <link>https://dev.to/tofutim/how-we-built-a-hybrid-fts5-embedding-search-for-code-and-why-you-need-both-4ec2</link>
      <guid>https://dev.to/tofutim/how-we-built-a-hybrid-fts5-embedding-search-for-code-and-why-you-need-both-4ec2</guid>
      <description>&lt;h1&gt;
  
  
  How we built a hybrid FTS5 + embedding search for code — and why you need both
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;srclight is a deep code indexing MCP server — it gives AI agents understanding of your codebase (symbol search, call graphs, git blame, semantic search) in a single &lt;code&gt;pip install&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When you're building AI coding assistants, you need search that works two ways:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Keyword search&lt;/strong&gt; — I know the function name, find it now&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic search&lt;/strong&gt; — find code that "handles authentication" without knowing the exact term&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Most tools pick one. We built both.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem with pure keyword search
&lt;/h2&gt;

&lt;p&gt;FTS5 is great for exact matches. But code has naming conventions: &lt;code&gt;calculateTotalPrice&lt;/code&gt;, &lt;code&gt;calculate_total_price&lt;/code&gt;, &lt;code&gt;CalculateTotalPrice&lt;/code&gt;. A single FTS5 index can't handle all of these well.&lt;/p&gt;

&lt;p&gt;And sometimes you don't know the name at all. You want to find "code that validates user input" — that's a concept, not a keyword.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem with pure embedding search
&lt;/h2&gt;

&lt;p&gt;Embeddings are great for meaning. But they struggle with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Exact symbol names (searching for &lt;code&gt;handleAuth&lt;/code&gt; should find &lt;code&gt;handleAuth&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Substring matches (searching for &lt;code&gt;parse&lt;/code&gt; should find &lt;code&gt;parseJSON&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Short queries (embeddings need context)&lt;/li&gt;
&lt;li&gt;Naming conventions&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Our solution: 4 indexes + RRF fusion
&lt;/h2&gt;

&lt;p&gt;We built &lt;strong&gt;three&lt;/strong&gt; FTS5 indexes, each tuned differently, plus an embedding index:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Symbol names index (unicode61 tokenizer)
&lt;/h3&gt;

&lt;p&gt;Splits on case changes and underscores:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;calculateTotalPrice → calculate, Total, Price
handle_user_auth → handle, user, auth
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This catches CamelCase, snake_case, and any convention developers throw at it.&lt;/p&gt;
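&lt;p&gt;A minimal sketch of this index using Python's built-in &lt;code&gt;sqlite3&lt;/code&gt;. Note that unicode61 on its own splits on non-alphanumeric characters like underscores but not on case changes, so the sketch assumes a small pre-split helper (illustrative, not srclight's actual code) that breaks camelCase boundaries before indexing:&lt;/p&gt;

```python
import re
import sqlite3

# Illustrative helper (an assumption, not srclight's real code): unicode61
# splits on underscores, but camelCase boundaries must be broken apart
# before the text reaches the tokenizer.
def split_identifier(name: str) -> str:
    return re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name).replace("_", " ")

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE VIRTUAL TABLE symbols USING fts5(name, tokens, tokenize='unicode61')"
)
for sym in ("calculateTotalPrice", "handle_user_auth"):
    db.execute("INSERT INTO symbols VALUES (?, ?)", (sym, split_identifier(sym)))

# A single word now matches regardless of the naming convention
rows = db.execute("SELECT name FROM symbols WHERE symbols MATCH 'total'").fetchall()
```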

&lt;h3&gt;
  
  
  2. Source content index (trigram tokenizer)
&lt;/h3&gt;

&lt;p&gt;Indexes every 3-character substring. This catches substring matches even inside words.&lt;/p&gt;
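&lt;p&gt;A sketch of how such an index might be declared (the &lt;code&gt;trigram&lt;/code&gt; tokenizer requires SQLite 3.34 or newer; table and column names here are illustrative):&lt;/p&gt;

```python
import sqlite3

# Sketch of a substring index; 'trigram' needs SQLite >= 3.34
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE src USING fts5(body, tokenize='trigram')")
db.execute("INSERT INTO src VALUES ('def parseJSON(payload): return payload')")

# 'parse' matches as a substring inside parseJSON
rows = db.execute("SELECT body FROM src WHERE src MATCH 'parse'").fetchall()
```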

&lt;h3&gt;
  
  
  3. Docstrings index (porter stemmer)
&lt;/h3&gt;

&lt;p&gt;Stems words to their roots: "running, runs → run". This makes docstring search actually useful.&lt;/p&gt;
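&lt;p&gt;A minimal sketch of a porter-stemmed index (illustrative names; FTS5 layers the porter stemmer on top of unicode61):&lt;/p&gt;

```python
import sqlite3

# Sketch of a stemmed docstring index: porter stemming over unicode61
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE docs USING fts5(text, tokenize='porter unicode61')")
db.execute("INSERT INTO docs VALUES ('Runs input validation before saving')")

# 'running' and 'Runs' both stem to 'run', so the query matches
rows = db.execute("SELECT text FROM docs WHERE docs MATCH 'running'").fetchall()
```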

&lt;h3&gt;
  
  
  4. Embeddings (via Ollama)
&lt;/h3&gt;

&lt;p&gt;Semantic vectors for meaning-based matching. We use qwen3-embedding (4096 dims) or nomic-embed-text (768 dims).&lt;/p&gt;
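&lt;p&gt;The ranking step itself is just nearest-neighbor over cosine similarity. A toy sketch with made-up three-dimensional vectors (in practice the vectors would come from Ollama's embeddings endpoint at the model's full dimensionality):&lt;/p&gt;

```python
import math

# Toy sketch of semantic ranking: the vectors below are made up for
# illustration; real ones would come from an embedding model via Ollama.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

query_vec = [0.9, 0.1, 0.0]  # e.g. "handles authentication"
index = {
    "handle_user_auth": [0.8, 0.2, 0.1],  # semantically close to the query
    "render_chart": [0.1, 0.1, 0.9],      # unrelated
}
ranked = sorted(index, key=lambda s: cosine(query_vec, index[s]), reverse=True)
```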

&lt;h2&gt;
  
  
  The secret sauce: Reciprocal Rank Fusion
&lt;/h2&gt;

&lt;p&gt;Here's how we combine them. We run each query against all 4 indexes, get ranked results, then merge using RRF:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;RRF_score(d) = Σ 1 / (k + rank(d))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;where k = 60 (standard constant).&lt;/p&gt;

&lt;p&gt;A result appearing at rank 1 in FTS5 and rank 2 in embeddings gets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;FTS5: 1 / (60 + 1) = 0.0164&lt;/li&gt;
&lt;li&gt;Embeddings: 1 / (60 + 2) = 0.0161&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Total: 0.0325&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A result at rank 10 in embeddings only gets: 1 / (60 + 10) = 0.0143&lt;/p&gt;

&lt;p&gt;A result that appears in several indexes accumulates score from each, so an exact keyword match can still outrank a purely semantic hit, and vice versa. You get the best of both worlds.&lt;/p&gt;
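&lt;p&gt;The fusion step is a few lines of code. This sketch implements the formula above with k = 60 and ranks starting at 1, reproducing the worked example (document names are illustrative):&lt;/p&gt;

```python
# Minimal RRF fusion: each index contributes 1 / (k + rank) per result
def rrf_fuse(rankings, k=60):
    scores = {}
    for ranked in rankings:
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Worked example: 'calc_total' is rank 1 in FTS5 and rank 2 in embeddings
fts5_hits = ["calc_total", "parse_json"]
embedding_hits = ["auth_helper", "calc_total"]
fused = rrf_fuse([fts5_hits, embedding_hits])
```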

&lt;h2&gt;
  
  
  But wait, there's more
&lt;/h2&gt;

&lt;p&gt;We also built:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GPU vector cache&lt;/strong&gt;: Embeddings loaded to VRAM once (~300ms cold), then ~3ms per query via CuPy&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Incremental indexing&lt;/strong&gt;: Only re-index changed symbols (tracked via content hash)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Git intelligence&lt;/strong&gt;: Query "what changed recently?" → git blame, hotspots, uncommitted WIP&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-repo workspaces&lt;/strong&gt;: SQLite ATTACH+UNION across 10+ repos&lt;/li&gt;
&lt;/ul&gt;
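&lt;p&gt;Of these, the incremental check is the easiest to sketch: re-index a symbol only when the hash of its source text differs from the stored one. The function names here are hypothetical, not srclight's API:&lt;/p&gt;

```python
import hashlib

# Hypothetical sketch of hash-based incremental indexing (illustrative
# names, not srclight's actual API).
def content_hash(source: str) -> str:
    return hashlib.sha256(source.encode("utf-8")).hexdigest()

def needs_reindex(source: str, stored_hash):
    # No stored hash means the symbol is new; a mismatch means it changed
    return content_hash(source) != stored_hash

stored = content_hash("def total(xs): return sum(xs)")
```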

&lt;h2&gt;
  
  
  Why not just use Elasticsearch?
&lt;/h2&gt;

&lt;p&gt;We wanted something that installs in one command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;srclight
srclight index &lt;span class="nt"&gt;--embed&lt;/span&gt; qwen3-embedding
srclight serve
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No JVM, no Docker, no Redis, no cloud. Your code never leaves your machine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;We index 13 repos (45K symbols) in a workspace. Claude Code goes from ~20 tool calls per task to about 6 — because it can just ask "who calls this?" instead of grepping 10 times.&lt;/p&gt;

&lt;p&gt;The hybrid search is the key. Keyword matches for precision, embeddings for recall. RRF fusion brings them together.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What search challenges are you running into with AI coding assistants? Drop a comment — I'd love to hear what's blocking you.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>mcp</category>
    </item>
  </channel>
</rss>
