<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Nick Bokuchava</title>
    <description>The latest articles on DEV Community by Nick Bokuchava (@mindmnml).</description>
    <link>https://dev.to/mindmnml</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3900352%2F27e669e7-e064-4200-b752-bf88f0dcf039.png</url>
      <title>DEV Community: Nick Bokuchava</title>
      <link>https://dev.to/mindmnml</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mindmnml"/>
    <language>en</language>
    <item>
      <title>How I caught a silent NaN bug in production RAG, by asking the system to debug itself</title>
      <dc:creator>Nick Bokuchava</dc:creator>
      <pubDate>Mon, 27 Apr 2026 12:01:04 +0000</pubDate>
      <link>https://dev.to/mindmnml/how-i-caught-a-silent-nan-bug-in-production-rag-by-asking-the-system-to-debug-itself-22fm</link>
      <guid>https://dev.to/mindmnml/how-i-caught-a-silent-nan-bug-in-production-rag-by-asking-the-system-to-debug-itself-22fm</guid>
      <description>&lt;p&gt;Last week I built a personal knowledge brain. This week I loaded it with five ML textbooks and asked it to debug itself.&lt;/p&gt;

&lt;p&gt;Here is what happened, and why it matters if you run RAG on Postgres.&lt;/p&gt;

&lt;h2&gt;The setup&lt;/h2&gt;

&lt;p&gt;I run a Supabase + pgvector RAG called GBrain. Hybrid search: vector + tsvector fused with reciprocal rank fusion, then re-scored with cosine. 2,872 chunks, OpenAI text-embedding-3-small, around 5 cents to ingest. Two AI systems share the same brain over MCP: Claude Code helps me write code interactively, while OpenClaw runs background automation on Gemini Flash via Vertex AI.&lt;/p&gt;
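
&lt;p&gt;To make the fusion step concrete, here is a minimal sketch of reciprocal rank fusion plus the cosine blend quoted later in the post. The function names and the &lt;code&gt;RRF_K = 60&lt;/code&gt; constant are illustrative assumptions, not GBrain's actual code.&lt;/p&gt;

```typescript
// Sketch of reciprocal rank fusion (RRF). Illustrative names, not GBrain's code.
// Each ranked result list contributes 1 / (k + rank) per document; scores for
// the same chunk id are summed across the vector and tsvector lists.
const RRF_K = 60; // common smoothing constant; an assumption, not GBrain's value

function reciprocalRankFusion(rankedLists: string[][]): { [id: string]: number } {
  const scores: { [id: string]: number } = {};
  for (const list of rankedLists) {
    list.forEach((id, rank) => {
      // rank is 0-based here, so add 1 to match the usual 1-based formulation
      scores[id] = (scores[id] ?? 0) + 1 / (RRF_K + rank + 1);
    });
  }
  return scores;
}

// Re-scoring step: blend the fused rank score with cosine similarity.
// The 0.7 / 0.3 weights are the ones quoted later in the post.
function blendScore(rrfScore: number, cosine: number): number {
  return 0.7 * rrfScore + 0.3 * cosine;
}
```

&lt;p&gt;A chunk that ranks near the top of both lists accumulates two large reciprocal terms, which is what lets keyword and vector evidence reinforce each other before the cosine re-score.&lt;/p&gt;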

&lt;p&gt;The whole thing cost me about $0.10/M tokens to run and roughly an afternoon to wire up. It was working. That was the problem.&lt;/p&gt;

&lt;h2&gt;The first sign something was off&lt;/h2&gt;

&lt;p&gt;I dropped in five textbooks (Murphy's Probabilistic Perspective, Bishop's PRML, Chip Huyen's Designing ML Systems, Géron's Hands-On ML, Murphy's Advanced Topics) and started querying. Two things looked wrong.&lt;/p&gt;

&lt;p&gt;Every query was returning &lt;code&gt;NaN&lt;/code&gt; in the relevance score. Not always a hard failure, just a quiet &lt;code&gt;NaN&lt;/code&gt; floating in the rank metadata. The retrieved chunks still came back, ordering still mostly looked sensible, so I almost ignored it.&lt;/p&gt;

&lt;p&gt;Then I asked the system "explain the EM algorithm for Gaussian mixtures" and it missed Murphy chapter 11.4.2. The chapter that is literally about EM for Gaussian mixtures. Top hit was something about variational inference instead.&lt;/p&gt;

&lt;p&gt;Classic RAG failure mode. Wrong chunk wins on a query that should be a layup.&lt;/p&gt;

&lt;h2&gt;Asking the system to audit itself&lt;/h2&gt;

&lt;p&gt;Before opening the source code I tried something different. I asked Gemini Flash, reading GBrain through MCP, to use the five textbooks to audit its own retrieval quality.&lt;/p&gt;

&lt;p&gt;It came back with surprisingly sharp output. Murphy §9.7.4 quoted verbatim on MRR/NDCG/MAP. Huyen chapter 8 on monitoring and SLO design. And one honest admission: "cross-encoder reranking is not in the corpus." Which is true, because Huyen's book is from 2022 and cross-encoders went mainstream in 2023.&lt;/p&gt;

&lt;p&gt;But the audit also confirmed action item #1: fix the &lt;code&gt;NaN&lt;/code&gt;. It guessed an RRF division bug.&lt;/p&gt;

&lt;p&gt;It guessed wrong. I went to look.&lt;/p&gt;

&lt;h2&gt;The actual bug&lt;/h2&gt;

&lt;p&gt;In the search module:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;embedding&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;Float32Array&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Classic TypeScript trap. &lt;code&gt;as Float32Array&lt;/code&gt; is compile-time only. At runtime, in my setup (Supabase JS client, default config without a custom type parser registered for pgvector), the Postgres client returns the pgvector column as a string, formatted like &lt;code&gt;"[0.1, 0.2, 0.3, ...]"&lt;/code&gt;. Whether you hit this depends on your driver and config — raw &lt;code&gt;pg&lt;/code&gt;, Drizzle with pgvector typing, or a custom type parser will all behave differently. But the string return is the default for a lot of common Supabase setups.&lt;/p&gt;

&lt;p&gt;So cosine similarity was running over what TypeScript believed was a &lt;code&gt;Float32Array&lt;/code&gt; but was actually a string. Indexing a string yields one-character strings; JavaScript silently coerces the numeric characters to numbers and turns everything else into &lt;code&gt;NaN&lt;/code&gt;. The result moved through the pipeline, got blended with the BM25 score, and poisoned the final ranking, but never hard-crashed anywhere.&lt;/p&gt;
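
&lt;p&gt;The trap is easy to reproduce in isolation. A minimal, self-contained repro (illustrative, not GBrain's actual module):&lt;/p&gt;

```typescript
// Minimal repro of the cast trap, not GBrain code. The driver returned a
// string, but a compile-time cast tells TypeScript it is a Float32Array.
const row = { embedding: "[0.1, 0.2, 0.3]" }; // what the client actually handed back
const vec = row.embedding as unknown as Float32Array; // compile-time fiction

console.log(typeof vec); // "string": the cast changed nothing at runtime

// The cosine loop then indexes the string and gets one-character strings back.
let dot = 0;
for (let i = 0; i !== 3; i++) {
  dot += vec[i] * vec[i]; // "[" * "[" is NaN on the first iteration
}
console.log(dot); // NaN, and NaN absorbs every later addition
```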

&lt;p&gt;This is the kind of bug that compiles, passes existing tests on happy-path inputs, and just quietly degrades retrieval quality forever. You only catch it when you have ground truth (the textbooks), a clear expected hit (Murphy §11.4.2), and you actually go look.&lt;/p&gt;

&lt;h2&gt;The patch and the second-order problem&lt;/h2&gt;

&lt;p&gt;The cast bug itself was fixable in one line at the data-ingest boundary, parsing the string back into a real &lt;code&gt;Float32Array&lt;/code&gt;. That fix landed upstream as gbrain #196.&lt;/p&gt;
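
&lt;p&gt;That boundary fix amounts to something like the following. This is a hedged sketch assuming pgvector's &lt;code&gt;"[x,y,z]"&lt;/code&gt; text format; &lt;code&gt;parseEmbedding&lt;/code&gt; is an illustrative name, not the literal diff from gbrain #196.&lt;/p&gt;

```typescript
// Hedged sketch of the boundary fix: turn pgvector's text representation
// back into a real Float32Array at ingest. Assumes the "[0.1,0.2,...]"
// format; parseEmbedding is an illustrative name, not the gbrain #196 diff.
function parseEmbedding(raw: unknown): Float32Array {
  if (raw instanceof Float32Array) return raw; // already typed, pass through
  if (typeof raw === "string") {
    const nums = raw.replace("[", "").replace("]", "").split(",").map(Number);
    if (nums.some(Number.isNaN)) {
      throw new Error("unparseable embedding: " + raw.slice(0, 40));
    }
    return Float32Array.from(nums);
  }
  if (Array.isArray(raw)) return Float32Array.from(raw); // some drivers return number[]
  throw new Error("unexpected embedding type: " + typeof raw);
}
```

&lt;p&gt;Throwing on an unparseable value is deliberate: a loud failure at the ingest boundary beats a silent &lt;code&gt;NaN&lt;/code&gt; in the ranking.&lt;/p&gt;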

&lt;p&gt;But it left a question. &lt;code&gt;cosineSimilarity&lt;/code&gt; is still a public export. Future embedding models at different dimensions, direct callers from user code, test fixtures, none of them go through the parse boundary. Same NaN-shaped failure could come back from a different direction.&lt;/p&gt;

&lt;p&gt;So I wrote a separate, narrow defensive hardening of &lt;code&gt;cosineSimilarity&lt;/code&gt; itself. Five lines added, two changed, no API change, no behavior change on valid finite dim-matched vectors. Same scores as before for inputs that were already correct.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;cosineSimilarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Float32Array&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Float32Array&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;dot&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;magA&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;magB&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;len&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// dim-mismatch safe&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;len&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;dot&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="nx"&gt;magA&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="nx"&gt;magB&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;denom&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;magA&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;magB&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nb"&gt;Number&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isFinite&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;denom&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;denom&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// Infinity/NaN safe&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;dot&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nx"&gt;denom&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;Number&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isFinite&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// belt and suspenders&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three failure modes this prevents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dimension mismatch.&lt;/strong&gt; If a caller passes a 768-dim vector to a brain storing 1536-dim vectors, the old loop ran past the shorter vector's end, and any multiplication involving &lt;code&gt;undefined&lt;/code&gt; is &lt;code&gt;NaN&lt;/code&gt;, which poisoned &lt;code&gt;magB&lt;/code&gt; and the return value. Now the loop runs over the common prefix and returns a finite similarity over the shared dimensions. This is a pragmatic defensive choice, not a semantically exact cosine over mismatched-dimension vectors; the "correct" answer for that case is arguably to throw. The goal here is to not poison every downstream score because one caller passed a wrong-dimension input.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Non-finite denominator.&lt;/strong&gt; If either vector has values large enough that squared sums overflow to &lt;code&gt;Infinity&lt;/code&gt;, then &lt;code&gt;sqrt(Infinity) * sqrt(Infinity) = Infinity&lt;/code&gt;. The old &lt;code&gt;denom === 0&lt;/code&gt; guard misses that, and &lt;code&gt;dot / Infinity&lt;/code&gt; silently returns 0 or &lt;code&gt;NaN&lt;/code&gt; depending on &lt;code&gt;dot&lt;/code&gt;. The explicit &lt;code&gt;Number.isFinite(denom)&lt;/code&gt; check is clear and fast.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Non-finite final result.&lt;/strong&gt; Belt-and-suspenders check on &lt;code&gt;dot / denom&lt;/code&gt;. Since &lt;code&gt;cosineSimilarity&lt;/code&gt;'s output feeds directly into the blended score &lt;code&gt;0.7 * rrf + 0.3 * cosine&lt;/code&gt;, a single &lt;code&gt;NaN&lt;/code&gt; propagates through every downstream result the same way the original cast bug did. Better to catch it here too.&lt;/p&gt;
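
&lt;p&gt;Exercising those guards looks something like this: a plain transcription of the hardened function with example calls, as a sketch (the actual unit tests in PR #295 may differ).&lt;/p&gt;

```typescript
// Plain transcription of the hardened cosineSimilarity above, plus example
// calls showing each guard. A sketch; the actual tests in PR #295 may differ.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  let dot = 0, magA = 0, magB = 0;
  const len = Math.min(a.length, b.length); // guard 1: dim-mismatch safe
  for (let i = 0; i !== len; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  const denom = Math.sqrt(magA) * Math.sqrt(magB);
  if (!Number.isFinite(denom) || denom === 0) return 0; // guard 2: Infinity/NaN safe
  const result = dot / denom;
  // Guard 3 is belt and suspenders; it is hard to trigger once the first two hold.
  return Number.isFinite(result) ? result : 0;
}

// Guard 1: mismatched dimensions stay finite instead of poisoning downstream scores.
const shortVec = Float32Array.from([1, 0]);
const longVec = Float32Array.from([1, 0, 0, 0]);
console.log(cosineSimilarity(shortVec, longVec)); // 1, over the shared prefix

// Guard 2: magnitudes that overflow float32 to Infinity map to 0, not NaN.
const huge = Float32Array.from([Number.MAX_VALUE, Number.MAX_VALUE]);
console.log(cosineSimilarity(huge, huge)); // 0

// Unchanged behavior on valid input: identical unit vectors still score 1.
const unit = Float32Array.from([0, 1]);
console.log(cosineSimilarity(unit, unit)); // 1
```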

&lt;p&gt;That landed as PR #295 to garrytan/gbrain, currently open with three unit tests covering each guard.&lt;/p&gt;

&lt;h2&gt;What I actually learned&lt;/h2&gt;

&lt;p&gt;The bug was boring. The interesting part was the path to finding it.&lt;/p&gt;

&lt;p&gt;Self-improving AI agents need three things, and most setups grant only two: knowledge (what the agent knows) and tools (how it acts). The third, introspection rights (permission to read its own source code), gets locked down because it feels scary. But without it, an agent can point at its own bug and still not fix it. You watch it confabulate around the symptom.&lt;/p&gt;

&lt;p&gt;The other thing: a &lt;code&gt;NaN&lt;/code&gt; in a score column is one of those bugs where every single layer of the system looks fine in isolation. TypeScript compiles. Tests pass. The query returns. The UI renders. The only signal is that retrieval quality is worse than it should be, and "worse than it should be" is invisible without ground truth. Production RAG without an evaluation corpus is a pipeline you cannot debug.&lt;/p&gt;

&lt;p&gt;One thing this patch does not do: it does not log when any of these guards trigger. That is fine for stability — you do not want a &lt;code&gt;NaN&lt;/code&gt; propagating into a blended retrieval score in production. But for evaluation, silently mapping &lt;code&gt;NaN → 0&lt;/code&gt; can hide real bugs from your metrics. If you adopt this pattern in your own code, add a counter for each guard branch so you can see when the defensive code is actually firing.&lt;/p&gt;
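
&lt;p&gt;A counted variant is only a few lines more. The &lt;code&gt;guardHits&lt;/code&gt; object and the counter names are illustrative, not gbrain code:&lt;/p&gt;

```typescript
// Sketch of the counter idea: same guards as above, but each defensive branch
// increments a counter so the NaN-to-0 mapping shows up in your metrics.
// guardHits and the counter names are illustrative, not gbrain code.
const guardHits = { dimMismatch: 0, nonFiniteDenom: 0, nonFiniteResult: 0 };

function cosineSimilarityCounted(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) guardHits.dimMismatch += 1;
  const len = Math.min(a.length, b.length);
  let dot = 0, magA = 0, magB = 0;
  for (let i = 0; i !== len; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  const denom = Math.sqrt(magA) * Math.sqrt(magB);
  if (!Number.isFinite(denom) || denom === 0) {
    guardHits.nonFiniteDenom += 1;
    return 0;
  }
  const result = dot / denom;
  if (Number.isFinite(result)) return result;
  guardHits.nonFiniteResult += 1;
  return 0;
}
```

&lt;p&gt;Flush &lt;code&gt;guardHits&lt;/code&gt; to your metrics sink on an interval; a counter that moves is a bug report, not noise.&lt;/p&gt;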

&lt;p&gt;If you run hybrid search on pgvector, two specific things to check today:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Pull a row directly from your DB client and &lt;code&gt;console.log(typeof row.embedding)&lt;/code&gt;. If it's &lt;code&gt;"string"&lt;/code&gt; and your code casts it to &lt;code&gt;Float32Array&lt;/code&gt;, you have this bug.&lt;/li&gt;
&lt;li&gt;Run a query whose correct top hit you know by name. If the right chunk does not come back top-3, treat it as a real signal, not a tuning question.&lt;/li&gt;
&lt;/ol&gt;
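
&lt;p&gt;For check #1, a small classifier makes the answer unambiguous. The Supabase query itself is omitted; &lt;code&gt;classifyEmbedding&lt;/code&gt; is a hypothetical helper you would run on one fetched row:&lt;/p&gt;

```typescript
// Sketch of check #1: classify what the driver actually returned for the
// embedding column of one fetched row. The Supabase query is omitted;
// classifyEmbedding is a hypothetical helper, not part of gbrain.
type EmbeddingShape = "typed-array" | "number-array" | "string" | "other";

function classifyEmbedding(value: unknown): EmbeddingShape {
  if (value instanceof Float32Array) return "typed-array"; // safe
  if (Array.isArray(value)) return "number-array"; // safe after Float32Array.from
  if (typeof value === "string") return "string"; // the bug described above
  return "other";
}

console.log(classifyEmbedding("[0.1, 0.2, 0.3]")); // "string": you have the bug
console.log(classifyEmbedding(Float32Array.from([0.1, 0.2]))); // "typed-array"
```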

&lt;p&gt;Repo: &lt;a href="https://github.com/garrytan/gbrain" rel="noopener noreferrer"&gt;github.com/garrytan/gbrain&lt;/a&gt;. PR with the hardening + tests: &lt;a href="https://github.com/garrytan/gbrain/pull/295" rel="noopener noreferrer"&gt;PR #295&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>rag</category>
      <category>postgres</category>
      <category>typescript</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
