<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Muhammad Zeeshan</title>
    <description>The latest articles on DEV Community by Muhammad Zeeshan (@zeeshan56656).</description>
    <link>https://dev.to/zeeshan56656</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F850602%2F8c0f69fd-99ae-4104-a262-cf60c05724d5.jpeg</url>
      <title>DEV Community: Muhammad Zeeshan</title>
      <link>https://dev.to/zeeshan56656</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/zeeshan56656"/>
    <language>en</language>
    <item>
      <title>Your Node.js RAG is grabbing the wrong sources. Here's a 4MB cross-encoder fix.</title>
      <dc:creator>Muhammad Zeeshan</dc:creator>
      <pubDate>Wed, 27 May 2026 18:12:37 +0000</pubDate>
      <link>https://dev.to/zeeshan56656/your-nodejs-rag-is-grabbing-the-wrong-sources-heres-a-4mb-cross-encoder-fix-1iff</link>
      <guid>https://dev.to/zeeshan56656/your-nodejs-rag-is-grabbing-the-wrong-sources-heres-a-4mb-cross-encoder-fix-1iff</guid>
      <description>&lt;h2&gt;
  
  
  The bug nobody talks about in JS RAG tutorials
&lt;/h2&gt;

&lt;p&gt;You wire up a Node.js retrieval pipeline. Embed your docs with OpenAI or Voyage, drop them into Pinecone or pgvector, build a &lt;code&gt;top-5&lt;/code&gt; query, feed it to Claude or GPT. The demo works.&lt;/p&gt;

&lt;p&gt;Then you ship to production and users notice the same thing:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"The bot quoted the wrong document."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The retrieval looked fine in your eval. It pulled five plausibly-related chunks. The LLM picked the wrong one to cite because the WRONG ONE was at the top of the list.&lt;/p&gt;

&lt;p&gt;This is the bi-encoder problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why embedding similarity falls short
&lt;/h2&gt;

&lt;p&gt;When you do &lt;code&gt;cosine(query_vec, doc_vec)&lt;/code&gt; on a vector DB, you are using a bi-encoder: the query and the doc were encoded independently into the same space. That is fast (millisecond-scale ANN search over millions of vectors). It is also lossy. The encoder never saw the query-doc pair TOGETHER. It estimated relevance based on independent meaning.&lt;/p&gt;
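&lt;p&gt;To make the "encoded independently" point concrete, here is a minimal sketch of the bi-encoder scoring step. The vectors and document ids are made up for illustration; a real vector DB runs approximate nearest-neighbour search instead of scoring every document.&lt;/p&gt;

```typescript
// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  if (normA === 0 || normB === 0) return 0; // a zero vector has no direction
  return dot / Math.sqrt(normA * normB);
}

// Hypothetical document vectors, computed ahead of time without seeing any query.
const docVectors = [
  { id: "city-facts", vec: [0.1, 0.9, 0.2] },
  { id: "rag-overview", vec: [0.9, 0.1, 0.0] },
];

// The query is embedded separately, then compared against each stored vector.
const queryVec = [1.0, 0.0, 0.0];
const bySimilarity = docVectors
  .map((d) => ({ id: d.id, score: cosine(queryVec, d.vec) }))
  .sort((a, b) => b.score - a.score);
```

&lt;p&gt;Nothing in this step ever looks at the query text and the document text together, which is where the lossiness comes from.&lt;/p&gt;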

&lt;p&gt;A cross-encoder sees both at once. It encodes &lt;code&gt;[query, doc]&lt;/code&gt; jointly through a small transformer and outputs a single relevance score per pair. Heavier per call. Massively more accurate at picking the right one.&lt;/p&gt;

&lt;p&gt;The canonical RAG architecture in 2025 is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Bi-encoder plus ANN to fetch top-50 candidates (fast, cheap).&lt;/li&gt;
&lt;li&gt;Cross-encoder reranker to pick the top-5 from those 50.&lt;/li&gt;
&lt;li&gt;Feed top-5 to the LLM.&lt;/li&gt;
&lt;/ol&gt;
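&lt;p&gt;The three steps above can be sketched as one function. This is not the &lt;code&gt;flashrank-js&lt;/code&gt; API: the &lt;code&gt;Scored&lt;/code&gt; type, &lt;code&gt;retrieveThenRerank&lt;/code&gt;, and both toy scorers are hypothetical stand-ins. In a real pipeline the cheap scorer would be an ANN lookup in your vector store and the pair scorer would be a cross-encoder.&lt;/p&gt;

```typescript
// Hypothetical types and helpers for illustration only.
type Scored = { doc: string; score: number };

// Sort by descending score and keep the first k entries.
function topK(items: Scored[], k: number): Scored[] {
  return [...items].sort((a, b) => b.score - a.score).slice(0, k);
}

// Stage 1 scores every document cheaply; stage 2 rescores only the shortlist.
function retrieveThenRerank(
  query: string,
  corpus: string[],
  cheapScore: (query: string, doc: string) => number,
  pairScore: (query: string, doc: string) => number,
  fetchK = 50,
  finalK = 5,
): Scored[] {
  const shortlist = topK(
    corpus.map((doc) => ({ doc, score: cheapScore(query, doc) })),
    fetchK,
  );
  return topK(
    shortlist.map(({ doc }) => ({ doc, score: pairScore(query, doc) })),
    finalK,
  );
}

// Toy scorers so the sketch runs without any model.
const words = (s: string) => s.toLowerCase().split(/\W+/).filter(Boolean);
const overlap = (q: string, d: string) => {
  const docWords = new Set(words(d));
  return words(q).filter((w) => docWords.has(w)).length;
};
const density = (q: string, d: string) => overlap(q, d) / Math.max(words(d).length, 1);

const demoCorpus = [
  "RAG combines a retriever and a generator.",
  "The capital of France is Paris.",
  "A generator can also be a diesel generator.",
];
const reranked = retrieveThenRerank("What does a RAG retriever do?", demoCorpus, overlap, density, 2, 1);
```

&lt;p&gt;The point of the shape is the budget: the expensive scorer only ever runs &lt;code&gt;fetchK&lt;/code&gt; times per query, no matter how large the corpus is.&lt;/p&gt;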

&lt;p&gt;Python devs have had this since 2023 via FlashRank, sentence-transformers, and BGE rerankers. Node.js devs have had two options:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Pay Cohere Rerank ($1 per 1000 calls, ~300 ms network latency).&lt;/li&gt;
&lt;li&gt;Hand-roll with &lt;code&gt;@huggingface/transformers&lt;/code&gt; (~80 lines of model loading, batching, score normalization plumbing you do not want to maintain).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That gap is what &lt;code&gt;flashrank-js&lt;/code&gt; fills.&lt;/p&gt;

&lt;h2&gt;
  
  
  Install
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;flashrank-js
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Zero API keys. Zero cloud. ONNX cross-encoder running locally via &lt;code&gt;@huggingface/transformers&lt;/code&gt;. Five model tiers from 4 MB to 571 MB. Pick the one that fits your latency budget.&lt;/p&gt;

&lt;h2&gt;
  
  
  The six-line tutorial
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Reranker&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;flashrank-js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;reranker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;Reranker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;mini&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ranked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;reranker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rerank&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;What is RAG?&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// your top-50 from vector search&lt;/span&gt;
  &lt;span class="na"&gt;topN&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;ranked[0]&lt;/code&gt; is the most relevant document. Done.&lt;/p&gt;

&lt;h2&gt;
  
  
  Before / after on a real query
&lt;/h2&gt;

&lt;p&gt;Suppose your vector store returns these five candidates for the query "How does retrieval-augmented generation work?":&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;candidates&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;RAG combines a retriever and a generator. The retriever finds relevant docs, the generator uses them to answer.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Karachi is a city in Pakistan with a population over 16 million.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Retrieval-augmented generation grounds LLM outputs in real documents to reduce hallucinations.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;The capital of France is Paris.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Cross-encoders rerank retrieved documents to surface the most relevant ones for a query.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Vector similarity might put them in this order (the exact scores depend on your embedding model, but off-topic snippets like the city and capital ones can still land close to the relevant chunks):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;0.81  RAG combines a retriever and a generator...
0.78  Karachi is a city in Pakistan...      (noise)
0.76  Retrieval-augmented generation grounds...
0.71  The capital of France is Paris...     (noise)
0.69  Cross-encoders rerank retrieved...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two of five are noise. If the LLM cites one of them, your user sees a hallucination.&lt;/p&gt;

&lt;p&gt;Run the same candidates through &lt;code&gt;flashrank-js&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ranked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;reranker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rerank&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;How does retrieval-augmented generation work?&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;topN&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[0.9982] Retrieval-augmented generation grounds LLM outputs in real documents...
[0.0005] Cross-encoders rerank retrieved documents to surface the most relevant ones...
[0.0004] RAG combines a retriever and a generator...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three RAG-related docs at top. Karachi and Paris dropped out entirely.&lt;/p&gt;

&lt;p&gt;The LLM that sees this top-3 cannot hallucinate a city citation. There is no city in the context.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters in production
&lt;/h2&gt;

&lt;p&gt;I have been shipping RAG pipelines and AI agent workflows in production for a while. The Python side of those stacks has had FlashRank wired in for years. The JavaScript pieces shipping to the client UI did not. Every new project that needed cross-encoder reranking on the client started with the same 80 lines of &lt;code&gt;@huggingface/transformers&lt;/code&gt; boilerplate. That boilerplate ages quickly because the transformers.js API keeps changing between releases.&lt;/p&gt;

&lt;p&gt;So I packaged the boilerplate.&lt;/p&gt;

&lt;h2&gt;
  
  
  The five model tiers
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;flashrank-js&lt;/code&gt; ships with the pre-configured cross-encoder models listed below, plus a "bring your own ONNX repo from Hugging Face Hub" option. Pick by latency budget:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Alias&lt;/th&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;th&gt;Language&lt;/th&gt;
&lt;th&gt;Use case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tiny&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;4 MB&lt;/td&gt;
&lt;td&gt;English&lt;/td&gt;
&lt;td&gt;Lowest latency, edge runtimes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;mini&lt;/code&gt; (default)&lt;/td&gt;
&lt;td&gt;23 MB&lt;/td&gt;
&lt;td&gt;English&lt;/td&gt;
&lt;td&gt;Balanced English RAG&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;bge-base&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;280 MB&lt;/td&gt;
&lt;td&gt;Multilingual&lt;/td&gt;
&lt;td&gt;First multilingual tier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;bge-v2-m3&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;571 MB&lt;/td&gt;
&lt;td&gt;Multilingual&lt;/td&gt;
&lt;td&gt;2025-2026 SOTA-small&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;bge-large&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;563 MB&lt;/td&gt;
&lt;td&gt;Multilingual&lt;/td&gt;
&lt;td&gt;Max quality&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Switching is one line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;reranker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;Reranker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;bge-v2-m3&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Models download from Hugging Face Hub on the first call and are cached locally. After that, each call is pure inference.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real benchmarks (Windows x64, Node 24, CPU)
&lt;/h2&gt;

&lt;p&gt;End-to-end including tokenization, median of five runs:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Load (first call)&lt;/th&gt;
&lt;th&gt;5 docs&lt;/th&gt;
&lt;th&gt;10 docs&lt;/th&gt;
&lt;th&gt;20 docs&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tiny&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;190 ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3 ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;6 ms&lt;/td&gt;
&lt;td&gt;10 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;mini&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;260 ms&lt;/td&gt;
&lt;td&gt;37 ms&lt;/td&gt;
&lt;td&gt;68 ms&lt;/td&gt;
&lt;td&gt;97 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
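&lt;p&gt;If you want to reproduce "median of five runs" numbers on your own hardware, a small harness along these lines is enough. This is a sketch, not the script behind the table above; &lt;code&gt;median&lt;/code&gt; and &lt;code&gt;medianLatencyMs&lt;/code&gt; are hypothetical helper names, and the placeholder task should be swapped for your own rerank call.&lt;/p&gt;

```typescript
// Median of a non-empty list of numbers.
function median(values: number[]): number {
  if (values.length === 0) throw new Error("need at least one value");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]!
    : (sorted[mid - 1]! + sorted[mid]!) / 2;
}

// Time a task several times and report the median wall-clock latency in ms.
// The task may be sync or async; awaiting a plain value is harmless.
async function medianLatencyMs(task: () => unknown, runs = 5) {
  const samples: number[] = [];
  for (let i = runs; i > 0; i--) {
    const start = performance.now();
    await task();
    samples.push(performance.now() - start);
  }
  return median(samples);
}

// Placeholder workload; replace with the call you actually want to measure.
medianLatencyMs(() => JSON.stringify(new Array(1000).fill("doc")))
  .then((ms) => console.log("median latency (ms): " + ms.toFixed(3)));
```

&lt;p&gt;Run one warm-up call before measuring so the first-call model load does not skew the median.&lt;/p&gt;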

&lt;p&gt;Cohere Rerank API for comparison: 200 to 500 ms including network round-trip, $1 per 1000 calls. At 1 million queries per month, that is $1,000 in rerank fees that can be zero with &lt;code&gt;flashrank-js&lt;/code&gt;.&lt;/p&gt;
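&lt;p&gt;The fee arithmetic above, written out so you can plug in your own volume. It assumes one rerank call per query and uses the per-1000-calls price quoted in this post; check current pricing before relying on it. The function name is hypothetical.&lt;/p&gt;

```typescript
// Monthly fee for a rerank API that charges per 1000 calls.
// Assumes exactly one rerank call per user query.
function monthlyRerankCostUsd(queriesPerMonth: number, usdPerThousandCalls: number): number {
  return (queriesPerMonth / 1000) * usdPerThousandCalls;
}

// 1 million queries per month at $1 per 1000 calls.
const monthlyCostUsd = monthlyRerankCostUsd(1_000_000, 1); // 1000
```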

&lt;h2&gt;
  
  
  Vercel AI SDK style
&lt;/h2&gt;

&lt;p&gt;If your stack uses Vercel AI SDK 6's &lt;code&gt;rerank()&lt;/code&gt;, the call shape mirrors it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;rerank&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;flashrank-js/vercel-ai-sdk&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;ranking&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;rerank&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;mini&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;...&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;topN&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result uses the same &lt;code&gt;{ index, relevanceScore }&lt;/code&gt; shape that Vercel's API returns. Swapping from the paid Cohere provider to local &lt;code&gt;flashrank-js&lt;/code&gt; is a one-line import change.&lt;/p&gt;

&lt;p&gt;(Heads up: this is a standalone function with a familiar shape, not a &lt;code&gt;RerankingModelV2&lt;/code&gt; provider you pass into Vercel's &lt;code&gt;rerank()&lt;/code&gt; from the &lt;code&gt;ai&lt;/code&gt; package. A true provider adapter is on the v1.x roadmap.)&lt;/p&gt;
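&lt;p&gt;Once you have the ranking, you usually want the document texts back. Here is a minimal sketch, assuming each ranking entry is &lt;code&gt;{ index, relevanceScore }&lt;/code&gt; with &lt;code&gt;index&lt;/code&gt; pointing into the original &lt;code&gt;documents&lt;/code&gt; array. The &lt;code&gt;RankingEntry&lt;/code&gt; type, the &lt;code&gt;pickDocuments&lt;/code&gt; helper, and the score cutoff are my own illustration, not part of either library.&lt;/p&gt;

```typescript
// Assumed shape of one ranking entry (see the lead-in above).
type RankingEntry = { index: number; relevanceScore: number };

// Map ranking entries back to document texts, dropping low-scoring ones.
function pickDocuments(documents: string[], ranking: RankingEntry[], minScore = 0): string[] {
  return ranking
    .filter((r) => r.relevanceScore >= minScore)
    .map((r) => documents[r.index])
    .filter((d): d is string => d !== undefined); // skip out-of-range indexes
}

// Made-up data for illustration.
const demoDocs = ["doc A", "doc B", "doc C"];
const demoRanking: RankingEntry[] = [
  { index: 2, relevanceScore: 0.9 },
  { index: 0, relevanceScore: 0.05 },
];
const kept = pickDocuments(demoDocs, demoRanking, 0.5);
```

&lt;p&gt;A score cutoff like this is optional, but it keeps near-zero matches out of the prompt even when they technically make the top N.&lt;/p&gt;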

&lt;h2&gt;
  
  
  Honest limits
&lt;/h2&gt;

&lt;p&gt;This is not a silver bullet:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cross-encoders are SLOWER per call than bi-encoders. You still need a vector store for the first-stage retrieval. Reranking happens on top-50, not on millions.&lt;/li&gt;
&lt;li&gt;Multilingual cross-encoders are bigger. &lt;code&gt;bge-v2-m3&lt;/code&gt; is 571 MB. For pure English apps, &lt;code&gt;mini&lt;/code&gt; at 23 MB is the sweet spot.&lt;/li&gt;
&lt;li&gt;Pure-edge runtimes (Cloudflare Workers, Vercel Edge) need the bundled WASM build of &lt;code&gt;onnxruntime-web&lt;/code&gt;. v0.1 targets Node.js 20+ first; edge runtime story lands in v1.1.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;flashrank-js
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;npm: &lt;a href="https://www.npmjs.com/package/flashrank-js" rel="noopener noreferrer"&gt;https://www.npmjs.com/package/flashrank-js&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/zeeshan56656/flashrank-js" rel="noopener noreferrer"&gt;https://github.com/zeeshan56656/flashrank-js&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Issues and PRs welcome.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you ship a Node.js RAG and your users have ever complained that "the bot quoted the wrong thing," try this. The 23 MB default model costs nothing to add and takes around 30 minutes to integrate. Citation accuracy goes up.&lt;/p&gt;

&lt;p&gt;I would love your bug reports.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>javascript</category>
      <category>typescript</category>
      <category>rag</category>
    </item>
  </channel>
</rss>
