<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sujithkrishnan.p.k</title>
    <description>The latest articles on DEV Community by Sujithkrishnan.p.k (@sujithkrishnanpk_c9f931).</description>
    <link>https://dev.to/sujithkrishnanpk_c9f931</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3958163%2Fb0b24336-f641-4f98-be2a-d1117cd94821.png</url>
      <title>DEV Community: Sujithkrishnan.p.k</title>
      <link>https://dev.to/sujithkrishnanpk_c9f931</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sujithkrishnanpk_c9f931"/>
    <language>en</language>
    <item>
      <title>I built a docs Q&amp;A engine that returns null instead of hallucinating</title>
      <dc:creator>Sujithkrishnan.p.k</dc:creator>
      <pubDate>Fri, 29 May 2026 09:36:01 +0000</pubDate>
      <link>https://dev.to/sujithkrishnanpk_c9f931/i-built-a-docs-qa-engine-that-returns-null-instead-of-hallucinating-58p6</link>
      <guid>https://dev.to/sujithkrishnanpk_c9f931/i-built-a-docs-qa-engine-that-returns-null-instead-of-hallucinating-58p6</guid>
      <description>&lt;p&gt;Most "docs chatbots" today route user questions through a hosted LLM API such as OpenAI's. For&lt;br&gt;
open-source maintainers, privacy-conscious teams, and air-gapped&lt;br&gt;
environments, that's either too expensive or unacceptable. So I built&lt;br&gt;
one that doesn't.&lt;/p&gt;
&lt;h2&gt;
  
  
  The product
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/teamerisingstars/KB-API" rel="noopener noreferrer"&gt;Knowledge Base API&lt;/a&gt; is a&lt;br&gt;
small FastAPI service that answers questions over a folder of markdown&lt;br&gt;
files using &lt;strong&gt;BM25 + POS-aware lemmatization + WordNet synonym&lt;br&gt;
expansion&lt;/strong&gt;. No models. No API keys. No data leaving the box.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://kb-api-q30f.onrender.com" rel="noopener noreferrer"&gt;Live demo against FastAPI + Pydantic + Starlette docs&lt;/a&gt;&lt;br&gt;
(2,869 sections, 265 files).&lt;/p&gt;
&lt;h2&gt;
  
  
  The unusual constraint
&lt;/h2&gt;

&lt;p&gt;The single hardest behaviour to enforce was making the API return&lt;br&gt;
&lt;code&gt;null&lt;/code&gt; instead of inventing an answer when nothing in the corpus is&lt;br&gt;
a real fit.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST https://kb-api-q30f.onrender.com/ask &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"question":"what is quantum chromodynamics"}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"answer"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"section"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"I don't have enough information to answer that."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Most retrieval systems silently return the least-bad section. The&lt;br&gt;
trade-off — sometimes refusing to answer — is the whole point.&lt;/p&gt;
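&lt;p&gt;On the caller's side, an explicit refusal is easy to handle. Here is a minimal sketch, assuming the response shape shown above; the &lt;code&gt;render_answer&lt;/code&gt; helper is hypothetical and not part of the repo:&lt;/p&gt;

```python
import json


def render_answer(raw_body):
    """Turn an /ask response body into display text (hypothetical helper)."""
    payload = json.loads(raw_body)
    if payload["answer"] is None:
        # Explicit refusal: surface the service's message instead of guessing.
        return payload.get("message", "No answer found.")
    return "{} (source: {}, confidence {:.2f})".format(
        payload["answer"], payload["source"], payload["confidence"]
    )


# The refusal body from the example above, rebuilt as a JSON string.
refusal = json.dumps({
    "answer": None,
    "section": None,
    "source": None,
    "confidence": 0.0,
    "message": "I don't have enough information to answer that.",
})
print(render_answer(refusal))
```

&lt;p&gt;Because the refusal is a &lt;code&gt;null&lt;/code&gt; field rather than an apologetic string in &lt;code&gt;answer&lt;/code&gt;, the client can branch on it without parsing prose.&lt;/p&gt;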
&lt;h2&gt;
  
  
  Three things that took longer than I expected
&lt;/h2&gt;
&lt;h3&gt;
  
  
  1. Identifier-aware tokenization
&lt;/h3&gt;

&lt;p&gt;The default NLTK tokenizer keeps &lt;code&gt;response_model&lt;/code&gt;,&lt;br&gt;
&lt;code&gt;OAuth2PasswordBearer&lt;/code&gt;, and &lt;code&gt;Cross-Origin&lt;/code&gt; as single opaque tokens.&lt;br&gt;
That means a query for "what is response_model" never matches because&lt;br&gt;
the document body has &lt;code&gt;response_model&lt;/code&gt; underscored and the lemmatized&lt;br&gt;
query doesn't.&lt;/p&gt;

&lt;p&gt;Solution: split on &lt;code&gt;_&lt;/code&gt;, &lt;code&gt;-&lt;/code&gt;, and CamelCase boundaries before&lt;br&gt;
lemmatization, and keep BOTH the full identifier and its pieces in the&lt;br&gt;
indexed token stream.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;split_identifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OAuth2PasswordBearer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# -&amp;gt; ["OAuth2PasswordBearer", "OAuth2", "Password", "Bearer"]
&lt;/span&gt;
&lt;span class="nf"&gt;split_identifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cross-Origin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# -&amp;gt; ["Cross-Origin", "Cross", "Origin"]
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Going from 50% to 90% accuracy on identifier-heavy queries was almost&lt;br&gt;
entirely this fix.&lt;/p&gt;
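&lt;p&gt;For readers who want to try the idea, here is one way such a splitter could look. It is a sketch that reproduces the two outputs above, not the code from the repo; the regular expressions are my own assumption, and runs of capitals such as &lt;code&gt;HTTPServer&lt;/code&gt; are deliberately left unsplit:&lt;/p&gt;

```python
import re

# A lowercase letter or digit followed by an uppercase letter marks a CamelCase boundary.
_CAMEL_BOUNDARY = re.compile(r"([a-z0-9])([A-Z])")
# Underscores, hyphens and whitespace separate the remaining pieces.
_SEPARATORS = re.compile(r"[_\-\s]+")


def split_identifier(token):
    """Return the full identifier followed by its pieces (sketch, not the repo's code)."""
    spaced = _CAMEL_BOUNDARY.sub(r"\1 \2", token)
    pieces = [p for p in _SEPARATORS.split(spaced) if p]
    if pieces == [token]:
        return [token]  # nothing to split, avoid a duplicate entry
    return [token] + pieces


print(split_identifier("OAuth2PasswordBearer"))
print(split_identifier("response_model"))
```

&lt;p&gt;Keeping the full identifier first means an exact query still hits the exact token, while the pieces let partial queries match.&lt;/p&gt;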
&lt;h3&gt;
  
  
  2. Query-side acronym expansion (without polluting the index)
&lt;/h3&gt;

&lt;p&gt;If you expand &lt;code&gt;CORS&lt;/code&gt; to &lt;code&gt;cross origin resource sharing&lt;/code&gt; at index time,&lt;br&gt;
every BM25 IDF calculation breaks — terms appear artificially often,&lt;br&gt;
document lengths inflate, scoring degrades.&lt;/p&gt;

&lt;p&gt;The right move is &lt;strong&gt;query-side only&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;_ACRONYMS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cors&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cross origin resource sharing&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;jwt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;json web token&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;api&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application programming interface&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;csrf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cross site request forgery&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;xss&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cross site scripting&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;orm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object relational mapping&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;# ...
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the query contains an acronym, append the expansion tokens to&lt;br&gt;
the query. The index stays pure.&lt;/p&gt;
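&lt;p&gt;The append step can be sketched like this. The &lt;code&gt;expand_query&lt;/code&gt; name and the two-entry table are illustrative assumptions; the real service may tokenize and lemmatize differently:&lt;/p&gt;

```python
# Trimmed copy of the acronym table, for illustration only.
ACRONYMS = {
    "cors": "cross origin resource sharing",
    "jwt": "json web token",
}


def expand_query(tokens):
    """Append expansion words for known acronyms; the index is never touched."""
    expanded = list(tokens)
    for tok in tokens:
        expansion = ACRONYMS.get(tok.lower())
        if expansion:
            expanded.extend(expansion.split())
    return expanded


print(expand_query(["how", "add", "CORS"]))
```

&lt;p&gt;The original acronym stays in the query, so documents that use the short form still match; the expansion only adds extra terms for BM25 to score.&lt;/p&gt;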
&lt;h3&gt;
  
  
  3. The heading/filename/path boost stack
&lt;/h3&gt;

&lt;p&gt;Pure BM25 over docs returns weird results because it ignores document structure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"What is X" should rank a section literally titled "X" higher than
a section that mentions X 12 times in the body&lt;/li&gt;
&lt;li&gt;Files at &lt;code&gt;reference/foo.md&lt;/code&gt; are canonical definitions; tutorials
are examples&lt;/li&gt;
&lt;li&gt;Heading matches mean a lot more than body matches&lt;/li&gt;
&lt;/ul&gt;

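&lt;p&gt;So the score gets four passes on top of &lt;code&gt;raw_bm25_score(query)&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;× &lt;code&gt;HEADING_BOOST_FACTOR&lt;/code&gt; if heading-query overlap ≥ 50%&lt;/li&gt;
&lt;li&gt;1.0 if heading EXACTLY matches query subject&lt;/li&gt;
&lt;li&gt;× &lt;code&gt;FILENAME_BOOST_FACTOR&lt;/code&gt; if filename overlaps query&lt;/li&gt;
&lt;li&gt;× &lt;code&gt;REFERENCE_PATH_BOOST&lt;/code&gt; if path is under &lt;code&gt;reference/&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;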

&lt;p&gt;And below a hard threshold, the result is rejected entirely:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;CONFIDENCE_THRESHOLD&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;_no_match&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That last line is the difference between "honestly returns null"&lt;br&gt;
and "silently returns the least-bad section."&lt;/p&gt;
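&lt;p&gt;To make the multiplicative boosts and the rejection step concrete, here is a rough sketch. Every constant value, the &lt;code&gt;overlap&lt;/code&gt; definition (fraction of query terms found in the other text), and the function names are assumptions for illustration. The exact-heading rule is left out because the post does not spell out how it combines with the other factors:&lt;/p&gt;

```python
# All values below are illustrative assumptions, not the project's settings.
HEADING_BOOST_FACTOR = 2.0
FILENAME_BOOST_FACTOR = 1.5
REFERENCE_PATH_BOOST = 1.2
CONFIDENCE_THRESHOLD = 0.3


def overlap(query_terms, text_terms):
    """Fraction of distinct query terms that also appear in text_terms."""
    query = set(query_terms)
    if not query:
        return 0.0
    return len(query.intersection(text_terms)) / len(query)


def boosted_score(raw, query_terms, heading_terms, filename_terms, path):
    """Apply the heading, filename and reference-path boosts to a raw score."""
    score = raw
    if overlap(query_terms, heading_terms) >= 0.5:
        score *= HEADING_BOOST_FACTOR
    if overlap(query_terms, filename_terms) > 0:
        score *= FILENAME_BOOST_FACTOR
    if path.startswith("reference/"):
        score *= REFERENCE_PATH_BOOST
    return score


def best_or_none(scored_sections):
    """scored_sections is a list of (section_id, score); refuse below the threshold."""
    if not scored_sections:
        return None
    section_id, score = max(scored_sections, key=lambda pair: pair[1])
    return section_id if score >= CONFIDENCE_THRESHOLD else None
```

&lt;p&gt;The important design choice is that &lt;code&gt;best_or_none&lt;/code&gt; can return nothing at all: ranking picks the top candidate, and a separate check decides whether that candidate is good enough to show.&lt;/p&gt;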
&lt;h2&gt;
  
  
  Day-1 update: typo tolerance
&lt;/h2&gt;

&lt;p&gt;A few hours after launching on Reddit, a commenter asked: "what&lt;br&gt;
about searching 'cross origin' for CORS, or what about typos like&lt;br&gt;
'rsponse_model'?"&lt;/p&gt;

&lt;p&gt;The first case worked fine — BM25 finds the CORS docs because the&lt;br&gt;
body contains "Cross-Origin Resource Sharing" verbatim. But typos?&lt;br&gt;
Total miss. "rsponse_model" returned a wrong answer at 0.34&lt;br&gt;
confidence — confidently wrong, above the threshold, no warning to&lt;br&gt;
the user.&lt;/p&gt;

&lt;p&gt;That's the worst possible failure mode for an "honest null" product:&lt;br&gt;
the no-fabrication promise breaks for typo'd in-corpus queries,&lt;br&gt;
which are arguably more common than out-of-corpus&lt;br&gt;
queries.&lt;/p&gt;

&lt;p&gt;Fix shipped same day: a BK-tree (Burkhard-Keller tree) over the&lt;br&gt;
indexed vocabulary at index time, with query-time nearest-neighbour&lt;br&gt;
lookup using length-tuned edit distance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fuzzy_candidates&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tree&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;max_dist&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;   &lt;span class="c1"&gt;# short words: ambiguous beyond one edit
&lt;/span&gt;    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;max_dist&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;   &lt;span class="c1"&gt;# OAuth2PasswordBearer can tolerate more slop
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tree&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_dist&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
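&lt;p&gt;The snippet above assumes a tree object with a &lt;code&gt;search(token, max_dist)&lt;/code&gt; method that yields &lt;code&gt;(word, distance)&lt;/code&gt; pairs. A compact BK-tree with that interface might look like the following. It is a textbook-style sketch using plain Levenshtein distance, not the project's ~150-line implementation:&lt;/p&gt;

```python
def edit_distance(a, b):
    """Levenshtein distance using two rows of the dynamic-programming table."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


class BKTree:
    def __init__(self, words):
        self.root = None  # each node is (word, {distance: child_node})
        for word in words:
            self.add(word)

    def add(self, word):
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            dist = edit_distance(word, node[0])
            if dist == 0:
                return  # word already indexed
            child = node[1].get(dist)
            if child is None:
                node[1][dist] = (word, {})
                return
            node = child

    def search(self, token, max_dist):
        """Return (word, distance) pairs within max_dist edits of token."""
        if self.root is None:
            return []
        found, stack = [], [self.root]
        while stack:
            word, children = stack.pop()
            dist = edit_distance(token, word)
            if max_dist >= dist:
                found.append((word, dist))
            # Triangle inequality: matches can only live in subtrees whose
            # edge label is within max_dist of this node's distance.
            for key, child in children.items():
                if max_dist >= abs(key - dist):
                    stack.append(child)
        return found


tree = BKTree(["response", "request", "model", "router"])
print(tree.search("rsponse", 1))
```

&lt;p&gt;The pruning step is what makes the lookup cheap: most of the vocabulary is never compared against the query token.&lt;/p&gt;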



&lt;p&gt;When fuzzy correction fires, the confidence is capped at 0.6 and the&lt;br&gt;
response includes a "verify the source" message so the caller knows&lt;br&gt;
the answer came from a corrected query, not an exact match.&lt;/p&gt;

&lt;p&gt;Plus a guard against fuzzy-correcting nonsense queries: if 3+ user&lt;br&gt;
tokens are unrecognized, return null. "Quantum chromodynamics&lt;br&gt;
neutrino flux" against FastAPI docs correctly stays null even though&lt;br&gt;
fuzzy lookup could find nearest-neighbour matches for each individual&lt;br&gt;
word.&lt;/p&gt;
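&lt;p&gt;Taken together, the nonsense guard and the confidence cap could be sketched as below. The 0.6 cap and the three-token rule come from the description above; the function names and the &lt;code&gt;corrections&lt;/code&gt; mapping (unknown token to its fuzzy replacement) are hypothetical:&lt;/p&gt;

```python
FUZZY_CONFIDENCE_CAP = 0.6  # cap stated in the post
MAX_UNKNOWN_TOKENS = 2      # three or more unrecognized tokens means refuse


def plan_query(tokens, vocabulary, corrections):
    """Return (tokens_to_search, was_corrected), or None to refuse outright."""
    unknown = [t for t in tokens if t not in vocabulary]
    if len(unknown) > MAX_UNKNOWN_TOKENS:
        return None  # too much of the query is foreign to the corpus
    fixed = [corrections.get(t, t) if t in unknown else t for t in tokens]
    return fixed, fixed != list(tokens)


def cap_confidence(confidence, was_corrected):
    """Limit confidence when the answer came from a corrected query."""
    if was_corrected:
        return min(confidence, FUZZY_CONFIDENCE_CAP)
    return confidence


vocab = {"what", "is", "response_model"}
print(plan_query(["what", "is", "rsponse_model"], vocab, {"rsponse_model": "response_model"}))
print(plan_query(["quantum", "chromodynamics", "neutrino", "flux"], vocab, {}))
```

&lt;p&gt;Counting unknown tokens before correcting them is the point of the guard: it stops the fuzzy lookup from rescuing a query that has nothing to do with the corpus.&lt;/p&gt;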

&lt;h2&gt;
  
  
  What works well after the fixes
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Query&lt;/th&gt;
&lt;th&gt;Result&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;what is response_model&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;response_model Priority&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;1.0 confidence&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;how do I add CORS&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;CORS (Cross-Origin Resource Sharing)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;1.0 confidence&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;what is OAuth2PasswordBearer&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;FastAPI's OAuth2PasswordBearer&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;1.0 confidence&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;what is APIRouter&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;APIRouter class&lt;/code&gt; (in reference/apirouter.md)&lt;/td&gt;
&lt;td&gt;1.0 confidence&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;what is rsponse_model&lt;/code&gt; (typo)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;response_model Priority&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;0.6 confidence + warning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;how do I add corss&lt;/code&gt; (typo)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;CORS preflight requests&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;0.46 confidence + warning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;what is quantum chromodynamics&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;null&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;honest refusal&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  What it doesn't do (being honest about scope)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No synthesis.&lt;/strong&gt; The &lt;code&gt;answer&lt;/code&gt; field is the matching section's body
verbatim, not a paraphrase. If you want a summary, use a different
tool.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No follow-up.&lt;/strong&gt; Each query is stateless.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No multi-language.&lt;/strong&gt; English-only NLTK stack.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No conceptual cross-document linking.&lt;/strong&gt; It's keyword retrieval.
Two documents about "California" — one about OpenAI's office and
one about almond farming — won't be linked. For that you need
embeddings + entity profiles. This product is intentionally not
that.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not a Perplexity replacement.&lt;/strong&gt; If you ask open-domain questions
outside your corpus, you'll get &lt;code&gt;null&lt;/code&gt;. That's the feature.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  When this is the right tool
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OSS docs Q&amp;amp;A&lt;/strong&gt; — your community can query your docs without you
paying per-question LLM costs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Internal team wikis&lt;/strong&gt; that legally can't go to OpenAI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Air-gapped environments&lt;/strong&gt; (finance, healthcare, defense)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Personal knowledge management&lt;/strong&gt; — Obsidian or Logseq vault,
offline&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CI quality gate for docs&lt;/strong&gt; — fail a PR if it removes content
that used to be answerable (this is what the GitHub Action does)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The stack
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Choice&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Web&lt;/td&gt;
&lt;td&gt;FastAPI + Uvicorn&lt;/td&gt;
&lt;td&gt;Async, typed, batteries-included&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ranking&lt;/td&gt;
&lt;td&gt;rank-bm25&lt;/td&gt;
&lt;td&gt;Reference Okapi BM25 implementation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NLP&lt;/td&gt;
&lt;td&gt;NLTK&lt;/td&gt;
&lt;td&gt;WordNet, Penn Treebank tagger, stopwords — boring and reliable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fuzzy&lt;/td&gt;
&lt;td&gt;Custom BK-tree&lt;/td&gt;
&lt;td&gt;~150 lines, no dependency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Parser&lt;/td&gt;
&lt;td&gt;markdown-it-py&lt;/td&gt;
&lt;td&gt;Handles fenced code blocks correctly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;File watch&lt;/td&gt;
&lt;td&gt;watchdog&lt;/td&gt;
&lt;td&gt;Cross-platform file events&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Total app code: ~700 lines. Image size: ~250 MB. RAM at runtime:&lt;br&gt;
~40 MB. Indexes 1,800 markdown sections in well under a second.&lt;/p&gt;

&lt;h2&gt;
  
  
  The repo
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/teamerisingstars/KB-API" rel="noopener noreferrer"&gt;github.com/teamerisingstars/KB-API&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Live demo: &lt;a href="https://kb-api-q30f.onrender.com" rel="noopener noreferrer"&gt;kb-api-q30f.onrender.com&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you've built something similar or have thoughts on the BM25&lt;br&gt;
tuning, the fuzzy correction, or the boost stack, I'd genuinely like&lt;br&gt;
to hear what would change. Drop a comment or open an issue.&lt;/p&gt;




</description>
      <category>programming</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
