<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: jacobjerryarackal</title>
    <description>The latest articles on DEV Community by jacobjerryarackal (@jacobjerryarackal).</description>
    <link>https://dev.to/jacobjerryarackal</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F528781%2F6e05759f-c5bc-477a-a6ef-46105d95f49e.png</url>
      <title>DEV Community: jacobjerryarackal</title>
      <link>https://dev.to/jacobjerryarackal</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jacobjerryarackal"/>
    <language>en</language>
    <item>
      <title>I Built a RAG Pipeline. Then I Realized Retrieval Is the Real Model</title>
      <dc:creator>jacobjerryarackal</dc:creator>
      <pubDate>Wed, 08 Apr 2026 03:03:07 +0000</pubDate>
      <link>https://dev.to/jacobjerryarackal/i-built-a-rag-pipeline-then-i-realized-retrieval-is-the-real-model-4i7l</link>
      <guid>https://dev.to/jacobjerryarackal/i-built-a-rag-pipeline-then-i-realized-retrieval-is-the-real-model-4i7l</guid>
      <description>&lt;p&gt;Everyone talks about the LLM. GPT‑4, Claude, Gemini – that’s the celebrity. But after building my first real RAG pipeline, I learned something humbling: &lt;strong&gt;the LLM is the interchangeable part. The retrieval system is the actual worker.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let me show you what I mean.&lt;/p&gt;

&lt;h3&gt;
  
  
  The 4‑Step Pipeline We All Copy
&lt;/h3&gt;

&lt;p&gt;You’ve seen the tutorial code a hundred times:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Ingest&lt;/strong&gt; – chunk your documents
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embed&lt;/strong&gt; – turn chunks into vectors
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retrieve&lt;/strong&gt; – find top‑k similar chunks
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generate&lt;/strong&gt; – LLM answers with that context
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It works. My bot could answer company policy questions with citations. I felt smart.&lt;/p&gt;

&lt;p&gt;Then I asked: &lt;em&gt;“Can I get a refund for a digital product?”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The LLM gave a beautiful, confident answer which was completely wrong. Because my retrieval returned a chunk about &lt;em&gt;physical returns&lt;/em&gt; (30 days, original packaging) and completely missed the digital product exception sitting two paragraphs away.&lt;/p&gt;

&lt;p&gt;The LLM did its job perfectly. &lt;strong&gt;The retrieval failed.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Retrieval Is the Real Model
&lt;/h3&gt;

&lt;p&gt;Here’s what I learned the hard way:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;What you think matters&lt;/th&gt;
&lt;th&gt;What actually matters&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Which LLM you use&lt;/td&gt;
&lt;td&gt;How you chunk documents&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prompt engineering&lt;/td&gt;
&lt;td&gt;Embedding quality&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;System prompts&lt;/td&gt;
&lt;td&gt;Re‑ranking after retrieval&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The LLM just formats the answer. &lt;strong&gt;Retrieval decides whether the answer is true.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Code That Fixed My Pipeline
&lt;/h3&gt;

&lt;p&gt;Semantic search alone misses exact phrases like “non‑refundable after download”. Keyword search alone misses meaning. Hybrid search combines both. Here’s the core (using FAISS + BM25):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sentence_transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SentenceTransformer&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;faiss&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;rank_bm25&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BM25Okapi&lt;/span&gt;

&lt;span class="c1"&gt;# 1. Load documents and embed
&lt;/span&gt;&lt;span class="n"&gt;docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Refund within 30 days, physical items only.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Digital products: non-refundable after download.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Contact support for defective digital items.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SentenceTransformer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;all-MiniLM-L6-v2&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;faiss&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;IndexFlatL2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;float32&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="c1"&gt;# 2. BM25 keyword index (tokenized)
&lt;/span&gt;&lt;span class="n"&gt;tokenized_docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;bm25&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BM25Okapi&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tokenized_docs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Hybrid search function
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;hybrid_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Semantic score (distance -&amp;gt; similarity)
&lt;/span&gt;    &lt;span class="n"&gt;query_vec&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;distances&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;indices&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query_vec&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;semantic_scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;distances&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;  

    &lt;span class="c1"&gt;# Keyword score
&lt;/span&gt;    &lt;span class="n"&gt;query_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;bm25_scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bm25&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_scores&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query_tokens&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;top_bm25_idx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;argsort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bm25_scores&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;:][::&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;keyword_scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;bm25_scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;top_bm25_idx&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# Combine (normalized)
&lt;/span&gt;    &lt;span class="n"&gt;combined&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;idx&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;indices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
        &lt;span class="n"&gt;combined&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;alpha&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;semantic_scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;idx&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;top_bm25_idx&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;combined&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;combined&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;keyword_scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;keyword_scores&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;combined&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;reverse&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)[:&lt;/span&gt;&lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# 4. Test
&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Can I get my money back for a digital product?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;hybrid_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Score: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; | &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Output: Score: 0.92 | Digital products: non-refundable after download.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That &lt;code&gt;alpha=0.5&lt;/code&gt; weights semantic meaning and exact wording equally. Without hybrid search, the digital-product chunk ranked #3 and was ignored. With hybrid search, it ranked #1.&lt;/p&gt;
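&lt;p&gt;A side note on the combining step: the weighted sum above assumes the semantic and keyword scores live on comparable scales. A common way to sidestep score scales entirely is Reciprocal Rank Fusion, which merges &lt;em&gt;ranks&lt;/em&gt; instead of raw scores. A minimal sketch in plain Python – the example rankings here are made up, not output from the pipeline above:&lt;/p&gt;

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Combine multiple ranked lists of doc ids into one ranking.

    Each ranking is a list of doc ids, best first. RRF scores a doc
    by summing 1 / (k + rank) over every list it appears in, so no
    score normalization is needed.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Semantic search ranked doc 0 first; BM25 ranked doc 1 first.
semantic_ranking = [0, 2, 1]
keyword_ranking = [1, 0, 2]
print(reciprocal_rank_fusion([semantic_ranking, keyword_ranking]))
# A doc near the top of both lists wins overall.
```

&lt;p&gt;Swapping this in for the weighted sum removes the &lt;code&gt;alpha&lt;/code&gt; tuning knob, at the cost of ignoring score magnitudes.&lt;/p&gt;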

&lt;h3&gt;
  
  
  Three Changes That 10x’ed My Pipeline
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Chunk size is not a default&lt;/strong&gt; – Moved to overlapping chunks (200 tokens with 50 overlap).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic search alone lies&lt;/strong&gt; – Added BM25 hybrid search (see code above).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Re‑ranking changes everything&lt;/strong&gt; – A small cross‑encoder re‑scored top‑10 chunks, lifting accuracy from 72% to 91%.&lt;/li&gt;
&lt;/ol&gt;
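&lt;p&gt;Change #1 is the easiest to sketch. A minimal overlapping chunker in plain Python – whitespace tokens stand in for real tokenizer tokens, and the sizes are the ones from the list above:&lt;/p&gt;

```python
def chunk_tokens(text, chunk_size=200, overlap=50):
    """Split text into overlapping chunks of whitespace tokens.

    Consecutive chunks share `overlap` tokens, so a sentence that
    straddles a chunk boundary still appears whole in at least one
    chunk. Requires overlap smaller than chunk_size.
    """
    tokens = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # last chunk already covers the tail
    return chunks

doc = " ".join(f"tok{i}" for i in range(500))
chunks = chunk_tokens(doc)
print(len(chunks))  # 3 chunks: tokens 0-199, 150-349, 300-499
```

&lt;p&gt;The overlap is what rescues answers that sit right on a chunk boundary: the straddling sentence shows up intact in the neighbouring chunk.&lt;/p&gt;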

&lt;h3&gt;
  
  
  The Mistake Most People Make
&lt;/h3&gt;

&lt;p&gt;We treat RAG as an LLM problem. So we tweak prompts, swap models, add system instructions.&lt;/p&gt;

&lt;p&gt;But the LLM is &lt;em&gt;forced&lt;/em&gt; to use whatever context you give it. If you feed it the wrong chunk, it will hallucinate confidently. If you feed it the right chunk, even a small model answers correctly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The bottleneck is almost never the LLM. It’s the retriever.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  What I Do Differently Now
&lt;/h3&gt;

&lt;p&gt;Before I write a single line of agent code, I ask three questions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;em&gt;“If I searched my vector database by hand, would I find the exact sentence that answers this?”&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;“Does my retrieval work for synonyms AND exact keywords?”&lt;/em&gt; → if no, hybrid search.
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;“Is the top‑1 retrieved chunk actually the best?”&lt;/em&gt; → if no, add a re‑ranker.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  The Bottom Line
&lt;/h3&gt;

&lt;p&gt;The AI industry sells you on the model. But in production RAG systems, the model is the cheapest, most replaceable component. The hard part – the part that separates working bots from demoware – is getting the right information into the context window.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The LLM is the pen. Retrieval is the memory. And memory is what makes a system useful.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So next time your RAG bot fails, don’t blame GPT. Look at what you retrieved. I promise that’s where the real problem lives.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>machinelearning</category>
      <category>rag</category>
    </item>
    <item>
      <title>We Let an LLM Control a File System and Run Commands – Here’s What Actually Broke First</title>
      <dc:creator>jacobjerryarackal</dc:creator>
      <pubDate>Sat, 04 Apr 2026 08:02:02 +0000</pubDate>
      <link>https://dev.to/jacobjerryarackal/we-let-an-llm-control-a-file-system-and-run-commands-heres-what-actually-broke-first-3618</link>
      <guid>https://dev.to/jacobjerryarackal/we-let-an-llm-control-a-file-system-and-run-commands-heres-what-actually-broke-first-3618</guid>
      <description>&lt;p&gt;I wanted to push an LLM beyond simple chat and see if it could actually build real code.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmbb0zkzfre85qdjpwjqe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmbb0zkzfre85qdjpwjqe.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So I gave it direct access to the file system and the ability to run terminal commands. The task was straightforward: “Create a clean React login page with email, password, remember-me checkbox, and form validation.”&lt;/p&gt;

&lt;p&gt;It started confidently. Within minutes everything broke.&lt;/p&gt;

&lt;h3&gt;
  
  
  The System We Built
&lt;/h3&gt;

&lt;p&gt;We connected two tools to the LLM:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;file_system&lt;/code&gt; – list, read, write, delete files
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;run_command&lt;/code&gt; – execute npm, start the dev server, etc.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We used MCP (the “USB-C for AI” protocol) so the model could call tools cleanly. The goal was to let the LLM act like a real developer – explore the folder, create files, install packages, and test the app.&lt;/p&gt;

&lt;p&gt;It sounded simple. It was not.&lt;/p&gt;

&lt;h3&gt;
  
  
  Failure #1: It Assumed the Project Already Existed
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What broke:&lt;/strong&gt; The model immediately started writing &lt;code&gt;Login.jsx&lt;/code&gt; in an empty folder. No &lt;code&gt;package.json&lt;/code&gt;, no React setup, no dependencies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it broke:&lt;/strong&gt; The LLM had no understanding of project bootstrapping. It assumed a full React app was already there.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What we learned:&lt;/strong&gt; We had to explicitly tell it “first create the project structure” in every new session. This became our first mandatory step.&lt;/p&gt;

&lt;h3&gt;
  
  
  Failure #2: It Ran Commands at the Wrong Time
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What broke:&lt;/strong&gt; After creating a few files, it ran &lt;code&gt;npm start&lt;/code&gt; and &lt;code&gt;npm run build&lt;/code&gt; before any dependencies were installed. The terminal exploded with 47 errors.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it broke:&lt;/strong&gt; The model treated commands like a checklist instead of understanding dependencies. It didn’t realise you can’t run the app before &lt;code&gt;npm install&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What we learned:&lt;/strong&gt; We added a rule: never run &lt;code&gt;npm start&lt;/code&gt; or &lt;code&gt;npm run build&lt;/code&gt; until &lt;code&gt;package.json&lt;/code&gt; exists and all dependencies are installed. This single rule saved us from multiple crashes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Failure #3: It Mixed Concerns and Created Messy Code
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What broke:&lt;/strong&gt; It put all the Tailwind CSS and form logic inside a single &lt;code&gt;Login.jsx&lt;/code&gt; file. The component became 180 lines long, impossible to read, and had styling mixed with business logic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it broke:&lt;/strong&gt; The model was optimising for “one file = done” instead of proper component structure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What we learned:&lt;/strong&gt; We had to force it to create separate files (&lt;code&gt;Login.jsx&lt;/code&gt;, &lt;code&gt;Login.css&lt;/code&gt;, &lt;code&gt;utils/validation.js&lt;/code&gt;). Once we added this constraint, the code quality jumped dramatically.&lt;/p&gt;

&lt;h3&gt;
  
  
  Failure #4: It Had No Memory of Previous Mistakes
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What broke:&lt;/strong&gt; Even after we fixed the directory issue, in the next loop it tried to create the same wrong file again in the wrong location.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it broke:&lt;/strong&gt; The model had no persistent memory of what it had already tried and failed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What we learned:&lt;/strong&gt; We started saving a small &lt;code&gt;agent-log.md&lt;/code&gt; file after every loop so the model could read its own history before making the next decision. This simple trick reduced repeated mistakes by almost 70%.&lt;/p&gt;

&lt;p&gt;After 8 loops and 14 minutes, we finally had a clean, working React login page with proper validation and structure.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Real Lesson
&lt;/h3&gt;

&lt;p&gt;The LLM wasn’t the problem. The problem was that we treated it like a magician instead of a junior developer with superpowers. Once we gave it real tools (file system + terminal) and forced it to work inside real constraints, it went from completely broken to actually useful.&lt;/p&gt;

&lt;p&gt;In 2026, the biggest unlock isn’t a smarter model.&lt;br&gt;
It’s giving the model the right tools and the right guardrails.&lt;br&gt;
I no longer ask LLMs to “write me some code.”&lt;br&gt;
I give them a file system, terminal access, and clear rules.&lt;br&gt;
That single change is what turns toys into tools you can actually ship.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>“Prompt Engineering Is Enough” Is Wrong – Here’s What I Had to Add</title>
      <dc:creator>jacobjerryarackal</dc:creator>
      <pubDate>Thu, 02 Apr 2026 12:10:36 +0000</pubDate>
      <link>https://dev.to/jacobjerryarackal/prompt-engineering-is-enough-is-wrong-heres-what-i-had-to-add-4057</link>
      <guid>https://dev.to/jacobjerryarackal/prompt-engineering-is-enough-is-wrong-heres-what-i-had-to-add-4057</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftfyymk62ezcc73etxqee.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftfyymk62ezcc73etxqee.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I used to believe the hype.&lt;br&gt;&lt;br&gt;
I thought that if I just wrote better prompts – clearer instructions, few‑shot examples, chain‑of‑thought – I could make any LLM do whatever I wanted. I spent weeks refining prompts, tweaking wording, adding “think step by step” like it was magic.&lt;/p&gt;

&lt;p&gt;Then I tried to build something useful.&lt;/p&gt;

&lt;p&gt;I asked the model to check the weather and tell me if I needed an umbrella. The response was confident and completely wrong. It hallucinated the forecast based on its training cut‑off. No real data, just made‑up facts wrapped in perfect English.&lt;/p&gt;

&lt;p&gt;The prompt was excellent. The model was powerful. The result was useless.&lt;/p&gt;

&lt;p&gt;That’s when I realised the uncomfortable truth: &lt;strong&gt;prompt engineering is not enough.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I went back and added the one thing that actually fixed it: &lt;strong&gt;tools&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;I gave the model a simple weather tool and forced it to use a basic OBSERVE → THINK → ACT loop. Nothing fancy. Just three steps every single time.&lt;/p&gt;

&lt;p&gt;Here’s what happened.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1 – OBSERVE&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
The model receives my request: “Check the weather in Kochi and tell me if I need an umbrella today.” It sees it has access to a tool called &lt;code&gt;get_weather(location)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2 – THINK&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Instead of guessing, it reasons out loud: “I don’t have current weather data. I should use the tool to get real information.”&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3 – ACT&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
It calls the tool in clean JSON:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tool"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"get_weather"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"arguments"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"location"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Kochi, Kerala"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The tool returns actual data. The model goes back to THINK mode, sees the result (“light rain expected”), and only then generates the final answer: “Yes, take an umbrella.”&lt;/p&gt;

&lt;p&gt;No hallucination. No made‑up weather. Just one tool + one loop.&lt;/p&gt;
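&lt;p&gt;The whole loop fits in a page of Python. A toy sketch with both the model and the weather tool stubbed out – the stub replies, the JSON shape, and the tool data are all illustrative, not a real LLM API:&lt;/p&gt;

```python
import json

def get_weather(location):
    """Stub tool: a real version would call a weather API."""
    return {"location": location, "forecast": "light rain expected"}

TOOLS = {"get_weather": get_weather}

def fake_llm(messages):
    """Stand-in for the model: requests the tool once, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return json.dumps({"tool": "get_weather",
                           "arguments": {"location": "Kochi, Kerala"}})
    return "Yes, take an umbrella."

def agent(user_request, max_loops=5):
    # OBSERVE: the conversation so far is the model's whole world
    messages = [{"role": "user", "content": user_request}]
    for _ in range(max_loops):
        reply = fake_llm(messages)  # THINK: tool call or final answer?
        try:
            call = json.loads(reply)
        except ValueError:
            return reply  # plain text means it is the final answer
        result = TOOLS[call["tool"]](**call["arguments"])  # ACT
        messages.append({"role": "tool", "content": json.dumps(result)})
    return "gave up after max_loops"

print(agent("Do I need an umbrella in Kochi today?"))
```

&lt;p&gt;The key property: the loop never produces a final answer until the model stops asking for tools, so the answer is grounded in a real tool result rather than a guess.&lt;/p&gt;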

&lt;p&gt;I tested it on something harder, building a small React login page from scratch. With pure prompting, the model produced broken, outdated code and confidently told me it was correct. After adding &lt;code&gt;file_system&lt;/code&gt; and &lt;code&gt;run_command&lt;/code&gt; tools plus the same OBSERVE‑THINK‑ACT loop, it actually listed the directory, read the existing &lt;code&gt;package.json&lt;/code&gt;, wrote proper components, fixed its own bugs, and shipped working code across eight loops.&lt;/p&gt;

&lt;p&gt;The model was the same. The prompts were similar. The only difference was that I stopped treating the LLM as a magic oracle and started treating it as a brain that needs hands.&lt;/p&gt;

&lt;p&gt;I also added MCP, the simple protocol everyone now calls “USB‑C for AI.” It made plugging in new tools ridiculously easy. No custom glue code. Just declare the tool once, and the agent knows exactly how to call it.&lt;/p&gt;

&lt;p&gt;The change was night and day.&lt;/p&gt;

&lt;p&gt;I stopped wasting time writing longer and longer prompts. I started adding tools and a reliable loop instead. The results went from “impressively wrong” to “actually useful.”&lt;/p&gt;

&lt;p&gt;The lesson for 2026 is brutally simple: &lt;strong&gt;prompt engineering is table stakes, not the complete solution.&lt;/strong&gt; If your LLM keeps hallucinating, forgetting tasks, or failing at real work, stop tweaking the prompt. Give it proper tools and a structured loop to use them.&lt;/p&gt;

&lt;p&gt;The model is the brain. Tools are the hands. Without hands, even the smartest brain is stuck guessing.&lt;/p&gt;

&lt;p&gt;I no longer believe “prompt engineering is enough.”&lt;br&gt;&lt;br&gt;
I now know exactly what I have to add.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>architecture</category>
      <category>learning</category>
    </item>
  </channel>
</rss>
