<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Muaz</title>
    <description>The latest articles on DEV Community by Muaz (@muazashraf).</description>
    <link>https://dev.to/muazashraf</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1067809%2F6afb56d1-68c0-421f-90a4-aab7f4e8d13a.jpg</url>
      <title>DEV Community: Muaz</title>
      <link>https://dev.to/muazashraf</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/muazashraf"/>
    <language>en</language>
    <item>
      <title>5 Reasons Your RAG System Will Fail in Production (And the Patterns I Use to Fix Each One)</title>
      <dc:creator>Muaz</dc:creator>
      <pubDate>Sun, 17 May 2026 19:15:36 +0000</pubDate>
      <link>https://dev.to/muazashraf/5-reasons-your-rag-system-will-fail-in-production-and-the-patterns-i-use-to-fix-each-one-34ac</link>
      <guid>https://dev.to/muazashraf/5-reasons-your-rag-system-will-fail-in-production-and-the-patterns-i-use-to-fix-each-one-34ac</guid>
      <description>&lt;h2&gt;
  
  
  The 80% Problem
&lt;/h2&gt;

&lt;p&gt;Most RAG demos look magical. You drop in 10 PDFs, ask 3 questions, get clean answers. Ship it.&lt;/p&gt;

&lt;p&gt;Then production hits. The document corpus grows from 10 to 10,000. Users ask questions the demo never anticipated. Edge cases stack up. Accuracy drops from 95% to 60% in two weeks. The team starts apologising to the client.&lt;/p&gt;

&lt;p&gt;I've built &lt;strong&gt;20+ production RAG systems&lt;/strong&gt; for clients across the USA, UK, UAE, Canada, Australia, Switzerland, and Pakistan. About 80% of the RAG projects I audit before clients hire me are in this exact failure mode — they passed the demo, then collapsed under real data.&lt;/p&gt;

&lt;p&gt;The fixes aren't more complex models. They're &lt;strong&gt;architectural patterns&lt;/strong&gt; designed for failure modes from day one. Here are the five that matter most.&lt;/p&gt;

&lt;h2&gt;
  
  
  Failure 1: Hallucinations on edge cases
&lt;/h2&gt;

&lt;p&gt;A vanilla RAG pipeline does this: embed the user query, retrieve top-k documents, stuff them into a prompt, ask the LLM to answer. When retrieval finds &lt;em&gt;something&lt;/em&gt;, the LLM dutifully constructs an answer — even when the retrieved context is unrelated to the question.&lt;/p&gt;

&lt;p&gt;In production, you get confident-sounding nonsense on the long tail of queries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix: a self-correction loop.&lt;/strong&gt; Before the LLM answers, force it to grade the retrieved context against the question. If the grade is poor, rewrite the query or fall back to a "I don't have enough information" response.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.graph&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;grade_relevance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;documents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;question&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;question&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Given the question and retrieved documents, score 0-10 how
    relevant the documents are to answering the question. Be strict.
    Question: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
    Documents: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;3000&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
    Respond with just a number.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;relevance_score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;route_after_grading&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;relevance_score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rewrite_query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;generate_answer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;RAGState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retrieve&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;retrieve&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;grade&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;grade_relevance&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rewrite_query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rewrite_query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;generate_answer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;generate_answer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_conditional_edges&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;grade&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;route_after_grading&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I built exactly this pattern for an enterprise client — full breakdown in my &lt;a href="https://muazashraf.vercel.app/case-studies/agentic-rag-chatbot-langchain-langgraph" rel="noopener noreferrer"&gt;Agentic RAG case study&lt;/a&gt;. It moved accuracy from ~70% to 90%+ on real questions, and dropped hallucinations to single digits.&lt;/p&gt;

&lt;h2&gt;
  
  
  Failure 2: Stale retrieval as your data changes
&lt;/h2&gt;

&lt;p&gt;You ship a RAG system on Monday with 500 documents. By Friday, 50 of those documents have been edited. Your vector store still has the old embeddings.&lt;/p&gt;

&lt;p&gt;Users ask questions about the new content. The system retrieves the old version. They lose trust.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix: incremental re-indexing with content hashing, not full re-builds.&lt;/strong&gt; Hash each source document. On a schedule (or webhook), only re-embed documents whose hash changed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;document_hash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sha256&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;hexdigest&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;upsert_if_changed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pinecone_index&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;new_hash&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;document_hash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;existing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pinecone_index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;doc_id&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="n"&gt;vectors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;existing&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;existing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;new_hash&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;  &lt;span class="c1"&gt;# unchanged, skip
&lt;/span&gt;    &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;pinecone_index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upsert&lt;/span&gt;&lt;span class="p"&gt;([{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;doc_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;values&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;metadata&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;new_hash&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;indexed_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;
    &lt;span class="p"&gt;}])&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This single pattern saved a client 70% on embedding API costs and kept their knowledge base accurate without manual intervention.&lt;/p&gt;

&lt;h2&gt;
  
  
  Failure 3: Bad retrieval ranking
&lt;/h2&gt;

&lt;p&gt;Top-k retrieval over pure semantic similarity has a known weakness: it rewards documents that &lt;em&gt;sound similar&lt;/em&gt; to the question, not documents that &lt;em&gt;answer&lt;/em&gt; the question. Worse, exact keyword matches (product codes, names, error codes) often get ranked below conceptually-similar-but-wrong chunks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix: hybrid search + a reranker.&lt;/strong&gt; Combine dense vector search with sparse keyword search (BM25), then run the merged candidates through a cross-encoder reranker.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;rank_bm25&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BM25Okapi&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sentence_transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;CrossEncoder&lt;/span&gt;

&lt;span class="n"&gt;bm25&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BM25Okapi&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;corpus&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;reranker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CrossEncoder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cross-encoder/ms-marco-MiniLM-L-6-v2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;hybrid_retrieve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;dense_hits&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vector_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;similarity_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;sparse_hits&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bm25&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_top_n&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;corpus&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;candidates&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;dedupe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dense_hits&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;sparse_hits&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;pairs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;reranker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pairs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ranked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ranked&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why this matters: in financial, legal, and medical use cases, missing a specific code or term means missing the entire answer. Pure semantic search misses these constantly. Hybrid + rerank fixed this for a healthcare client managing 10,000+ patient records.&lt;/p&gt;

&lt;h2&gt;
  
  
  Failure 4: Multimodal blindspots
&lt;/h2&gt;

&lt;p&gt;Most RAG systems can't read the charts, diagrams, screenshots, or tables inside PDFs. They OCR the text and lose 40% of the information.&lt;/p&gt;

&lt;p&gt;If your domain has visual content — research papers, technical docs, medical scans, financial reports — text-only RAG is broken by design.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix: vision-language embeddings (ColPali, CLIP) for image regions alongside text chunks.&lt;/strong&gt; Index both. Let the retriever match queries against both modalities.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;colpali_engine.models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ColPali&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ColPaliProcessor&lt;/span&gt;

&lt;span class="n"&gt;processor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ColPaliProcessor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vidore/colpali&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ColPali&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vidore/colpali&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;embed_page_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pdf_page_image&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;inputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;processor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;pdf_page_image&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;return_tensors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;last_hidden_state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Store both text embeddings AND image embeddings in the same vector store
# with a 'modality' tag. Retrieve from both, then merge.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I built this for a research firm searching 10,000+ pages of mixed-content PDFs. Asking "show me the Q3 conversion funnel chart" actually returns the right chart now. Full writeup: &lt;a href="https://muazashraf.vercel.app/case-studies/multimodal-rag-colpali-clip" rel="noopener noreferrer"&gt;Multimodal RAG with ColPali &amp;amp; CLIP&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Failure 5: No evaluation harness = no improvement
&lt;/h2&gt;

&lt;p&gt;Most teams ship RAG without an evaluation pipeline. Then when accuracy degrades, they can't tell:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Did retrieval get worse?&lt;/li&gt;
&lt;li&gt;Did the LLM get worse?&lt;/li&gt;
&lt;li&gt;Did the data get harder?&lt;/li&gt;
&lt;li&gt;Was it always this bad and we just didn't notice?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can't fix what you can't measure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix: a golden dataset + automated nightly eval.&lt;/strong&gt; 50–100 hand-curated question/answer pairs covering your edge cases. Run them through the system every deploy. Track three metrics:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;evaluate_rag&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;golden_dataset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rag_system&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retrieval_hit_rate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;# did retrieval find the right doc?
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;answer_correctness&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;# did the final answer match?
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;faithfulness&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;          &lt;span class="c1"&gt;# was the answer grounded in retrieved docs?
&lt;/span&gt;    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;expected_doc_ids&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;expected_answer&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;golden_dataset&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;retrieved&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rag_system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;retrieve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rag_system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;retrieval_hit_rate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;expected_doc_ids&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;retrieved&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;answer_correctness&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nf"&gt;llm_judge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;expected_answer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;faithfulness&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nf"&gt;llm_judge_grounding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;retrieved&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;golden_dataset&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the single highest-leverage thing you can build. Every RAG improvement I've shipped started with one of these metrics moving in the wrong direction.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Pattern: Design for failure on day 1
&lt;/h2&gt;

&lt;p&gt;If I had to compress all 20 RAG projects into one sentence: &lt;strong&gt;the production-ready systems are the ones designed for failure from the first commit.&lt;/strong&gt; Self-correction loops, hash-based incremental indexing, hybrid retrieval, multimodal embeddings, and an evaluation harness aren't optimizations you add later — they're load-bearing infrastructure.&lt;/p&gt;

&lt;p&gt;Most "AI demos that broke in production" stories are really "demos without failure handling that met production." The fix isn't a smarter model. It's better architecture.&lt;/p&gt;

&lt;p&gt;If you're building a RAG system that needs to survive real data, look at every component and ask: &lt;em&gt;what happens when this fails?&lt;/em&gt; If you don't have an answer, that's the next thing to build.&lt;/p&gt;




&lt;h2&gt;
  
  
  About the Author
&lt;/h2&gt;

&lt;p&gt;I'm &lt;strong&gt;&lt;a href="https://muazashraf.vercel.app" rel="noopener noreferrer"&gt;Muaz Ashraf&lt;/a&gt;&lt;/strong&gt;, a freelance AI engineer specialising in production-ready RAG systems, AI agents, and AI integration. I've shipped 20+ AI systems across 7 countries with a 100% project completion rate.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🔗 Portfolio: &lt;a href="https://muazashraf.vercel.app/portfolio" rel="noopener noreferrer"&gt;muazashraf.vercel.app/portfolio&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📖 Case studies: &lt;a href="https://muazashraf.vercel.app/case-studies" rel="noopener noreferrer"&gt;muazashraf.vercel.app/case-studies&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;✉️ Hire me: &lt;a href="https://muazashraf.vercel.app/contact" rel="noopener noreferrer"&gt;muazashraf.vercel.app/contact&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Open for AI consulting, RAG system development, AI agent development, and LLM application work. Typical MVP delivery: 2–4 weeks.&lt;/p&gt;

&lt;p&gt;If you found this useful, follow me here on dev.to — I publish field notes from real production AI work.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>langchain</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>I tested Claude Code, Gemini CLI, and OpenAI Codex for 3 months – here's the verdict</title>
      <dc:creator>Muaz</dc:creator>
      <pubDate>Thu, 04 Sep 2025 18:15:53 +0000</pubDate>
      <link>https://dev.to/muazashraf/i-tested-claude-code-gemini-cli-and-openai-codex-for-3-months-heres-the-verdict-g3n</link>
      <guid>https://dev.to/muazashraf/i-tested-claude-code-gemini-cli-and-openai-codex-for-3-months-heres-the-verdict-g3n</guid>
      <description>&lt;h2&gt;
  
  
  Model Intelligence &amp;amp; Code Quality
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Claude Code CLI&lt;/strong&gt; ⭐⭐⭐⭐⭐ - Powered by Claude Sonnet 4 (1M context) - Consistently produces the highest quality, most maintainable code - Exceptional at capturing coding style and project conventions - Best-in-class for complex architectural decisions&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gemini CLI&lt;/strong&gt; ⭐⭐⭐⭐☆ - Uses Gemini 2.5 Pro with massive 1M token context - 63.8% on SWE-bench Verified (trailing Claude Code) - Strong multimodal capabilities - Excellent at handling large codebases due to context size&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OpenAI Codex CLI&lt;/strong&gt; ⭐⭐⭐⭐☆ - GPT-5 achieves 74.9% on SWE-bench Verified - Highly accurate, idiomatically correct code on first try - Strong at rapid prototyping and bug fixing - Good reasoning capabilities with o4-mini integration&lt;/p&gt;

&lt;h2&gt;
  
  
  Developer Features &amp;amp; Integration
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Claude Code CLI&lt;/strong&gt; ⭐⭐⭐⭐⭐ - Complete GitHub/GitLab integration - Multi-file editing capabilities - Agentic codebase search and understanding - CLI-first design works with any editor - Advanced hooks system for workflow customization - Real-time progress tracking with TodoWrite&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gemini CLI&lt;/strong&gt; ⭐⭐⭐⭐☆ - Built-in Google Search grounding - MCP (Model Context Protocol) support - File operations and shell commands - ReAct loop for complex task completion - Strong containerized environment support&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OpenAI Codex CLI&lt;/strong&gt; ⭐⭐⭐⭐☆ - Multimodal inputs (text, screenshots, diagrams) - Local sandboxed execution - Multiple approval modes (suggest, auto-edit, full-auto) - Comprehensive testing integration - Built-in security with command review&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real-World Performance Comparison&lt;/strong&gt;&lt;br&gt;
Scenario 1: Large Codebase Refactoring&lt;br&gt;
Winner: Claude Code CLI - Superior architectural understanding - Better handling of cross-file dependencies - Most maintainable code output&lt;/p&gt;

&lt;p&gt;Scenario 2: Quick Bug Fixing&lt;br&gt;
Winner: OpenAI Codex CLI&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fastest time to resolution - Highly accurate first-try fixes - Excellent at understanding error contexts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Scenario 3: Learning New Framework&lt;br&gt;
Winner: Gemini CLI - Free tier allows extensive experimentation - Excellent documentation search capabilities - Large context window for comprehensive examples&lt;/p&gt;

&lt;p&gt;Scenario 4: Team Collaboration&lt;br&gt;
Winner: Claude Code CLI - Best integration with existing workflows - Superior code style consistency - Professional-grade reliability&lt;/p&gt;

&lt;h2&gt;
  
  
  Verdict
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://muazashraf.vercel.app/blog/10-claude-code-vs-gemini-cli-vs-codex" rel="noopener noreferrer"&gt;MuazAshraf&lt;/a&gt;&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>gemini</category>
      <category>programming</category>
      <category>ai</category>
    </item>
    <item>
      <title>Advanced RAG vs Basic RAG: When simple retrieval isn't enough (LangChain + LangGraph implementation)</title>
      <dc:creator>Muaz</dc:creator>
      <pubDate>Thu, 04 Sep 2025 18:07:13 +0000</pubDate>
      <link>https://dev.to/muazashraf/advanced-rag-vs-basic-rag-when-simple-retrieval-isnt-enough-langchain-langgraph-implementation-5bem</link>
      <guid>https://dev.to/muazashraf/advanced-rag-vs-basic-rag-when-simple-retrieval-isnt-enough-langchain-langgraph-implementation-5bem</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;RAG (Retrieval-Augmented Generation) has changed how AI systems work with information. But while basic RAG gets the job done, advanced RAG takes it to the next level. In this post, I'll show you the difference between basic and advanced RAG, and how modern tools like LangChain and LangGraph make building smart AI systems much easier.&lt;/p&gt;

&lt;h2&gt;
  
  
  I've been working with RAG systems and noticed basic retrieval fails for complex queries. Here's what I learned about advanced techniques:
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problems with Basic RAG&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can't handle multi-step reasoning&lt;/li&gt;
&lt;li&gt;Poor context understanding&lt;/li&gt;
&lt;li&gt;No query refinement&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Advanced RAG Solutions:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Self-correcting retrieval loops&lt;/li&gt;
&lt;li&gt;Multi-agent reasoning with LangGraph&lt;/li&gt;
&lt;li&gt;Contextual re-ranking&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why I choose Langchain + Langgraph?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I tried my own custom logic but the code will become much complex and difficult to manage&lt;/li&gt;
&lt;li&gt;Langchain provide built in libraries, its effective and easy to manageable.&lt;/li&gt;
&lt;li&gt;Now you can use this advance Rag in any sector like in Education, Finance, Healthcare, you name it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Has anyone else run into these limitations? Would love to hear your experiences.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/..." class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/..." alt="Uploading image" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
Full technical breakdown: &lt;a href="https://muazashraf.vercel.app/blog/11-advanced-rag-techniques" rel="noopener noreferrer"&gt;Advance RAG&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>langchain</category>
      <category>langgraph</category>
    </item>
  </channel>
</rss>
