<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: RAGPrep</title>
    <description>The latest articles on DEV Community by RAGPrep (@ragprep).</description>
    <link>https://dev.to/ragprep</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3869428%2F028191ba-4486-4bae-be60-438d82a51beb.png</url>
      <title>DEV Community: RAGPrep</title>
      <link>https://dev.to/ragprep</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ragprep"/>
    <language>en</language>
    <item>
      <title>80% of RAG Failures Start Here (And It's Not the LLM)</title>
      <dc:creator>RAGPrep</dc:creator>
      <pubDate>Thu, 09 Apr 2026 09:30:54 +0000</pubDate>
      <link>https://dev.to/ragprep/80-of-rag-failures-start-here-and-its-not-the-llm-11lp</link>
      <guid>https://dev.to/ragprep/80-of-rag-failures-start-here-and-its-not-the-llm-11lp</guid>
      <description>&lt;p&gt;A team spent three weeks debugging hallucinations in their RAG system. They tried different prompts. They swapped embedding models. They tuned retrieval parameters.&lt;/p&gt;

&lt;p&gt;The LLM wasn't the problem.&lt;br&gt;
The retriever wasn't the problem.&lt;br&gt;
The chunks were the problem.&lt;/p&gt;

&lt;h2&gt;The setup&lt;/h2&gt;

&lt;p&gt;Fixed-size chunking, 512 tokens, 10% overlap. Standard configuration. Nothing obviously wrong at first glance.&lt;/p&gt;

&lt;p&gt;But when we scored the chunks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;34% had completeness scores below 0.4&lt;/li&gt;
&lt;li&gt;28% were orphan chunks — fragments with no surrounding context&lt;/li&gt;
&lt;li&gt;19% duplicated information already in adjacent chunks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;12,000 embeddings in their vector database.&lt;br&gt;
4,000 of them were low quality.&lt;br&gt;
They were paying to store, retrieve, and feed garbage to their LLM.&lt;/p&gt;

&lt;h2&gt;The specific failure&lt;/h2&gt;

&lt;p&gt;A user asks: "What's the load capacity of the X400?"&lt;/p&gt;

&lt;p&gt;The retriever returns the 3 most semantically similar chunks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"The X400 is designed for industrial use..."&lt;/li&gt;
&lt;li&gt;"Load capacity specifications vary by model..."&lt;/li&gt;
&lt;li&gt;"See table 4 for complete specifications..."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Table 4 had been split across 3 chunks during ingestion, each missing the context it needed to be useful. The LLM received three fragments that pointed to an answer without containing one. It hallucinated.&lt;/p&gt;

&lt;h2&gt;Why this happens&lt;/h2&gt;

&lt;p&gt;Most chunking strategies optimise for speed and simplicity, not quality. Fixed-size chunking splits documents at token boundaries with no awareness of semantic content. &lt;br&gt;
A sentence that starts in one chunk and ends in another produces two orphan fragments, each useless in isolation.&lt;/p&gt;

&lt;p&gt;The problem compounds at scale. In a demo with 50 hand-picked documents, you never hit these edge cases. &lt;br&gt;
In production with 50,000 documents from multiple sources, they're everywhere.&lt;/p&gt;
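&lt;p&gt;Here's a minimal sketch of that failure mode, using a toy character-window splitter (hypothetical, not any particular library) and an invented X400 spec sentence:&lt;/p&gt;

```python
# Toy fixed-size chunker: a fixed window with fractional overlap.
# This is an illustrative sketch, not any specific library's splitter.
def fixed_size_chunks(text, size=40, overlap=0.1):
    step = int(size * (1 - overlap))  # advance by window size minus overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

doc = "The X400 supports a maximum load capacity of 2,000 kg under continuous operation."

for chunk in fixed_size_chunks(doc):
    print(repr(chunk))
```

&lt;p&gt;The model name lands in one chunk and the number in another. Each fragment embeds just fine, which is exactly the trap: nothing fails loudly, but no single chunk can answer the question on its own.&lt;/p&gt;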

&lt;h2&gt;The fix&lt;/h2&gt;

&lt;p&gt;Score your chunks before you embed them.&lt;/p&gt;

&lt;p&gt;Specifically, check:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Completeness&lt;/strong&gt; — does the chunk contain a complete thought?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic density&lt;/strong&gt; — what ratio of the chunk is meaningful signal vs boilerplate?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context sufficiency&lt;/strong&gt; — could this chunk answer a question on its own?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Chunks that fail these checks should be merged, re-chunked, or filtered before they hit your vector database.&lt;/p&gt;
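&lt;p&gt;As a sketch, the three checks can be approximated with cheap heuristics. Everything below is illustrative: the regexes, stopword list, and example chunks are assumptions, and production scorers often use embedding- or LLM-based judges instead.&lt;/p&gt;

```python
import re

# Hypothetical heuristic scorers for the three checks above -- a sketch,
# not a production implementation.

STOPWORDS = {"the", "a", "an", "of", "to", "in", "is", "for", "and", "or", "by", "see"}

def completeness(chunk):
    # Complete thought: starts like a sentence, ends with terminal punctuation.
    text = chunk.strip()
    starts_clean = bool(re.match(r"^[A-Z0-9\"']", text))
    ends_clean = text.endswith((".", "!", "?"))
    return (starts_clean + ends_clean) / 2

def semantic_density(chunk):
    # Ratio of non-stopword tokens to all tokens.
    tokens = re.findall(r"[A-Za-z0-9,']+", chunk.lower())
    if not tokens:
        return 0.0
    return sum(t not in STOPWORDS for t in tokens) / len(tokens)

def context_sufficiency(chunk):
    # Crude proxy: a chunk that names an entity and states a value
    # has a chance of answering a question on its own.
    has_entity = bool(re.search(r"\b[A-Z][A-Za-z0-9-]+", chunk))
    has_value = bool(re.search(r"\d", chunk))
    return (has_entity + has_value) / 2

def score_chunk(chunk):
    return {
        "completeness": completeness(chunk),
        "density": round(semantic_density(chunk), 2),
        "sufficiency": context_sufficiency(chunk),
    }

good = "The X400 has a maximum load capacity of 2,000 kg."
orphan = "acity of 2,000 kg under continuous oper"

print(score_chunk(good))
print(score_chunk(orphan))
```

&lt;p&gt;A chunk that scores low on all three is a candidate for merging with its neighbours or re-chunking, not for embedding as-is.&lt;/p&gt;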

&lt;p&gt;A 2025 study on RAG systems found that optimising chunk quality improved faithfulness scores from 0.47 to 0.82 — a 74% improvement. The embedding model didn't change. &lt;br&gt;
The retriever didn't change. Only the chunk quality changed.&lt;/p&gt;

&lt;p&gt;The problem is almost always upstream of where you're looking.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I built &lt;a href="https://ragprep.com" rel="noopener noreferrer"&gt;ChunkScore&lt;/a&gt; to solve this problem — free chunk quality auditor, no signup required. &lt;br&gt;
Works on chunks from LangChain, LlamaIndex, Chonkie, or any JSON array.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>rag</category>
      <category>llm</category>
      <category>ai</category>
      <category>mcp</category>
    </item>
  </channel>
</rss>
