<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Yash Bhoskar</title>
    <description>The latest articles on DEV Community by Yash Bhoskar (@yashbhoskar).</description>
    <link>https://dev.to/yashbhoskar</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4002653%2F73c329e8-43b3-4a52-b6e6-f5cd232c6583.jpg</url>
      <title>DEV Community: Yash Bhoskar</title>
      <link>https://dev.to/yashbhoskar</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/yashbhoskar"/>
    <language>en</language>
    <item>
      <title>Agentic Chunking - Why Your RAG Pipeline Is Quietly Failing (And How to Fix It)</title>
      <dc:creator>Yash Bhoskar</dc:creator>
      <pubDate>Thu, 25 Jun 2026 16:21:48 +0000</pubDate>
      <link>https://dev.to/yashbhoskar/agentic-chunking-why-your-rag-pipeline-is-quietly-failing-and-how-to-fix-it-1c61</link>
      <guid>https://dev.to/yashbhoskar/agentic-chunking-why-your-rag-pipeline-is-quietly-failing-and-how-to-fix-it-1c61</guid>
      <description>&lt;h2&gt;
  
  
  The Real Reason RAG Fails
&lt;/h2&gt;

&lt;p&gt;You embedded your docs. You picked a great model. Answers are still shallow and wrong.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The culprit isn't your model. It's your chunks.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Split ideas in the wrong place and your retriever returns broken context; your model then tends to hallucinate to fill the gaps. Traditional chunking optimizes for speed, while agentic chunking optimizes for &lt;em&gt;understanding&lt;/em&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Chunking Showdown
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;How It Works&lt;/th&gt;
&lt;th&gt;Strengths&lt;/th&gt;
&lt;th&gt;Weaknesses&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Fixed-Size&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Split every N tokens&lt;/td&gt;
&lt;td&gt;Fast, cheap&lt;/td&gt;
&lt;td&gt;Cuts mid-idea, mixes unrelated concepts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Semantic&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Split at similarity boundaries&lt;/td&gt;
&lt;td&gt;More natural breaks&lt;/td&gt;
&lt;td&gt;Still static, misses long-range links&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Proposition-Based&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Extract atomic facts first&lt;/td&gt;
&lt;td&gt;High granularity, factually precise&lt;/td&gt;
&lt;td&gt;Can feel fragmented without smart grouping&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Agentic&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;LLM decides chunk membership + evolves metadata&lt;/td&gt;
&lt;td&gt;Meaning-first, dynamic, coherent&lt;/td&gt;
&lt;td&gt;More LLM calls, higher indexing cost&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  What Makes Agentic Chunking Different
&lt;/h2&gt;

&lt;p&gt;It behaves like a good editor, not a pair of scissors:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Generalizes across vocabulary&lt;/strong&gt; — &lt;span&gt;apples&lt;/span&gt;, &lt;span&gt;pizza&lt;/span&gt;, and &lt;span&gt;sushi&lt;/span&gt; all become &lt;code&gt;food_preferences&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Evolves chunk metadata dynamically&lt;/strong&gt; — titles and summaries refresh as new content is added, improving retrieval ranking&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Handles real-world mess&lt;/strong&gt; — blogs, research notes, and docs that repeat or shift topics don't break it&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Core Loop
&lt;/h2&gt;

&lt;p&gt;The approach builds on the &lt;strong&gt;Dense X Retrieval&lt;/strong&gt; paper (&lt;a href="https://arxiv.org/pdf/2312.06648" rel="noopener noreferrer"&gt;arXiv:2312.06648&lt;/a&gt;), which found that propositions outperform sentences and passages as retrieval units in its experiments:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Ffb3wkfskxhgufpzaqhar.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Ffb3wkfskxhgufpzaqhar.png" alt="Dense X Retrieval Figure" width="799" height="269"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h6&gt;
  
  
  &lt;a href="https://gemini.google/au/overview/image-generation/?hl=en-AU" rel="noopener noreferrer"&gt;generated by nano 🍌&lt;/a&gt;
&lt;/h6&gt;






&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;proposition&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;extract_propositions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;chunk_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;find_relevant_chunk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;proposition&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk_outline&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;chunk_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;add_to_chunk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;proposition&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;refresh_metadata&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;create_new_chunk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;proposition&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
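&lt;p&gt;The loop above is pseudocode. Here is a minimal runnable sketch of the same control flow, with every LLM call swapped for a cheap stand-in (an assumption, not the author's implementation): the hypothetical &lt;code&gt;extract_propositions&lt;/code&gt; splits on sentence punctuation instead of prompting a model, &lt;code&gt;find_relevant_chunk&lt;/code&gt; routes by keyword overlap instead of asking the LLM, and &lt;code&gt;refresh_metadata&lt;/code&gt; uses the most frequent keyword as a title. It only shows how the pieces fit together.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
import re
from dataclasses import dataclass, field


@dataclass
class Chunk:
    chunk_id: str
    propositions: list = field(default_factory=list)
    title: str = ""


def extract_propositions(document):
    # Stand-in for an LLM proposition extractor: one sentence per proposition.
    return [s.strip() for s in re.findall(r"[^.!?]+[.!?]*", document) if s.strip()]


def keywords(text):
    # Lowercased words longer than three letters, as a crude topic signal.
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3}


def find_relevant_chunk(proposition, chunks):
    # Stand-in for the LLM routing call: pick the chunk sharing the most keywords.
    best_id, best_overlap = None, 0
    for chunk in chunks.values():
        shared = keywords(proposition).intersection(keywords(" ".join(chunk.propositions)))
        if len(shared) > best_overlap:
            best_id, best_overlap = chunk.chunk_id, len(shared)
    return best_id  # None means no existing chunk fits


def refresh_metadata(chunk):
    # Stand-in for an LLM-written title: the chunk's most frequent keyword
    # (ties broken alphabetically so the result is deterministic).
    words = [w for p in chunk.propositions for w in sorted(keywords(p))]
    chunk.title = max(sorted(set(words)), key=words.count) if words else ""


def agentic_chunk(document):
    chunks = {}
    for proposition in extract_propositions(document):
        chunk_id = find_relevant_chunk(proposition, chunks)
        if chunk_id:
            chunks[chunk_id].propositions.append(proposition)
        else:
            chunk_id = f"chunk_{len(chunks)}"
            chunks[chunk_id] = Chunk(chunk_id, [proposition])
        refresh_metadata(chunks[chunk_id])  # titles evolve as chunks grow
    return chunks
```
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;For the input "Transformers rely on attention. Attention weighs every token. Reinforcement learning optimizes rewards. Rewards guide the policy." this returns two chunks titled &lt;code&gt;attention&lt;/code&gt; and &lt;code&gt;rewards&lt;/code&gt;. Replacing the stand-ins with real LLM calls leaves &lt;code&gt;agentic_chunk&lt;/code&gt; itself unchanged.&lt;/p&gt;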



&lt;p&gt;To get reliable structured output from the LLM, constrain its chunk-assignment answer to a schema, for example with Pydantic:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ChunkID&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;chunk_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
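&lt;p&gt;If you would rather not depend on Pydantic, a stdlib-only sketch can enforce the same contract. It assumes the model is prompted to reply with JSON shaped like &lt;code&gt;{"chunk_id": "..."}&lt;/code&gt; or &lt;code&gt;{"chunk_id": null}&lt;/code&gt;; the function name &lt;code&gt;parse_chunk_id&lt;/code&gt; is hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
import json


def parse_chunk_id(raw_reply, known_ids):
    """Validate an LLM reply such as {"chunk_id": "a1"} or {"chunk_id": null}.

    Returns the chunk ID, or None when the model says no existing chunk fits.
    Raises ValueError for malformed JSON, wrong types, or unknown IDs.
    """
    try:
        payload = json.loads(raw_reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise ValueError("reply must be a JSON object")
    chunk_id = payload.get("chunk_id")
    if chunk_id is None:
        return None  # missing or null: create a new chunk
    if not isinstance(chunk_id, str):
        raise ValueError("chunk_id must be a string or null")
    if chunk_id not in known_ids:
        # Guard against the model inventing an ID that does not exist.
        raise ValueError(f"unknown chunk_id: {chunk_id!r}")
    return chunk_id
```
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;The check against &lt;code&gt;known_ids&lt;/code&gt; is the part a type schema alone does not give you: it guarantees the shape of the answer, not that the model picked a chunk that actually exists.&lt;/p&gt;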



&lt;p&gt;Want to try this in LangChain? The community prompt is live here:&lt;br&gt;
🔗 &lt;a href="https://smith.langchain.com/hub/kumja/proposal-indexing" rel="noopener noreferrer"&gt;kumja/proposal-indexing on LangSmith Hub&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  🌽 The Corn Test
&lt;/h2&gt;

&lt;p&gt;Your knowledge base contains three corn-related facts: fresh corn, corn tortillas, and high-fructose corn syrup.&lt;/p&gt;

&lt;p&gt;A fixed-size chunker can easily mash them into one chunk. A query about &lt;em&gt;healthy snacks&lt;/em&gt; then retrieves the corn syrup fact too, poisoning your context.&lt;/p&gt;

&lt;p&gt;An agentic chunker separates them into &lt;code&gt;fresh_produce&lt;/code&gt;, &lt;code&gt;traditional_cuisine&lt;/code&gt;, and &lt;code&gt;food_additives&lt;/code&gt;. The right chunk, the right query, the right answer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If your chunking can handle corn, it can handle production data.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Honest Tradeoff
&lt;/h2&gt;

&lt;p&gt;Need ultra-low cost and latency? Semantic chunking is fine.&lt;/p&gt;

&lt;p&gt;Need retrieval quality you can trust? Agentic chunking is worth the extra LLM calls — especially for long-form docs, overlapping topics, or any pipeline where a wrong answer has real consequences.&lt;/p&gt;




&lt;h2&gt;
  
  
  FAQs
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Is agentic chunking the same as proposition-based chunking?&lt;/strong&gt;&lt;br&gt;
No — propositions are the raw material. Agentic chunking is the assembly process that groups them intelligently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does it work with any embedding model?&lt;/strong&gt;&lt;br&gt;
Yes. It's a pre-processing step. Embed with whatever you prefer after chunking.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best model to use as the agent?&lt;/strong&gt;&lt;br&gt;
Smaller models (Haiku, GPT-4o-mini) usually handle chunk assignment well. Use larger models for generating high-quality titles and summaries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Good for real-time ingestion?&lt;/strong&gt;&lt;br&gt;
Best for batch indexing. For real-time ingestion, index with fast semantic chunking first, then re-chunk agentically in the background.&lt;/p&gt;
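&lt;p&gt;A minimal asyncio sketch of that two-tier setup. Both chunkers are hypothetical stand-ins: &lt;code&gt;fast_chunk&lt;/code&gt; (paragraph splitting, in place of semantic chunking) makes a document searchable immediately, while &lt;code&gt;agentic_chunk&lt;/code&gt; (a placeholder for the slow LLM-driven pass) runs from a background queue and replaces that document's chunks when it finishes.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
import asyncio


def fast_chunk(doc):
    # Cheap stand-in for semantic chunking: split on blank lines.
    return [p.strip() for p in doc.split("\n\n") if p.strip()]


async def agentic_chunk(doc):
    # Placeholder for the slow, LLM-driven pass (latency is simulated).
    await asyncio.sleep(0.01)
    return [" ".join(doc.split())]


async def background_worker(queue, index):
    while True:
        doc_id, doc = await queue.get()
        index[doc_id] = await agentic_chunk(doc)  # swap in the better chunks
        queue.task_done()


async def ingest(docs):
    index, queue = {}, asyncio.Queue()
    worker = asyncio.create_task(background_worker(queue, index))
    for doc_id, doc in docs.items():
        index[doc_id] = fast_chunk(doc)  # searchable right away
        await queue.put((doc_id, doc))
    await queue.join()  # a long-running service would keep serving instead
    worker.cancel()
    return index
```
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;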




&lt;p&gt;📄 &lt;strong&gt;Research:&lt;/strong&gt; Chen et al. (2023). &lt;em&gt;Dense X Retrieval: What Retrieval Granularity Should We Use?&lt;/em&gt; &lt;a href="https://arxiv.org/pdf/2312.06648" rel="noopener noreferrer"&gt;arXiv:2312.06648&lt;/a&gt;&lt;br&gt;
🔗 &lt;strong&gt;LangSmith Prompt:&lt;/strong&gt; &lt;a href="https://smith.langchain.com/hub/kumja/proposal-indexing" rel="noopener noreferrer"&gt;kumja/proposal-indexing&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agentaichallenge</category>
      <category>rag</category>
      <category>agents</category>
    </item>
    <item>
      <title>Docling - AI-Powered Document Pipeline for LLMs &amp; RAG</title>
      <dc:creator>Yash Bhoskar</dc:creator>
      <pubDate>Thu, 25 Jun 2026 16:18:20 +0000</pubDate>
      <link>https://dev.to/yashbhoskar/docling-ai-powered-document-pipeline-for-llms-rag-53m4</link>
      <guid>https://dev.to/yashbhoskar/docling-ai-powered-document-pipeline-for-llms-rag-53m4</guid>
      <description>&lt;p&gt;&lt;em&gt;If you've ever tried feeding a PDF into an LLM and wondered why the output was garbage — the problem wasn't your model. It was your parser.&lt;br&gt;
&lt;a href="https://www.docling.ai/" rel="noopener noreferrer"&gt;Docling&lt;/a&gt; is an open-source document AI pipeline by &lt;a href="https://research.ibm.com/" rel="noopener noreferrer"&gt;IBM Research&lt;/a&gt; that goes far beyond text extraction. Unlike traditional tools like &lt;code&gt;pypdf&lt;/code&gt; or &lt;code&gt;pdfplumber&lt;/code&gt;, Docling uses deep learning to understand document structure — reconstructing tables, fixing reading order, and producing clean, LLM-ready output. Whether you're building a RAG system, processing financial reports, or ingesting research papers, Docling is the document intelligence layer your pipeline is missing.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It doesn’t just extract content — it reconstructs the &lt;em&gt;meaningful layout&lt;/em&gt; of a document.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fi8lux07tbfgibxgszvne.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fi8lux07tbfgibxgszvne.png" alt="Docling document AI pipeline showing PDF, DOCX, and image inputs being parsed into structured LLM-ready output" width="800" height="461"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Docling Beats Traditional Parsers
&lt;/h2&gt;

&lt;p&gt;Let’s be honest — traditional libraries were never built for AI workflows.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Traditional Parsers&lt;/th&gt;
&lt;th&gt;Docling&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Text Extraction&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Layout Understanding&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Table Reconstruction&lt;/td&gt;
&lt;td&gt;❌ (messy text)&lt;/td&gt;
&lt;td&gt;✅ (structured grid)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-format Support&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;Extensive&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reading Order&lt;/td&gt;
&lt;td&gt;Broken in columns&lt;/td&gt;
&lt;td&gt;Correct&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chunking for LLMs&lt;/td&gt;
&lt;td&gt;Manual&lt;/td&gt;
&lt;td&gt;Built-in&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Metadata Awareness&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  The Real Problem with Traditional Tools
&lt;/h3&gt;

&lt;p&gt;Traditional tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Extract text based on &lt;strong&gt;positions&lt;/strong&gt;, not meaning
&lt;/li&gt;
&lt;li&gt;Break tables into unreadable blobs
&lt;/li&gt;
&lt;li&gt;Completely mess up &lt;strong&gt;multi-column layouts&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Lose context like headings, sections, and hierarchy
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Result: Garbage input → Poor LLM output&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  Docling’s Edge
&lt;/h3&gt;

&lt;p&gt;Docling flips the script:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Uses &lt;strong&gt;deep learning models&lt;/strong&gt; (not heuristics)&lt;/li&gt;
&lt;li&gt;Understands &lt;strong&gt;document structure like a human&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Outputs &lt;strong&gt;clean, structured, LLM-ready data&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not parsing — this is &lt;strong&gt;document intelligence&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Multi-Format Support (One Pipeline to Rule Them All)
&lt;/h2&gt;

&lt;p&gt;Docling isn’t just for PDFs.&lt;/p&gt;

&lt;p&gt;It seamlessly handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PDF
&lt;/li&gt;
&lt;li&gt;Word (.docx)
&lt;/li&gt;
&lt;li&gt;PowerPoint (.pptx)
&lt;/li&gt;
&lt;li&gt;Excel (.xlsx)
&lt;/li&gt;
&lt;li&gt;HTML / Markdown
&lt;/li&gt;
&lt;li&gt;Images (PNG, JPEG, TIFF)
&lt;/li&gt;
&lt;li&gt;AsciiDoc
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;You can run a &lt;strong&gt;single pipeline across mixed document types&lt;/strong&gt;, something single-format parsers like &lt;code&gt;pypdf&lt;/code&gt; simply can’t do.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Parsing Phase — Where Docling Truly Shines
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Layout Understanding (DocLayNet)
&lt;/h3&gt;

&lt;p&gt;Docling uses a layout model trained on &lt;strong&gt;DocLayNet&lt;/strong&gt;, IBM's human-annotated document layout dataset, to identify:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Headings
&lt;/li&gt;
&lt;li&gt;Paragraphs
&lt;/li&gt;
&lt;li&gt;Tables
&lt;/li&gt;
&lt;li&gt;Figures
&lt;/li&gt;
&lt;li&gt;Captions
&lt;/li&gt;
&lt;li&gt;Footnotes
&lt;/li&gt;
&lt;li&gt;Lists
&lt;/li&gt;
&lt;li&gt;Code blocks
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;It doesn’t just &lt;em&gt;see&lt;/em&gt; text — it understands what that text &lt;em&gt;is&lt;/em&gt;.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fi0bnzxfsk8omheq5qcnq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fi0bnzxfsk8omheq5qcnq.png" alt="DocLayNet layout detection model identifying headings, tables, paragraphs, and figures in a document with bounding boxes" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  Table Parsing (TableFormer)
&lt;/h3&gt;

&lt;p&gt;Traditional tools butcher tables.&lt;/p&gt;

&lt;p&gt;Docling uses &lt;strong&gt;TableFormer&lt;/strong&gt; to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reconstruct full table grids
&lt;/li&gt;
&lt;li&gt;Handle merged cells
&lt;/li&gt;
&lt;li&gt;Understand multi-line headers
&lt;/li&gt;
&lt;li&gt;Preserve row/column relationships
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Output = Clean, structured data (not scrambled text)&lt;/p&gt;
&lt;/blockquote&gt;
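&lt;p&gt;To make "merged cells" concrete: once a table-structure model has predicted cells with row and column spans, turning them into a grid an LLM can read is mechanical. This sketch assumes a hypothetical cell format &lt;code&gt;(row, col, row_span, col_span, text)&lt;/code&gt;; it is not TableFormer's or Docling's actual output schema.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
def cells_to_grid(cells, n_rows, n_cols):
    # cells: (row, col, row_span, col_span, text). A merged cell's text is
    # copied into every grid slot it covers.
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for row, col, row_span, col_span, text in cells:
        for r in range(row, row + row_span):
            for c in range(col, col + col_span):
                grid[r][c] = text
    return grid


def grid_to_markdown(grid):
    # The first row becomes the Markdown header row.
    header, *body = grid
    lines = ["| " + " | ".join(header) + " |", "|" + "---|" * len(header)]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)
```
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Copying a merged value into every slot it covers is a design choice: each row then stays meaningful on its own, even if rows end up in different chunks.&lt;/p&gt;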




&lt;h3&gt;
  
  
  Figure &amp;amp; Chart Detection
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Extracts figures as images
&lt;/li&gt;
&lt;li&gt;Links them with captions
&lt;/li&gt;
&lt;li&gt;Maintains document context
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;⚠️ Note: It does &lt;em&gt;not&lt;/em&gt; interpret chart data — only isolates it cleanly.&lt;/p&gt;




&lt;h3&gt;
  
  
  🔍 OCR (But Done Right)
&lt;/h3&gt;

&lt;p&gt;For scanned documents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Supports OCR engines such as EasyOCR and Tesseract
&lt;/li&gt;
&lt;li&gt;Maintains &lt;strong&gt;layout-aware reading order&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;No more left-to-right OCR chaos.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  Reading Order Recovery
&lt;/h3&gt;

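&lt;p&gt;Broken reading order is a silent killer in PDFs: in multi-column layouts, position-based extraction can interleave lines from different columns.&lt;/p&gt;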

&lt;p&gt;Docling:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fixes multi-column reading
&lt;/li&gt;
&lt;li&gt;Reconstructs logical flow
&lt;/li&gt;
&lt;li&gt;Makes documents actually readable for LLMs
&lt;/li&gt;
&lt;/ul&gt;
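&lt;p&gt;A deliberately simple heuristic shows both the failure and the fix. It assumes a hypothetical &lt;code&gt;(x0, y0, text)&lt;/code&gt; block format and a page with exactly two columns split at the midpoint; this is not how Docling implements reading order, only an illustration of the problem it solves.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
def naive_order(blocks):
    # Position-only extraction: sort purely by vertical position.
    return [text for _, _, text in sorted(blocks, key=lambda b: b[1])]


def column_aware_order(blocks, page_width):
    # blocks: (x0, y0, text) with y growing downward from the top of the page.
    # Assumes exactly two columns split at the page midpoint.
    midpoint = page_width / 2

    def column(block):
        return 1 if block[0] >= midpoint else 0

    # Read the whole left column top to bottom, then the right column.
    return [text for _, _, text in sorted(blocks, key=lambda b: (column(b), b[1]))]
```
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;For blocks L1, R1, L2, R2 laid out side by side, the position-only sort reads L1, R1, L2, R2 (jumping between columns on every line), while the column-aware sort reads L1, L2, R1, R2.&lt;/p&gt;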




&lt;h2&gt;
  
  
  Chunking — Built for RAG (This is Gold)
&lt;/h2&gt;

&lt;p&gt;If you're building RAG systems, this is where Docling becomes &lt;em&gt;insanely valuable&lt;/em&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hierarchical Chunking
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Respects structure (heading → section → paragraph)
&lt;/li&gt;
&lt;li&gt;No random splits mid-sentence
&lt;/li&gt;
&lt;/ul&gt;
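&lt;p&gt;A minimal sketch of structure-respecting chunking, assuming Markdown-style input: each chunk is one section's body, tagged with the path of headings above it, so splits follow heading boundaries instead of token counts. Docling's own hierarchical chunker works on a &lt;code&gt;DoclingDocument&lt;/code&gt; rather than raw Markdown; the function below is hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
import re


def hierarchical_chunks(markdown):
    chunks, path, body = [], [], []

    def flush():
        # Emit the current section's body (if any) with its heading path.
        text = " ".join(body).strip()
        if text:
            chunks.append({"headings": list(path), "text": text})
        body.clear()

    for line in markdown.splitlines():
        match = re.match(r"(#+)\s+(.*)", line)
        if match:
            flush()  # a new heading closes the previous section
            level = len(match.group(1))
            del path[level - 1:]  # drop headings at this depth or deeper
            path.append(match.group(2).strip())
        elif line.strip():
            body.append(line.strip())
    flush()
    return chunks
```
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;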

&lt;h3&gt;
  
  
  Hybrid Chunking
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Combines:

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://yashblog-jade.vercel.app/posts/deep-dive-into-semantic-chunking-for-rag" rel="noopener noreferrer"&gt;Semantic&lt;/a&gt; structure
&lt;/li&gt;
&lt;li&gt;Token limits
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Chunks sized to fit LLM context windows&lt;/p&gt;
&lt;/blockquote&gt;
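&lt;p&gt;The hybrid idea can be sketched as a refinement pass over structure-based chunks: split any section that exceeds a token budget at sentence boundaries, then merge neighbours that share the same headings while they still fit. Token counts are approximated by whitespace-separated words here, an assumption; a real setup would use the embedding model's tokenizer, and these function names are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
import re


def count_tokens(text):
    # Rough stand-in for a real tokenizer: whitespace-separated words.
    return len(text.split())


def split_oversized(text, max_tokens):
    # Pack whole sentences into pieces that fit the budget. A single sentence
    # longer than max_tokens is kept whole rather than cut mid-sentence.
    sentences = (s.strip() for s in re.findall(r"[^.!?]+[.!?]*", text))
    pieces, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and count_tokens(candidate) > max_tokens:
            pieces.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        pieces.append(current)
    return pieces


def hybrid_chunks(sections, max_tokens):
    # sections: (headings, text) pairs from a structure-aware first pass.
    refined = [(headings, piece) for headings, text in sections
               for piece in split_oversized(text, max_tokens)]
    merged = []
    for headings, text in refined:
        # Merge with the previous chunk only if headings match and it still fits.
        if (merged and merged[-1][0] == headings
                and max_tokens >= count_tokens(merged[-1][1] + " " + text)):
            merged[-1] = (headings, merged[-1][1] + " " + text)
        else:
            merged.append((headings, text))
    return merged
```
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;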




&lt;h3&gt;
  
  
  Context Preservation
&lt;/h3&gt;

&lt;p&gt;Each chunk carries:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Page number
&lt;/li&gt;
&lt;li&gt;Bounding box
&lt;/li&gt;
&lt;li&gt;Section hierarchy
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Retrieval becomes &lt;strong&gt;accurate + explainable&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
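&lt;p&gt;A sketch of a metadata-carrying chunk, assuming hypothetical field names (Docling's chunk objects have their own schema). The &lt;code&gt;contextualize&lt;/code&gt; helper prepends the heading path before embedding, so identical text under different sections gets different vectors, and &lt;code&gt;citation&lt;/code&gt; turns the metadata into a pointer a reader can check.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    page: int
    bbox: tuple  # (x0, y0, x1, y1) on the page; units depend on the parser
    headings: tuple  # section path, outermost heading first

    def contextualize(self):
        # Text to embed: heading path first, so the same sentence under
        # "API / Limits" and "Pricing / Limits" yields different vectors.
        if not self.headings:
            return self.text
        return " / ".join(self.headings) + "\n" + self.text

    def citation(self):
        # Human-readable pointer back to the source, for explainable answers.
        return f"{' / '.join(self.headings)} (p. {self.page})"
```
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;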




&lt;h3&gt;
  
  
  Tables &amp;amp; Figures Stay Intact
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Tables are never split
&lt;/li&gt;
&lt;li&gt;Figures remain atomic
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;No more broken context in retrieval&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Ftbb59ft1waul1snked5h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Ftbb59ft1waul1snked5h.png" alt="Docling semantic chunking pipeline breaking structured document sections into metadata-tagged chunks for vector database ingestion" width="800" height="461"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  DoclingDocument — The Secret Sauce
&lt;/h2&gt;

&lt;p&gt;Instead of raw text, Docling outputs a dedicated document object:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;DoclingDocument&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;A structured representation of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Entire document hierarchy
&lt;/li&gt;
&lt;li&gt;Layout elements
&lt;/li&gt;
&lt;li&gt;Metadata
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can export it as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Markdown
&lt;/li&gt;
&lt;li&gt;JSON
&lt;/li&gt;
&lt;li&gt;HTML
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This makes the pipeline &lt;strong&gt;fully composable&lt;/strong&gt;.&lt;/p&gt;
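&lt;p&gt;Why one structured representation makes the pipeline composable: every exporter walks the same elements. The toy model below is hypothetical and far smaller than &lt;code&gt;DoclingDocument&lt;/code&gt;; its element labels and exporter functions are assumptions chosen for illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Element:
    label: str  # e.g. "title", "section_header", "text", "list_item"
    text: str
    page: int


def to_markdown(elements):
    # Map structural labels to Markdown prefixes; plain text gets none.
    prefixes = {"title": "# ", "section_header": "## ", "list_item": "- "}
    return "\n\n".join(prefixes.get(e.label, "") + e.text for e in elements)


def to_json(elements):
    # Same elements, different serialization: no re-parsing of the source.
    return json.dumps([asdict(e) for e in elements], indent=2)
```
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Adding another output format means writing one more function over the same elements, without touching the parsing step.&lt;/p&gt;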




&lt;h2&gt;
  
  
  Plug-and-Play with LLM Ecosystems
&lt;/h2&gt;

&lt;p&gt;Docling integrates with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.llamaindex.ai/" rel="noopener noreferrer"&gt;LlamaIndex&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.langchain.com/" rel="noopener noreferrer"&gt;LangChain&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://huggingface.co/" rel="noopener noreferrer"&gt;Hugging Face&lt;/a&gt; datasets
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Drop it straight into your RAG pipeline as the ingestion layer.&lt;/p&gt;




&lt;h2&gt;
  
  
  ⚠️ What Docling Isn’t Perfect At
&lt;/h2&gt;

&lt;p&gt;Let’s keep it real:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;❌ No chart-to-data interpretation
&lt;/li&gt;
&lt;li&gt;🐢 Slow for very large documents (200+ pages)
&lt;/li&gt;
&lt;li&gt;⚖️ Overkill for simple text PDFs
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  When Should You Use Docling?
&lt;/h2&gt;

&lt;p&gt;Use Docling when working with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;📄 Research papers
&lt;/li&gt;
&lt;li&gt;📊 Financial reports
&lt;/li&gt;
&lt;li&gt;📘 Technical documentation
&lt;/li&gt;
&lt;li&gt;📜 Contracts
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Basically — &lt;strong&gt;anything with structure&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  💡 When NOT to Use It
&lt;/h2&gt;

&lt;p&gt;Skip Docling if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You just need plain text extraction
&lt;/li&gt;
&lt;li&gt;Your documents are extremely simple
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;In those cases, lighter tools are faster.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Bonus: Notebook for Hands-On Usage
&lt;/h2&gt;

&lt;p&gt;A full notebook is attached where you can explore Docling in action and integrate it efficiently into your pipeline.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Docling isn’t just another parser — it’s a &lt;strong&gt;foundation layer for Document AI systems&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If traditional tools are:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Extract text and hope for the best”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Docling is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Understand the document, preserve its meaning, and make it LLM-ready”&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  🧠 My Take
&lt;/h2&gt;

&lt;p&gt;As LLM applications grow, &lt;strong&gt;input quality matters more than model size&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Docling solves the &lt;em&gt;real bottleneck&lt;/em&gt;:&lt;br&gt;
👉 Turning messy documents into structured, meaningful data&lt;/p&gt;

&lt;p&gt;And that’s exactly why it stands out.&lt;/p&gt;




</description>
      <category>ai</category>
      <category>rag</category>
      <category>ibm</category>
      <category>chatgpt</category>
    </item>
    <item>
      <title>Deep Dive into Semantic Chunking for RAG</title>
      <dc:creator>Yash Bhoskar</dc:creator>
      <pubDate>Thu, 25 Jun 2026 16:12:33 +0000</pubDate>
      <link>https://dev.to/yashbhoskar/deep-dive-into-semantic-chunking-for-rag-2cnn</link>
      <guid>https://dev.to/yashbhoskar/deep-dive-into-semantic-chunking-for-rag-2cnn</guid>
      <description>&lt;p&gt;In the previous article, &lt;a href="https://blog.yashbhoskar.online/posts/different-chunking-methods-for-rag" rel="noopener noreferrer"&gt;Different Chunking Methods for RAG&lt;/a&gt;, we explored several strategies used to split documents before feeding them into a Retrieval-Augmented Generation (RAG) pipeline.&lt;/p&gt;

&lt;p&gt;In this chapter, we’ll go deeper into &lt;strong&gt;Semantic Chunking&lt;/strong&gt; — one of the most powerful techniques for improving retrieval accuracy in modern RAG systems.&lt;/p&gt;

&lt;p&gt;We’ll cover:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What semantic chunking actually means&lt;/li&gt;
&lt;li&gt;How it works internally&lt;/li&gt;
&lt;li&gt;Why it improves retrieval accuracy&lt;/li&gt;
&lt;li&gt;How it compares to other chunking strategies used in production systems&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Why Traditional Chunking Often Fails
&lt;/h2&gt;

&lt;p&gt;Most early RAG pipelines relied on &lt;strong&gt;fixed-size chunking&lt;/strong&gt;, where documents are split into chunks of a predefined size (for example, 500 tokens with a 50-token overlap).&lt;/p&gt;
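&lt;p&gt;For reference, a fixed-size splitter with overlap fits in a few lines. This sketch counts whitespace-separated words as tokens, an assumption (real pipelines usually count with the embedding model's tokenizer).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
def fixed_size_chunks(text, chunk_size=500, overlap=50):
    # Words stand in for tokens here.
    if not chunk_size > overlap >= 0:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    tokens = text.split()
    step = chunk_size - overlap  # each window starts this many tokens after the last
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks
```
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;With &lt;code&gt;chunk_size=4&lt;/code&gt; and &lt;code&gt;overlap=1&lt;/code&gt;, the words &lt;code&gt;a b c d e f g h i j&lt;/code&gt; become &lt;code&gt;a b c d&lt;/code&gt;, &lt;code&gt;d e f g&lt;/code&gt; and &lt;code&gt;g h i j&lt;/code&gt;. The boundaries depend only on position, so nothing stops a window from ending mid-sentence.&lt;/p&gt;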

&lt;p&gt;While this approach is simple, it introduces a fundamental problem:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;it ignores the semantic structure of the text.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For example, imagine a paragraph discussing &lt;strong&gt;transformer architectures&lt;/strong&gt;, followed by another paragraph explaining &lt;strong&gt;reinforcement learning&lt;/strong&gt;. A fixed-size splitter might cut the text in the middle of the explanation, creating chunks that contain &lt;strong&gt;partial or mixed topics&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This leads to two common issues:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Context fragmentation&lt;/strong&gt; – important ideas get split across chunks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Noisy retrieval&lt;/strong&gt; – chunks contain unrelated information.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;When these chunks are retrieved during query time, the LLM receives incomplete or irrelevant context, which directly reduces answer quality.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is Semantic Chunking?
&lt;/h2&gt;

&lt;p&gt;Semantic chunking is a strategy that splits documents based on &lt;strong&gt;meaning rather than size&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Instead of arbitrarily cutting text every few hundred tokens, semantic chunking groups sentences that &lt;strong&gt;discuss the same topic&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The goal is simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Each chunk should represent a &lt;strong&gt;coherent semantic idea&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For example, consider the following sequence of sentences:&lt;/p&gt;

&lt;p&gt;Sentence 1: Explanation of transformers&lt;br&gt;
Sentence 2: Attention mechanism in transformers&lt;br&gt;
Sentence 3: Multi-head attention architecture&lt;br&gt;
Sentence 4: Reinforcement learning algorithms&lt;/p&gt;

&lt;p&gt;A semantic chunker would produce:&lt;/p&gt;

&lt;p&gt;Chunk 1 → Sentences 1–3 (transformer topic)&lt;br&gt;
Chunk 2 → Sentence 4 (new topic)&lt;/p&gt;

&lt;p&gt;This keeps each chunk focused on a &lt;strong&gt;complete concept&lt;/strong&gt;, which tends to improve retrieval relevance.&lt;/p&gt;


&lt;h2&gt;
  
  
  How Semantic Chunking Works
&lt;/h2&gt;

&lt;p&gt;Most semantic chunking implementations follow a similar pipeline.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 1 — Sentence Segmentation
&lt;/h3&gt;

&lt;p&gt;The document is first split into sentences.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Document → Sentence1, Sentence2, Sentence3, Sentence4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This allows the algorithm to analyze semantic similarity at a granular level.&lt;/p&gt;
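&lt;p&gt;A naive regex segmenter is enough to experiment with. It treats &lt;code&gt;.&lt;/code&gt;, &lt;code&gt;!&lt;/code&gt; and &lt;code&gt;?&lt;/code&gt; as sentence ends, so abbreviations such as "e.g." will be split incorrectly; libraries like spaCy or NLTK handle those cases better.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
import re


def split_sentences(text):
    # A sentence = a run of non-terminator characters plus its terminator(s).
    # Trailing text without a terminator is kept as a final sentence.
    parts = re.findall(r"[^.!?]+[.!?]*", text)
    return [p.strip() for p in parts if p.strip()]
```
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;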




&lt;h3&gt;
  
  
  Step 2 — Generate Sentence Embeddings
&lt;/h3&gt;

&lt;p&gt;Each sentence is converted into a vector representation using an embedding model.&lt;/p&gt;

&lt;p&gt;Common embedding models include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sentence Transformers&lt;/li&gt;
&lt;li&gt;BGE embeddings&lt;/li&gt;
&lt;li&gt;Instructor embeddings&lt;/li&gt;
&lt;li&gt;OpenAI embeddings&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each sentence is now represented as a &lt;strong&gt;high-dimensional vector capturing its meaning&lt;/strong&gt;.&lt;/p&gt;
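&lt;p&gt;To try the rest of the pipeline without downloading a model, a toy embedding can stand in for the models above. This hashed bag-of-words vector is an assumption for experimentation only: it captures word overlap, not meaning, so it will miss paraphrases that real embedding models catch.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
import math
import re
import zlib


def toy_embed(sentence, dim=64):
    # Hash each lowercase word into one of `dim` buckets and count occurrences.
    # Python's built-in hash() is randomized per process, so a stable CRC32
    # checksum is used instead.
    vector = [0.0] * dim
    for word in re.findall(r"[a-z0-9]+", sentence.lower()):
        vector[zlib.crc32(word.encode()) % dim] += 1.0
    # L2-normalize so the dot product of two vectors equals their cosine similarity.
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector
```
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;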




&lt;h3&gt;
  
  
  Step 3 — Compute Similarity Between Sentences
&lt;/h3&gt;

&lt;p&gt;Next, the algorithm calculates &lt;strong&gt;cosine similarity&lt;/strong&gt; between consecutive sentences.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;similarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;S1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;S2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;similarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;S2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;S3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;similarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;S3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;S4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;High similarity indicates the sentences belong to the &lt;strong&gt;same topic&lt;/strong&gt;, while low similarity suggests a topic shift.&lt;/p&gt;
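&lt;p&gt;The &lt;code&gt;similarity(...)&lt;/code&gt; calls above are shorthand. A runnable version computes cosine similarity for each adjacent pair of sentence vectors; it takes plain lists of floats, so any embedding model's output can be passed in.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat empty vectors as unrelated
    return dot / (norm_a * norm_b)


def adjacent_similarities(vectors):
    # scores[i] compares sentence i with sentence i + 1.
    return [cosine_similarity(vectors[i], vectors[i + 1])
            for i in range(len(vectors) - 1)]
```
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;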




&lt;h3&gt;
  
  
  Step 4 — Detect Topic Boundaries
&lt;/h3&gt;

&lt;p&gt;If the similarity between sentences drops below a predefined threshold, a &lt;strong&gt;new chunk boundary&lt;/strong&gt; is created.&lt;/p&gt;

&lt;p&gt;Example rule:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;similarity &amp;gt; 0.75 → same chunk
similarity &amp;lt; 0.65 → start new chunk
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This dynamically segments the document based on semantic transitions.&lt;/p&gt;




&lt;h3&gt;
  
  
  Step 5 — Build Semantic Chunks
&lt;/h3&gt;

&lt;p&gt;Finally, sentences are grouped into chunks that maintain &lt;strong&gt;topic continuity&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Unlike fixed chunking, semantic chunks may vary in size, but they maintain &lt;strong&gt;contextual coherence&lt;/strong&gt;.&lt;/p&gt;
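&lt;p&gt;Steps 4 and 5 together fit in a single pass over the adjacent-pair scores. This sketch assumes a single cut-off of 0.65 (the lower value in the example rule), so scores between 0.65 and 0.75 stay in the current chunk; it takes precomputed scores, so it works with any embedding model. The function name is hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
def semantic_chunks(sentences, scores, threshold=0.65):
    """Group sentences into chunks.

    scores[i] is the similarity between sentences[i] and sentences[i + 1];
    a score below `threshold` starts a new chunk at sentences[i + 1].
    """
    if len(scores) != max(len(sentences) - 1, 0):
        raise ValueError("need exactly one score per adjacent sentence pair")
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for sentence, score in zip(sentences[1:], scores):
        if threshold > score:
            chunks.append([sentence])  # topic shift: open a new chunk
        else:
            chunks[-1].append(sentence)  # same topic: extend the current chunk
    return [" ".join(chunk) for chunk in chunks]
```
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;With made-up scores &lt;code&gt;[0.82, 0.78, 0.31]&lt;/code&gt; for the four example sentences earlier, it returns sentences 1 to 3 as one chunk and sentence 4 as a second, matching the transformer/reinforcement-learning split, and the two chunks differ in size.&lt;/p&gt;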




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Frh12mah7y0d1ia9u5vtx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Frh12mah7y0d1ia9u5vtx.png" alt="High-level pipeline showing how documents are segmented, embedded, and grouped into semantic chunks before being stored in a vector database for RAG retrieval." width="800" height="325"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Semantic Chunking Improves RAG Performance
&lt;/h2&gt;

&lt;p&gt;Semantic chunking improves RAG pipelines in several important ways.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Better Context Integrity
&lt;/h3&gt;

&lt;p&gt;Each chunk contains a &lt;strong&gt;complete explanation of a concept&lt;/strong&gt;, which helps the LLM reason more effectively.&lt;/p&gt;




&lt;h3&gt;
  
  
  2. Higher Retrieval Precision
&lt;/h3&gt;

&lt;p&gt;Vector similarity search works best when chunks represent &lt;strong&gt;clear semantic topics&lt;/strong&gt; rather than mixed content.&lt;/p&gt;




&lt;h3&gt;
  
  
  3. Reduced Hallucination
&lt;/h3&gt;

&lt;p&gt;When retrieved context is precise and coherent, the LLM is less likely to generate unsupported information.&lt;/p&gt;




&lt;h3&gt;
  
  
  4. Improved Answer Grounding
&lt;/h3&gt;

&lt;p&gt;Because chunks are semantically aligned, answers are better supported by retrieved documents.&lt;/p&gt;




&lt;h2&gt;
  
  
  Accuracy Comparison with Other Chunking Methods
&lt;/h2&gt;

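&lt;p&gt;In many reported experiments, semantic chunking tends to outperform traditional chunking approaches, although results depend on the corpus, the embedding model, and how retrieval is evaluated. The comparison below is qualitative:&lt;/p&gt;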

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Chunking Method&lt;/th&gt;
&lt;th&gt;Retrieval Precision&lt;/th&gt;
&lt;th&gt;Context Quality&lt;/th&gt;
&lt;th&gt;Implementation Effort&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Fixed Token Chunking&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Easy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Recursive Chunking&lt;/td&gt;
&lt;td&gt;Medium–High&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Moderate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Semantic Chunking&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Advanced&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Some teams report results such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;15–30% improvement in retrieval relevance&lt;/strong&gt; (the exact figure varies widely with dataset and metric)&lt;/li&gt;
&lt;li&gt;More grounded responses&lt;/li&gt;
&lt;li&gt;Lower hallucination rates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These gains tend to be most noticeable in &lt;strong&gt;long-form documents&lt;/strong&gt; such as research papers, legal documents, and technical documentation.&lt;/p&gt;
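&lt;p&gt;Since figures like these depend heavily on the dataset, it is worth measuring on your own documents before and after switching strategies. One simple metric is hit rate@k: the share of test questions for which at least one chunk that answers the question appears in the top k results. A minimal sketch, assuming you already have ranked chunk IDs per question and hand-labeled relevant IDs (all names and data are hypothetical):&lt;/p&gt;

```python
def hit_rate_at_k(ranked_results, relevant, k=5):
    """Fraction of queries with at least one relevant chunk in the top k.

    ranked_results: query -> list of retrieved chunk IDs, best first
    relevant: query -> set of chunk IDs known to answer that query
    """
    if not ranked_results:
        return 0.0
    hits = sum(
        1
        for query, ranked in ranked_results.items()
        if set(ranked[:k]).intersection(relevant.get(query, set()))
    )
    return hits / len(ranked_results)

# Hypothetical labeled evaluation data for one chunking strategy.
ranked = {"q1": ["c7", "c2", "c9"], "q2": ["c4", "c1", "c8"]}
relevant = {"q1": {"c2"}, "q2": {"c5"}}
print(hit_rate_at_k(ranked, relevant, k=3))  # 0.5
print(hit_rate_at_k(ranked, relevant, k=1))  # 0.0
```

&lt;p&gt;Because chunk IDs change whenever the chunking strategy changes, the relevance labels have to be rebuilt for each run, for example by marking any chunk that contains the known answer text as relevant.&lt;/p&gt;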




&lt;h2&gt;
  
  
  Practical Challenges
&lt;/h2&gt;

&lt;p&gt;Despite its advantages, semantic chunking is not always trivial to implement.&lt;/p&gt;

&lt;p&gt;Some practical challenges include:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Higher compute cost&lt;/strong&gt;&lt;br&gt;
Generating embeddings for every sentence can be expensive for large document sets.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Threshold tuning&lt;/strong&gt;&lt;br&gt;
The similarity threshold must be tuned carefully to avoid overly small or overly large chunks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Variable chunk sizes&lt;/strong&gt;&lt;br&gt;
Chunks can become uneven, which sometimes requires adding a maximum token limit.&lt;/p&gt;
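&lt;p&gt;One way to make threshold tuning less manual is to derive the cut-off from each document instead of hard-coding it: place breaks only where the similarity between neighboring sentences is among the lowest for that document. LangChain's &lt;code&gt;SemanticChunker&lt;/code&gt; uses a similar percentile-based rule by default (expressed as distances rather than similarities). A minimal plain-Python sketch with hypothetical helper names and made-up similarity values:&lt;/p&gt;

```python
def percentile(values, pct):
    """Simple percentile (0-100): the value at the nearest index of a sorted copy."""
    ordered = sorted(values)
    index = round(pct / 100 * (len(ordered) - 1))
    return ordered[index]

def adaptive_breakpoints(similarities, pct=10):
    """Return sentence indices that should start a new chunk.

    similarities[i] compares sentence i with sentence i + 1. A boundary is
    placed wherever the similarity is at or below the document's pct-th
    percentile, so the cut-off adapts to each document instead of being fixed.
    """
    cutoff = percentile(similarities, pct)
    return [i + 1 for i, sim in enumerate(similarities) if cutoff >= sim]

# Hypothetical similarities between consecutive sentences of one document.
sims = [0.91, 0.88, 0.42, 0.90, 0.86, 0.39, 0.93, 0.89, 0.87, 0.92]
print(adaptive_breakpoints(sims, pct=10))  # [3, 6]
```

&lt;p&gt;Raising &lt;code&gt;pct&lt;/code&gt; produces more, smaller chunks; lowering it produces fewer, larger ones.&lt;/p&gt;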


&lt;h2&gt;
  
  
  Production Best Practices
&lt;/h2&gt;

&lt;p&gt;In production RAG systems, semantic chunking is often combined with &lt;strong&gt;token limits and overlap strategies&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A typical starting configuration might look like this (the values are illustrative and should be tuned for your corpus):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;Semantic similarity threshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.75&lt;/span&gt;
&lt;span class="na"&gt;Max chunk size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;800 tokens&lt;/span&gt;
&lt;span class="na"&gt;Overlap&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;50 tokens&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This ensures chunks remain &lt;strong&gt;semantically meaningful while staying within model limits&lt;/strong&gt;.&lt;/p&gt;
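&lt;p&gt;A minimal sketch of how the size and overlap settings could be applied after semantic grouping, assuming the semantic chunks already exist. Whitespace-separated words approximate tokens here (a real pipeline would use the embedding model's tokenizer), and &lt;code&gt;enforce_token_limits&lt;/code&gt; is a hypothetical helper:&lt;/p&gt;

```python
def enforce_token_limits(chunks, max_tokens=800, overlap=50):
    """Split oversized semantic chunks into overlapping windows.

    Tokens are approximated by whitespace-separated words; a real pipeline
    would count them with the embedding model's tokenizer.
    """
    if overlap >= max_tokens:
        raise ValueError("overlap must be smaller than max_tokens")
    step = max_tokens - overlap
    result = []
    for chunk in chunks:
        words = chunk.split()
        if max_tokens >= len(words):
            result.append(chunk)  # already within the limit: keep as-is
            continue
        for start in range(0, len(words), step):
            result.append(" ".join(words[start:start + max_tokens]))
            if start + max_tokens >= len(words):
                break  # this window already reaches the end of the chunk
    return result

long_chunk = " ".join(f"w{i}" for i in range(10))
print(enforce_token_limits(["short chunk", long_chunk], max_tokens=4, overlap=1))
# ['short chunk', 'w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

&lt;p&gt;In this sketch, overlap is only added when an oversized chunk has to be cut; chunks that already fit keep their semantic boundaries untouched. That is one reasonable design, not the only one.&lt;/p&gt;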




&lt;h2&gt;
  
  
  What’s Next
&lt;/h2&gt;

&lt;p&gt;Semantic chunking is a powerful technique, but it’s just one piece of the puzzle. In the next chapter, we’ll explore &lt;strong&gt;Agentic Chunking&lt;/strong&gt; — a dynamic approach where the LLM itself decides how to group information based on meaning and relevance, evolving chunk metadata over time.&lt;/p&gt;

&lt;p&gt;Follow along as we dig into &lt;strong&gt;Agentic Chunking&lt;/strong&gt; in the next chapter.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>productivity</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Different Chunking Methods for RAG</title>
      <dc:creator>Yash Bhoskar</dc:creator>
      <pubDate>Thu, 25 Jun 2026 16:04:39 +0000</pubDate>
      <link>https://dev.to/yashbhoskar/different-chunking-methods-for-rag-j4g</link>
      <guid>https://dev.to/yashbhoskar/different-chunking-methods-for-rag-j4g</guid>
      <description>&lt;h2&gt;
  
  
  The Ultimate Guide to Chunking Methods for RAG
&lt;/h2&gt;




&lt;h2&gt;
  
  
  What is Chunking?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Chunking&lt;/strong&gt; is a preprocessing technique that breaks continuous text into discrete blocks. Each block is sized to fit within the token limits of the embedding model and the Large Language Model's (LLM) context window, so the system can process and retrieve information piece by piece instead of handling an entire document at once.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is RAG?
&lt;/h2&gt;

&lt;p&gt;LLMs often suffer from hallucinations, generating false information with unearned confidence. This lack of factual "grounding" makes them unreliable for many high-stakes tasks.&lt;/p&gt;

&lt;p&gt;To solve this, &lt;strong&gt;RAG (Retrieval-Augmented Generation)&lt;/strong&gt; was introduced to provide LLMs with a "source of truth" to consult before answering.&lt;/p&gt;

&lt;p&gt;To make RAG work, we first turn our documents into numerical "fingerprints" called &lt;strong&gt;vector embeddings&lt;/strong&gt;. Specialized embedding models (bi-encoders) translate each piece of text into a vector of numbers, and these vectors are stored in a vector database where they can be searched by similarity.&lt;/p&gt;

&lt;p&gt;Think of it like a high-tech library: the quality of the search depends heavily on how we’ve filed the information. If our "chunks" of text are too big, the relevant passage is diluted by unrelated text; if they are too small, it loses the context that makes it meaningful. Either way, the retriever can miss the right answer. That’s why choosing a smart chunking strategy matters as much as the search method itself for getting accurate results.&lt;/p&gt;
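&lt;p&gt;To make the library analogy concrete, here is a toy sketch of the retrieval step. It assumes word-count vectors as a stand-in for a real bi-encoder and a plain Python list as a stand-in for a vector database, so the names and scores are illustrative only:&lt;/p&gt;

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for a bi-encoder: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

chunks = [
    "Refunds are processed within five business days.",
    "Our office is closed on public holidays.",
    "Shipping is free for orders above fifty dollars.",
]
# A plain list of (chunk, vector) pairs plays the role of the vector database.
index = [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(query, k=1):
    """Return the k chunks whose vectors are most similar to the query's."""
    query_vector = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

print(retrieve("How long do refunds take?"))
# ['Refunds are processed within five business days.']
```

&lt;p&gt;With a real embedding model, the query would also match paraphrases ("reimbursement" for "refund"), which plain word counts cannot do.&lt;/p&gt;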




&lt;h2&gt;
  
  
  Different Chunking Methods
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Fixed-Size Chunking
&lt;/h3&gt;

&lt;p&gt;This is the most straightforward "brute force" approach where you decide on a set number of characters or tokens (e.g., 500 characters) and split the text exactly at those intervals.&lt;/p&gt;

&lt;p&gt;While it is incredibly fast and computationally cheap, it is "blind" to the content. It often cuts sentences in half or separates a heading from its relevant paragraph, which can lead to a loss of context during retrieval. &lt;em&gt;This is the "old reliable" method—it just counts characters and cuts.&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.text_splitter&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;CharacterTextSplitter&lt;/span&gt;

&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Your long document text here...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;splitter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CharacterTextSplitter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;separator&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;chunk_overlap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;  &lt;span class="c1"&gt;# Overlap helps keep context between chunks
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;splitter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
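&lt;p&gt;Under the hood, fixed-size chunking with overlap is essentially a sliding window. The plain-Python sketch below is a hypothetical helper (not LangChain's implementation, which differs in details) that makes the window arithmetic explicit:&lt;/p&gt;

```python
def fixed_size_chunks(text, chunk_size=500, chunk_overlap=50):
    """Cut `text` into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters, so each window
    starts chunk_size - chunk_overlap characters after the previous one.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

print(fixed_size_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

&lt;p&gt;Each window starts &lt;code&gt;chunk_size - chunk_overlap&lt;/code&gt; characters after the previous one, which is why consecutive chunks share their boundary characters, regardless of where words or sentences end.&lt;/p&gt;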



&lt;h3&gt;
  
  
  2. Recursive Chunking
&lt;/h3&gt;

&lt;p&gt;Considered the "industry standard" for many applications, this method attempts to be more polite to the structure of the text. It uses a hierarchy of separators—starting with double newlines, then single newlines, then spaces—to break the text.&lt;/p&gt;

&lt;p&gt;If a paragraph is too big, it looks for the next best place to split it, aiming to keep related sentences together in a single block as much as possible. &lt;em&gt;This is the recommended default.&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.text_splitter&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RecursiveCharacterTextSplitter&lt;/span&gt;

&lt;span class="n"&gt;splitter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RecursiveCharacterTextSplitter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;chunk_overlap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;separators&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;splitter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_documents&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
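&lt;p&gt;The core idea can be sketched in plain Python. This simplified, hypothetical &lt;code&gt;recursive_split&lt;/code&gt; tries the coarsest separator first, greedily merges pieces while they fit within &lt;code&gt;chunk_size&lt;/code&gt;, and only recurses with a finer separator for pieces that are still too long. Unlike LangChain's splitter, it adds no overlap:&lt;/p&gt;

```python
def recursive_split(text, chunk_size=500, separators=("\n\n", "\n", " ", "")):
    """Split on the coarsest separator first; recurse with finer ones only when needed."""
    if chunk_size >= len(text):
        return [text]
    if not separators or separators[0] == "":
        # Last resort: hard cut into fixed-size character windows.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = current + sep + piece if current else piece
        if chunk_size >= len(candidate):
            current = candidate  # still fits: keep growing this chunk
            continue
        if current:
            chunks.append(current)
        if len(piece) > chunk_size:
            chunks.extend(recursive_split(piece, chunk_size, finer))  # too big: go finer
            current = ""
        else:
            current = piece
    if current:
        chunks.append(current)
    return chunks

text = "para one is here.\n\npara two is a bit longer than the limit"
print(recursive_split(text, chunk_size=20))
# ['para one is here.', 'para two is a bit', 'longer than the', 'limit']
```

&lt;p&gt;The first paragraph fits, so it survives intact; only the oversized second paragraph falls back to splitting on spaces.&lt;/p&gt;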



&lt;h3&gt;
  
  
  3. Document-Specific Chunking
&lt;/h3&gt;

&lt;p&gt;This method acknowledges that a Python script, an HTML page, and a Markdown file are structured differently. Instead of treating everything like a plain wall of text, it uses the document’s inherent formatting (like &lt;code&gt;&amp;lt;div&amp;gt;&lt;/code&gt; tags, &lt;code&gt;#&lt;/code&gt; headers, or function definitions) to determine the boundaries.&lt;/p&gt;

&lt;p&gt;This ensures that a single function or a specific sub-section of a manual stays intact as a coherent unit. &lt;em&gt;This is best for structured formats like Markdown, HTML, or source code.&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.text_splitter&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MarkdownHeaderTextSplitter&lt;/span&gt;

&lt;span class="n"&gt;headers_to_split_on&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;#&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Header 1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;##&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Header 2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;splitter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MarkdownHeaderTextSplitter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;headers_to_split_on&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers_to_split_on&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;splitter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;markdown_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
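&lt;p&gt;For intuition, here is a stripped-down, hypothetical plain-Python version of header-based splitting. Like &lt;code&gt;MarkdownHeaderTextSplitter&lt;/code&gt;, it attaches the active headings to each section as metadata, but it only understands &lt;code&gt;#&lt;/code&gt; and &lt;code&gt;##&lt;/code&gt; headings and would misread a line starting with &lt;code&gt;# &lt;/code&gt; inside a fenced code block:&lt;/p&gt;

```python
def split_markdown_by_headers(markdown_text):
    """Split Markdown at '# ' and '## ' headings, keeping the active headings as metadata."""
    sections, headers, lines = [], {}, []

    def flush():
        # Emit the text collected so far (if any) under the current headings.
        body = "\n".join(lines).strip()
        if body:
            sections.append({"content": body, "metadata": dict(headers)})
        lines.clear()

    for line in markdown_text.splitlines():
        if line.startswith("# "):
            flush()
            headers.clear()  # a new top-level heading resets any sub-heading
            headers["Header 1"] = line[2:].strip()
        elif line.startswith("## "):
            flush()
            headers["Header 2"] = line[3:].strip()
        else:
            lines.append(line)
    flush()
    return sections

doc = "# Setup\nInstall the package.\n## Config\nSet the API key.\n# Usage\nCall the client."
for section in split_markdown_by_headers(doc):
    print(section["metadata"], section["content"])
```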



&lt;h3&gt;
  
  
  4. Semantic Chunking
&lt;/h3&gt;

&lt;p&gt;Rather than looking at characters or formatting, this method looks at meaning. It analyzes the "distance" in ideas between sentences; as long as the sentences are talking about the same topic, they stay in the same chunk.&lt;/p&gt;

&lt;p&gt;When the model detects a significant shift in the subject matter, it creates a break. The resulting chunks vary in size but are far more consistent in their topical focus. &lt;em&gt;This requires an embedding model to "read" the sentences and decide if they belong together.&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_experimental.text_splitter&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SemanticChunker&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai.embeddings&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAIEmbeddings&lt;/span&gt;

&lt;span class="c1"&gt;# It groups sentences by how similar they are in meaning
&lt;/span&gt;&lt;span class="n"&gt;splitter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SemanticChunker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;OpenAIEmbeddings&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;splitter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_documents&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
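&lt;p&gt;To see the mechanism without an API key, here is a toy sketch of the same idea: embed each sentence, compare it with the next one, and start a new chunk when similarity drops. Word-count vectors stand in for a real embedding model, and the 0.2 threshold is an arbitrary assumption, so this illustrates the logic rather than replacing &lt;code&gt;SemanticChunker&lt;/code&gt;:&lt;/p&gt;

```python
import math
import re
from collections import Counter

def embed(sentence):
    """Toy embedding: word counts. A real pipeline would call an embedding model here."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))

def cosine(a, b):
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def semantic_chunks(text, threshold=0.2):
    """Keep consecutive sentences together until their similarity drops below threshold."""
    # Naive sentence split; text after the last . ! or ? is ignored.
    sentences = [s.strip() for s in re.findall(r"[^.!?]+[.!?]", text)]
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for prev, curr in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(curr)) >= threshold:
            chunks[-1].append(curr)  # same topic: extend the current chunk
        else:
            chunks.append([curr])  # topic shift: start a new chunk
    return [" ".join(chunk) for chunk in chunks]

text = ("Cats are calm pets. Cats are playful pets too. "
        "The stock market fell today. The market may recover next week.")
for chunk in semantic_chunks(text):
    print(chunk)
# Cats are calm pets. Cats are playful pets too.
# The stock market fell today. The market may recover next week.
```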



&lt;h3&gt;
  
  
  5. Agentic Chunking
&lt;/h3&gt;

&lt;p&gt;This is the most advanced and "human-like" strategy, where an LLM acts as an autonomous agent to decide where the breaks should go. The agent reads the document and asks, &lt;em&gt;"Does this part stand alone as a complete thought?"&lt;/em&gt; It essentially "edits" the document into logical pieces based on high-level reasoning.&lt;/p&gt;

&lt;p&gt;While this can be the most context-aware method, it is also the slowest and most expensive, because it requires multiple LLM calls just to prepare the data. &lt;em&gt;In practice this is usually a custom "loop" where you ask an LLM to look at a chunk and decide if it's "complete" or needs more text.&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Conceptual Pseudo-code (usually implemented via LangGraph or custom loops)
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;agentic_split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# 1. Start with a small piece of text
&lt;/span&gt;    &lt;span class="c1"&gt;# 2. Ask LLM: "Is this a complete thought?"
&lt;/span&gt;    &lt;span class="c1"&gt;# 3. If NO: Add next sentence and repeat.
&lt;/span&gt;    &lt;span class="c1"&gt;# 4. If YES: Create chunk and move to the next part.
&lt;/span&gt;    &lt;span class="k"&gt;pass&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
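&lt;p&gt;A runnable version of that loop, as a minimal sketch: the "is this a complete thought?" decision is passed in as a function. In a real pipeline that function would send the candidate text to an LLM and parse a yes/no reply; here a hypothetical word-count rule stands in so the example runs without an API:&lt;/p&gt;

```python
def agentic_split(sentences, is_complete_thought):
    """Grow a candidate chunk one sentence at a time until the judge accepts it."""
    chunks, current = [], []
    for sentence in sentences:
        current.append(sentence)  # add the next sentence to the candidate
        candidate = " ".join(current)
        if is_complete_thought(candidate):  # in practice, an LLM call
            chunks.append(candidate)  # accept the chunk and start a fresh one
            current = []
    if current:
        chunks.append(" ".join(current))  # keep leftover text rather than drop it
    return chunks

def toy_judge(candidate, min_words=8):
    """Hypothetical stand-in for the LLM: 'complete' once it has min_words words."""
    return len(candidate.split()) >= min_words

sentences = [
    "Chunking splits documents.",
    "It runs before embedding.",
    "Agentic chunking asks an LLM where to cut.",
    "That costs extra calls.",
]
print(agentic_split(sentences, toy_judge))
```

&lt;p&gt;Every loop iteration triggers one judge call, which is exactly where the cost of agentic chunking comes from when the judge is an LLM.&lt;/p&gt;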






&lt;h2&gt;
  
  
  Which One Should You Use?
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;Best For...&lt;/th&gt;
&lt;th&gt;Difficulty&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Fixed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Quick prototypes&lt;/td&gt;
&lt;td&gt;Very Easy&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Recursive&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;General text / Articles&lt;/td&gt;
&lt;td&gt;Easy&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Document&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Code / Formatted docs&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Semantic&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Deep research / RAG&lt;/td&gt;
&lt;td&gt;Hard&lt;/td&gt;
&lt;td&gt;Low (Embedding API calls)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Agentic&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High-precision needs&lt;/td&gt;
&lt;td&gt;Very Hard&lt;/td&gt;
&lt;td&gt;High (LLM API calls)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  What's Next?
&lt;/h2&gt;

&lt;p&gt;That’s the high-level view of how we break down data for LLMs! But knowing the definitions is only half the battle. In the coming weeks, I’ll be breaking down each of these strategies in detail—sharing the code, the common pitfalls, and the "Goldilocks" settings for your chunk sizes.&lt;/p&gt;

&lt;p&gt;Follow along in our next chapter: &lt;a href="https://blog.yashbhoskar.online/posts/deep-dive-into-semantic-chunking-for-rag" rel="noopener noreferrer"&gt;Deep Dive into Semantic Chunking for RAG&lt;/a&gt;.&lt;/p&gt;




</description>
      <category>rag</category>
      <category>ai</category>
      <category>webdev</category>
      <category>llm</category>
    </item>
    <item>
      <title>RAG Is Not Just Chunking Embedding Retrieval Generation</title>
      <dc:creator>Yash Bhoskar</dc:creator>
      <pubDate>Thu, 25 Jun 2026 15:52:21 +0000</pubDate>
      <link>https://dev.to/yashbhoskar/rag-is-not-just-chunking-embedding-retrieval-generation-34n6</link>
      <guid>https://dev.to/yashbhoskar/rag-is-not-just-chunking-embedding-retrieval-generation-34n6</guid>
      <description>&lt;p&gt;If I had a dollar $ for every time someone explained RAG in exactly four boxes and an arrow between each, I'd have enough to fine-tune a small LLM by now.&lt;/p&gt;

&lt;p&gt;Here's the thing — those four boxes aren't &lt;strong&gt;&lt;em&gt;wrong&lt;/em&gt;&lt;/strong&gt;. They're just the skeleton. And a skeleton without organs, blood flow, and a nervous system doesn't walk anywhere. It just lies there looking like it should work.&lt;/p&gt;

&lt;p&gt;So before you nod along to the "it's simple" version, sit with these for a second:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Did your parser actually capture the &lt;strong&gt;table&lt;/strong&gt; on page 14, or did it turn into word soup?&lt;/li&gt;
&lt;li&gt;That chart your document had — does your pipeline even know it existed?&lt;/li&gt;
&lt;li&gt;Why &lt;strong&gt;that&lt;/strong&gt; chunk size? Why &lt;strong&gt;that&lt;/strong&gt; overlap? Did you pick it, or did a tutorial pick it for you?&lt;/li&gt;
&lt;li&gt;Your vector DB choice — was that a real decision, or the first result on Google?&lt;/li&gt;
&lt;li&gt;The 5 chunks you retrieved — are they relevant, or just &lt;em&gt;similar-sounding&lt;/em&gt;?&lt;/li&gt;
&lt;li&gt;Is there noise riding along with the signal, diluting your answer?&lt;/li&gt;
&lt;li&gt;How do you know the LLM's answer is actually &lt;strong&gt;grounded&lt;/strong&gt; in what you retrieved, and not just... plausible?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's not pedantry. That's the entire difference between a RAG demo that wows your manager once and a RAG system that survives contact with real users and real documents.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real Flow (Bird's-Eye View)
&lt;/h2&gt;

&lt;p&gt;Think of it less like a pipe and more like a &lt;strong&gt;relay race with judges at every handoff&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Stage&lt;/th&gt;
&lt;th&gt;What's &lt;em&gt;actually&lt;/em&gt; happening&lt;/th&gt;
&lt;th&gt;The question nobody asks&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Parsing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Documents → clean structured text&lt;/td&gt;
&lt;td&gt;Did tables/images survive, or vanish?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Chunking&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Splitting text into digestible pieces&lt;/td&gt;
&lt;td&gt;Why this size? Why this overlap?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Embedding&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Turning chunks into vectors&lt;/td&gt;
&lt;td&gt;Does this model "get" your domain?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Storage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Vectors land in a DB&lt;/td&gt;
&lt;td&gt;Picked for hype, or for your scale/latency needs?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hybrid Search&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Keyword (BM25) + semantic search&lt;/td&gt;
&lt;td&gt;Are you only doing vector search and missing exact matches?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Metadata Filtering&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Narrowing by source/date/dept&lt;/td&gt;
&lt;td&gt;Or is everything just dumped into one giant pile?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Reranking&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Cross-encoder re-scores top candidates&lt;/td&gt;
&lt;td&gt;Or are you trusting raw similarity scores blindly?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Context Selection&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Picking the final Top-K chunks&lt;/td&gt;
&lt;td&gt;Too few = missing info. Too many = confused LLM.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Generation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;LLM writes the answer&lt;/td&gt;
&lt;td&gt;Grounded in your docs, or politely hallucinating?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Answer Relevancy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Did it actually answer the question?&lt;/td&gt;
&lt;td&gt;Anyone checking, or just shipping it?&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;



&lt;p&gt;Every single row above has its own failure modes, its own trade-offs, and honestly — its own rabbit hole worth a blog post of its own.&lt;/p&gt;
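&lt;p&gt;To make one row concrete: a common way to implement the &lt;strong&gt;Hybrid Search&lt;/strong&gt; step is Reciprocal Rank Fusion (RRF), which merges a BM25 ranking and a vector ranking by giving each document 1 / (k + rank) points from every list it appears in. A minimal sketch, assuming both ranked lists of document IDs already exist; the IDs are made up and k = 60 is the constant commonly used with RRF:&lt;/p&gt;

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in
    (rank starts at 1), so items ranked well by both searches rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_results = ["doc_refund_policy", "doc_shipping", "doc_returns_faq"]
vector_results = ["doc_returns_faq", "doc_refund_policy", "doc_warranty"]
print(reciprocal_rank_fusion([bm25_results, vector_results]))
# ['doc_refund_policy', 'doc_returns_faq', 'doc_shipping', 'doc_warranty']
```

&lt;p&gt;RRF only looks at ranks, so it sidesteps the problem that BM25 scores and cosine similarities live on different scales.&lt;/p&gt;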




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fjak2dkkg66d5mi1hfvy5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fjak2dkkg66d5mi1hfvy5.png" alt="infographic illustrating the complete 10-stage RAG pipeline. It displays a clean, linear sequence of minimal icons from parsing to answer relevancy, accompanied by punchy accent-text annotations." width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Actually Matters
&lt;/h2&gt;

&lt;p&gt;A "simple" RAG pipeline fails silently. It doesn't crash — it just gives you a confidently wrong answer, citing a chunk that's 70% irrelevant, built from a table your parser butchered, retrieved because it was &lt;em&gt;vector-similar&lt;/em&gt; rather than &lt;em&gt;actually-useful&lt;/em&gt;. And nobody notices until a user does.&lt;/p&gt;

&lt;p&gt;Good RAG isn't about stacking the four boxes. It's about making &lt;strong&gt;every junction in that relay race accountable&lt;/strong&gt; — parsing accountable for fidelity, chunking accountable for context, retrieval accountable for relevance, generation accountable for grounding.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;This was the 30,000-ft view — intentionally not deep, just enough to make you go "oh, there's &lt;em&gt;way&lt;/em&gt; more going on here." Up next, I'll deep-dive each stage one by one, starting with the most underrated villain of every RAG pipeline: &lt;strong&gt;document parsing&lt;/strong&gt; (yes, before you even think about chunking).&lt;/p&gt;

&lt;p&gt;Stay tuned. 🧠&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Inspired by my own hurdles 🙂&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>llm</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
