Ranjan Dailata

Why Chunking Is the Biggest Mistake in RAG Systems

Retrieval-Augmented Generation (RAG) has become the default architecture for building AI-powered document intelligence systems. Most implementations follow the same pattern:

  1. Split documents into chunks
  2. Convert chunks into embeddings
  3. Store them in a vector database
  4. Retrieve the most similar chunks
  5. Send them to an LLM to generate answers
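The five steps above can be reduced to a few functions. The following is a minimal sketch, using a toy bag-of-words "embedding" in place of a real embedding model and an in-memory list in place of a vector database; all function names here are illustrative, not from any specific library.

```python
import math
from collections import Counter

def chunk_text(text: str, chunk_size: int = 200) -> list[str]:
    # Step 1: split the document into fixed-size character chunks.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def embed(text: str) -> Counter:
    # Step 2: a toy bag-of-words vector standing in for a real embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Steps 3-4: "store" the chunks in a list and rank them by similarity.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

# Step 5 would pass the retrieved chunks to an LLM as context.
```

In production these pieces map to a text splitter, an embedding model, a vector database, and an LLM call, but the failure modes discussed below are visible even in this toy version.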

This pipeline works reasonably well for simple text. However, when applied to structured documents like clinical records, chunking can introduce serious problems.

Healthcare documents are rich with context and hierarchy. Breaking them into arbitrary chunks often leads to context loss, retrieval errors, and fragmented reasoning.

In this article, we will look at why chunking fails, using a realistic clinical document example, and how structure-aware indexing and summarization can produce far better results.

Note: This post focuses on the healthcare domain, using a patient clinical document as the running example.


The Clinical Document Example

Consider the following clinical summary sample:

Patient Name: Jordan M.
DOB: 06/21/1990
Date of Summary: 08/01/2025

Diagnosis: F33.1 Major Depressive Disorder, recurrent, moderate
Symptoms: Persistent low mood, disrupted sleep, concentration issues

Treatment Summary:
- 12 CBT sessions, weekly
- Focused on core beliefs, behavioral activation
- PHQ-9 improved from 17 to 6

Medications: Sertraline 50mg daily, no side effects reported

Follow-Up Plan:
- Referral to psychiatrist for medication continuation
- Recommended ongoing biweekly therapy

At first glance, this document appears small, but clinical records in real systems often span hundreds of pages across multiple visits.

Even in this simple example, the document contains clear semantic sections:

Patient Info
Diagnosis
Symptoms
Treatment Summary
Medications
Follow-Up Plan

These sections provide the structure necessary for proper interpretation.


What Happens When We Chunk This Document

A traditional RAG system might split the text into chunks like this:

Chunk A
Patient Name: Jordan M.
DOB: 06/21/1990
Diagnosis: Major Depressive Disorder
Symptoms: Persistent low mood
Chunk B
Treatment Summary:
12 CBT sessions
PHQ-9 improved from 17 to 6
Chunk C
Medications: Sertraline 50mg daily
Follow-Up Plan: referral to psychiatrist
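This kind of fragmentation is easy to reproduce. Here is a self-contained sketch using a naive fixed-size splitter on the sample document; the exact split points are illustrative, not from any particular library.

```python
# The clinical summary from above, as a single string.
DOC = """Patient Name: Jordan M.
DOB: 06/21/1990
Date of Summary: 08/01/2025

Diagnosis: F33.1 Major Depressive Disorder, recurrent, moderate
Symptoms: Persistent low mood, disrupted sleep, concentration issues

Treatment Summary:
- 12 CBT sessions, weekly
- Focused on core beliefs, behavioral activation
- PHQ-9 improved from 17 to 6

Medications: Sertraline 50mg daily, no side effects reported

Follow-Up Plan:
- Referral to psychiatrist for medication continuation
- Recommended ongoing biweekly therapy"""

def chunk_text(text: str, chunk_size: int = 150) -> list[str]:
    """Naive fixed-size splitter with no awareness of section boundaries."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

chunks = chunk_text(DOC)

# The diagnosis and the medication now live in different chunks:
assert not any("Diagnosis" in c and "Sertraline" in c for c in chunks)
```

Any question that needs both facts at once now depends on the retriever assembling multiple chunks correctly.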

1. Cross-Section Reasoning Questions

These require information from multiple chunks, which chunk-based retrieval often fails to assemble.

Example Questions

• What treatment improved the patient’s PHQ-9 score?
• What medication is being used to treat the patient's depression?
• What treatment approach was used along with medication?
• What interventions helped reduce the patient’s depression score?

Why Chunking Fails

The system may retrieve:

Chunk B
PHQ-9 improved from 17 to 6

But it does not contain medication information, so the answer becomes incomplete.


2. Contextual Medical Questions

These questions require understanding relationships between sections.

Example Questions

• What condition is the patient being treated for with Sertraline?
• Why was the patient referred to a psychiatrist?
• What symptoms led to the treatment plan?

Why Chunking Fails

Chunk C contains medication, but diagnosis is in Chunk A, so the model may not connect them.


3. Treatment Outcome Questions

These require linking treatment with outcomes.

Example Questions

• Did the therapy sessions improve the patient’s condition?
• What evidence shows the patient improved during treatment?
• How effective was the treatment plan?

Why Chunking Fails

The improvement metric:

PHQ-9 improved from 17 to 6

appears in Chunk B, but the context about depression diagnosis is in Chunk A.


4. Follow-Up Care Questions

These require understanding treatment history and next steps.

Example Questions

• Why does the patient need psychiatric follow-up?
• What follow-up care is recommended after treatment?
• What ongoing care is suggested for this patient?

Why Chunking Fails

Chunk C contains the follow-up plan but not the context of the diagnosis or therapy outcome.


5. Comprehensive Clinical Summary Questions

These require multiple chunks simultaneously.

Example Questions

• Summarize the patient’s diagnosis, treatment, and follow-up plan.
• What treatments has the patient received for depression?
• What is the overall care plan for this patient?

Why Chunking Fails

Chunk-based retrieval may only return one chunk, causing a partial summary.

Example incomplete retrieval:

Chunk B
Treatment Summary
12 CBT sessions
PHQ-9 improved from 17 to 6

But the system misses medication and follow-up care.


6. Ambiguous Retrieval Questions

These expose semantic similarity issues in vector search.

Example Questions

• What therapy is the patient receiving?
• What treatment is the patient undergoing?
• How is the patient being treated?

Vector search may retrieve:

Chunk B
Treatment Summary

But it misses medication in Chunk C, which is also part of the treatment plan.

Vector similarity measures semantic proximity, not clinical context.

The result: incorrect or incomplete answers.


Why Chunking Breaks Clinical Documents

Healthcare documents illustrate several fundamental problems with chunking.


1. Clinical Context Gets Fragmented

Clinical notes often rely on relationships between sections.

Example:

Diagnosis - Explains why treatment was prescribed
Treatment - Explains how symptoms improved
Follow-Up - Explains ongoing care

When chunked, these relationships disappear.


2. Important Meaning Spans Sections

Consider the treatment outcome:

PHQ-9 improved from 17 to 6

This metric only makes sense if the model also understands:

Diagnosis: Major Depressive Disorder
Treatment: CBT sessions
Medication: Sertraline

Chunking separates these connected ideas.


3. Clinical Reasoning Requires Structure

Doctors interpret records by navigating sections:

Diagnosis
Symptoms
Treatment
Medication
Follow-Up

Chunking ignores this hierarchy entirely.


A Better Approach: Structure-Aware Document Retrieval

Instead of splitting documents arbitrarily, the document's structure can be preserved by building a tree-based hierarchical representation.

Example hierarchical representation:

Clinical Summary
 ├ Patient Information
 │   ├ Name
 │   ├ DOB
 │
 ├ Diagnosis
 │
 ├ Symptoms
 │
 ├ Treatment Summary
 │
 ├ Medications
 │
 └ Follow-Up Plan

Each section becomes a retrieval node.

This structure preserves the clinical context.
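One way to represent such a hierarchy is a small tree of section nodes. This is a hypothetical sketch; the node shape and `find` helper are my own, not a specific library's API.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class SectionNode:
    """A retrieval node holding one section of the clinical document."""
    title: str
    content: str = ""
    children: list[SectionNode] = field(default_factory=list)

clinical_summary = SectionNode("Clinical Summary", children=[
    SectionNode("Patient Information", children=[
        SectionNode("Name", "Jordan M."),
        SectionNode("DOB", "06/21/1990"),
    ]),
    SectionNode("Diagnosis", "F33.1 Major Depressive Disorder, recurrent, moderate"),
    SectionNode("Symptoms", "Persistent low mood, disrupted sleep, concentration issues"),
    SectionNode("Treatment Summary", "12 CBT sessions, weekly; PHQ-9 improved from 17 to 6"),
    SectionNode("Medications", "Sertraline 50mg daily, no side effects reported"),
    SectionNode("Follow-Up Plan", "Referral to psychiatrist; ongoing biweekly therapy"),
])

def find(node: SectionNode, title: str) -> SectionNode | None:
    """Walk the tree to locate a section by title."""
    if node.title == title:
        return node
    for child in node.children:
        if (hit := find(child, title)) is not None:
            return hit
    return None
```

Because each section is an addressable node, retrieval can target "Medications" or "Follow-Up Plan" directly instead of hoping the right text landed in the right chunk.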


Adding Summarization for Better Retrieval

To improve retrieval efficiency, each section can be summarized.

Example summaries:

Patient Information
Summary: Patient demographics including name and DOB.

Diagnosis
Summary: Major Depressive Disorder (recurrent, moderate).

Treatment Summary
Summary: 12 CBT sessions with significant improvement in PHQ-9 score.

Medications
Summary: Sertraline 50mg daily with no reported side effects.

Follow-Up Plan
Summary: Referral to psychiatrist and continued biweekly therapy.

These summaries act as compressed semantic representations of the document.


How Retrieval Works with Summaries

User query:

"What medication is the patient currently taking?"

The system compares the query to section summaries:

Diagnosis - Mental health condition
Treatment - Therapy sessions
Medications - Drug prescription
Follow-Up - Future care

The correct section (Medications) is retrieved immediately.
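The routing step can be sketched as follows. This toy version scores query words against each section's title and summary with crude substring matching; a real system would compare embeddings, which also handles paraphrases this toy cannot. All names are illustrative.

```python
import re

# Section summaries acting as the retrieval index.
SUMMARIES = {
    "Patient Information": "Patient demographics including name and DOB.",
    "Diagnosis": "Major Depressive Disorder (recurrent, moderate).",
    "Treatment Summary": "12 CBT sessions with significant improvement in PHQ-9 score.",
    "Medications": "Sertraline 50mg daily with no reported side effects.",
    "Follow-Up Plan": "Referral to psychiatrist and continued biweekly therapy.",
}

STOPWORDS = {"what", "is", "the", "a", "was", "which", "for", "with"}

def route(query: str, summaries: dict[str, str]) -> str:
    """Pick the section whose title + summary best overlaps the query terms."""
    words = [w for w in re.findall(r"[a-z0-9]+", query.lower())
             if w not in STOPWORDS]
    def score(item):
        title, summary = item
        text = f"{title} {summary}".lower()
        # Crude substring match so "medication" also hits "Medications".
        return sum(1 for w in words if w in text)
    return max(summaries.items(), key=score)[0]
```

For example, `route("Which medication was prescribed?", SUMMARIES)` selects the `"Medications"` section, whose full text then becomes the LLM context.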


Example Final Context

Retrieved section:

Medications:
Sertraline 50mg daily, no side effects reported

Generated response:

The patient is currently prescribed Sertraline 50mg daily, with no reported side effects.


High-level Architecture for Clinical RAG

A structure-aware system might follow this pipeline:


This preserves meaning while reducing noise.
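End to end, the stages can be sketched as a single routing-plus-context function. This is a hypothetical outline of the pipeline, not a specific framework; the overlap scorer stands in for real embedding similarity, and the final LLM call is left as a comment.

```python
import re

def overlap(query: str, text: str) -> int:
    """Crude relevance score: count query words appearing in the text."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    t = text.lower()
    return sum(1 for w in words if w in t)

def answer_context(query: str, sections: dict[str, str],
                   summaries: dict[str, str]) -> str:
    """Structure-aware pipeline: parse sections, summarize them (precomputed
    here), route the query by summary, and return the full matching section
    as the LLM context."""
    best = max(summaries,
               key=lambda title: overlap(query, f"{title} {summaries[title]}"))
    # A real system would now call an LLM with (query, context).
    return f"{best}:\n{sections[best]}"

SECTIONS = {
    "Medications": "Sertraline 50mg daily, no side effects reported",
    "Diagnosis": "F33.1 Major Depressive Disorder, recurrent, moderate",
}
SUMMARIES = {
    "Medications": "Sertraline 50mg daily with no reported side effects.",
    "Diagnosis": "Major Depressive Disorder (recurrent, moderate).",
}
```

The key design choice is that summaries are searched but full sections are returned, so the generator always sees a complete, coherent unit of the document.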


Why This Matters in Healthcare AI

Clinical AI systems must prioritize:

• Accuracy
• Traceability
• Context awareness

Chunk-based retrieval often struggles to meet these requirements.

Structure-aware approaches provide:

Higher precision

Relevant sections are retrieved instead of unrelated chunks.

Better explainability

The system can show exact sections used in reasoning.

Improved clinical safety

Maintaining document hierarchy reduces the risk of misinterpretation.


The Future of RAG in Healthcare

As AI becomes more integrated into healthcare systems, document understanding will play a critical role.

The next generation of RAG architectures will likely include:

• Hierarchical document indexing
• Section-level summarization
• Reasoning-based retrieval
• Agentic document exploration

These approaches allow AI systems to navigate clinical documents more like human experts.


Conclusion

Chunking assumes documents are bags of paragraphs. But documents are structured knowledge systems. Even when a document appears unstructured, its structure can often be inferred, and once that structure exists, retrieval becomes far more accurate.

For structured documents like clinical records, chunking often causes more problems than it solves.

If you need AI systems to truly understand documents, preserving their structure and allowing models to reason over meaningful sections is crucial.

Moving beyond chunking is a critical step toward building safer, more reliable document intelligence systems.

In the next blog posts, we will walk through a realistic example of how to deal with unstructured data and its retrieval.


Attribution

The clinical document sample was referenced from https://www.supanote.ai/templates/clinical-summary-template

The contents of this post were formatted with ChatGPT to produce polished content for the target audience.
