Retrieval-Augmented Generation (RAG) has become the default architecture for building AI-powered document intelligence systems. Most implementations follow the same pattern:
- Split documents into chunks
- Convert chunks into embeddings
- Store them in a vector database
- Retrieve the most similar chunks
- Send them to an LLM to generate answers
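The five steps above can be sketched end-to-end. This is a minimal illustration, not a production implementation: a toy bag-of-words vector stands in for a real embedding model, and an in-memory list stands in for a vector database.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words term-frequency vector.
    A real system would call an embedding model here."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def chunk(document: str, size: int = 40) -> list[str]:
    """Step 1: split the document into fixed-size character chunks."""
    return [document[i:i + size] for i in range(0, len(document), size)]

def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Steps 2-5: embed chunks, rank by similarity, return the top k
    to be sent to the LLM as context."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

doc = ("Diagnosis: Major Depressive Disorder. "
       "Treatment: 12 CBT sessions. Medications: Sertraline 50mg daily.")
top = retrieve("What medications were prescribed?", chunk(doc))
# Note how the fixed-size split cuts "Sertraline" in half at a chunk
# boundary, so the retrieved context never names the drug.
print(top[0])
```

Even this tiny example shows the failure mode the rest of the article examines: the top-ranked chunk contains the Medications heading, but the chunk boundary severed the drug name from it.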
This pipeline works reasonably well for simple text. However, when applied to structured documents like clinical records, chunking can introduce serious problems.
Healthcare documents are rich with context and hierarchy. Breaking them into arbitrary chunks often leads to context loss, retrieval errors, and fragmented reasoning.
This article uses a realistic clinical document to show why chunking fails, and how structure-aware indexing and summarization can produce far better results.
Note: this post focuses on the healthcare domain, using a patient clinical document as the running example.
The Clinical Document Example
Consider the following clinical summary sample:
Patient Name: Jordan M.
DOB: 06/21/1990
Date of Summary: 08/01/2025
Diagnosis: F33.1 Major Depressive Disorder, recurrent, moderate
Symptoms: Persistent low mood, disrupted sleep, concentration issues
Treatment Summary:
- 12 CBT sessions, weekly
- Focused on core beliefs, behavioral activation
- PHQ-9 improved from 17 to 6
Medications: Sertraline 50mg daily, no side effects reported
Follow-Up Plan:
- Referral to psychiatrist for medication continuation
- Recommended ongoing biweekly therapy
At first glance, this document appears small, but clinical records in real systems often span hundreds of pages across multiple visits.
Even in this simple example, the document contains clear semantic sections:
Patient Info
Diagnosis
Symptoms
Treatment Summary
Medications
Follow-Up Plan
These sections provide the structure necessary for proper interpretation.
What Happens When We Chunk This Document
A traditional RAG system might split the text into chunks like this:
Chunk A
Patient Name: Jordan M.
DOB: 06/21/1990
Diagnosis: Major Depressive Disorder
Symptoms: Persistent low mood
Chunk B
Treatment Summary:
12 CBT sessions
PHQ-9 improved from 17 to 6
Chunk C
Medications: Sertraline 50mg daily
Follow-Up Plan: referral to psychiatrist
Now consider the kinds of questions this split breaks.
1. Cross-Section Reasoning Questions
These require information from multiple chunks, which chunk-based retrieval often fails to assemble.
Example Questions
• What treatment improved the patient’s PHQ-9 score?
• What medication is being used to treat the patient's depression?
• What treatment approach was used along with medication?
• What interventions helped reduce the patient’s depression score?
Why Chunking Fails
The system may retrieve:
Chunk B
PHQ-9 improved from 17 to 6
But it does not contain medication information, so the answer becomes incomplete.
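This failure can be reproduced with a toy retriever over the three chunks above. The chunk texts come from the example; the word-overlap scoring is a crude illustrative stand-in for embedding similarity.

```python
import re

# Chunk texts from the example above, flattened to single strings.
chunks = {
    "A": "Patient Name: Jordan M. DOB: 06/21/1990 "
         "Diagnosis: Major Depressive Disorder Symptoms: Persistent low mood",
    "B": "Treatment Summary: 12 CBT sessions PHQ-9 improved from 17 to 6",
    "C": "Medications: Sertraline 50mg daily Follow-Up Plan: referral to psychiatrist",
}

def tokens(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))

def top1(query: str) -> str:
    """Return the single best chunk by word overlap with the query."""
    q = tokens(query)
    return max(chunks, key=lambda c: len(q & tokens(chunks[c])))

best = top1("What treatment improved the patient's PHQ-9 score?")
print(best)  # Chunk B wins; Chunk C (medication) is never retrieved,
             # so the LLM answers without the Sertraline context.
```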
2. Contextual Medical Questions
These questions require understanding relationships between sections.
Example Questions
• What condition is the patient being treated for with Sertraline?
• Why was the patient referred to a psychiatrist?
• What symptoms led to the treatment plan?
Why Chunking Fails
Chunk C contains medication, but diagnosis is in Chunk A, so the model may not connect them.
3. Treatment Outcome Questions
These require linking treatment with outcomes.
Example Questions
• Did the therapy sessions improve the patient’s condition?
• What evidence shows the patient improved during treatment?
• How effective was the treatment plan?
Why Chunking Fails
The improvement metric:
PHQ-9 improved from 17 to 6
appears in Chunk B, but the context about depression diagnosis is in Chunk A.
4. Follow-Up Care Questions
These require understanding treatment history and next steps.
Example Questions
• Why does the patient need psychiatric follow-up?
• What follow-up care is recommended after treatment?
• What ongoing care is suggested for this patient?
Why Chunking Fails
Chunk C contains the follow-up plan but not the context of the diagnosis or therapy outcome.
5. Comprehensive Clinical Summary Questions
These require multiple chunks simultaneously.
Example Questions
• Summarize the patient’s diagnosis, treatment, and follow-up plan.
• What treatments has the patient received for depression?
• What is the overall care plan for this patient?
Why Chunking Fails
Chunk-based retrieval may only return one chunk, causing a partial summary.
Example incomplete retrieval:
Chunk B
Treatment Summary
12 CBT sessions
PHQ-9 improved from 17 to 6
But the system misses medication and follow-up care.
6. Ambiguous Retrieval Questions
These expose semantic similarity issues in vector search.
Example Questions
• What therapy is the patient receiving?
• What treatment is the patient undergoing?
• How is the patient being treated?
Vector search may retrieve:
Chunk B
Treatment Summary
But it misses medication in Chunk C, which is also part of the treatment plan.
Vector similarity measures semantic proximity, not clinical context.
The result: incorrect or incomplete answers.
Why Chunking Breaks Clinical Documents
Healthcare documents illustrate several fundamental problems with chunking.
1. Clinical Context Gets Fragmented
Clinical notes often rely on relationships between sections.
Example:
Diagnosis - Explains why treatment was prescribed
Treatment - Explains how symptoms improved
Follow-Up - Explains ongoing care
When chunked, these relationships disappear.
2. Important Meaning Spans Sections
Consider the treatment outcome:
PHQ-9 improved from 17 to 6
This metric only makes sense if the model also understands:
Diagnosis: Major Depressive Disorder
Treatment: CBT sessions
Medication: Sertraline
Chunking separates these connected ideas.
3. Clinical Reasoning Requires Structure
Doctors interpret records by navigating sections:
Diagnosis
Symptoms
Treatment
Medication
Follow-Up
Chunking ignores this hierarchy entirely.
A Better Approach: Structure-Aware Document Retrieval
Instead of splitting the document arbitrarily, its structure can be preserved as a hierarchical tree.
Example hierarchical representation:
Clinical Summary
├── Patient Information
│   ├── Name
│   └── DOB
├── Diagnosis
├── Symptoms
├── Treatment Summary
├── Medications
└── Follow-Up Plan
Each section becomes a retrieval node.
This structure preserves the clinical context.
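As a sketch, such a tree can be built by parsing the document at its section headings. The heading list below is taken from the sample document and assumed known up front; a production parser would need layout-based or model-based section detection.

```python
import re

# Headings from the sample clinical summary; assumed known in advance.
HEADINGS = ["Patient Name", "DOB", "Date of Summary", "Diagnosis",
            "Symptoms", "Treatment Summary", "Medications", "Follow-Up Plan"]

def parse_sections(text: str) -> dict[str, str]:
    """Split the document at known headings so each section stays whole
    and becomes one retrieval node."""
    pattern = "|".join(re.escape(h) for h in HEADINGS)
    sections: dict[str, str] = {}
    current = None
    for line in text.splitlines():
        match = re.match(rf"({pattern}):(.*)", line.strip())
        if match:
            current = match.group(1)
            sections[current] = match.group(2).strip()
        elif current is not None:
            sections[current] += "\n" + line.strip()
    return sections

doc = """Diagnosis: F33.1 Major Depressive Disorder, recurrent, moderate
Medications: Sertraline 50mg daily, no side effects reported
Follow-Up Plan:
- Referral to psychiatrist for medication continuation"""

tree = parse_sections(doc)
print(tree["Medications"])  # the whole section is one retrieval node
```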
Adding Summarization for Better Retrieval
To improve retrieval efficiency, each section can be summarized.
Example summaries:
Patient Information
Summary: Patient demographics including name and DOB.
Diagnosis
Summary: Major Depressive Disorder (recurrent, moderate).
Treatment Summary
Summary: 12 CBT sessions with significant improvement in PHQ-9 score.
Medications
Summary: Sertraline 50mg daily with no reported side effects.
Follow-Up Plan
Summary: Referral to psychiatrist and continued biweekly therapy.
These summaries act as compressed semantic representations of the document.
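A sketch of the summary index: each node keeps its full text plus a compressed summary. Here the summaries come from a trivial truncation placeholder so the example runs; in practice they would come from an LLM summarization call.

```python
# Section texts from the sample document (abbreviated).
sections = {
    "Diagnosis": "F33.1 Major Depressive Disorder, recurrent, moderate",
    "Treatment Summary": "12 CBT sessions, weekly. Focused on core beliefs, "
                         "behavioral activation. PHQ-9 improved from 17 to 6.",
    "Medications": "Sertraline 50mg daily, no side effects reported",
}

def summarize(text: str, max_words: int = 8) -> str:
    """Placeholder summarizer: keep the first few words.
    Swap in an LLM call here for real abstractive summaries."""
    words = text.split()
    cut = " ".join(words[:max_words])
    return cut + ("…" if len(words) > max_words else "")

# Queries are matched against the summary; answers are drawn from the
# full section text.
index = {name: {"summary": summarize(text), "text": text}
         for name, text in sections.items()}
print(index["Treatment Summary"]["summary"])
```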
How Retrieval Works with Summaries
User query:
"What medication is the patient currently taking?"
The system compares the query to section summaries:
Diagnosis - Mental health condition
Treatment - Therapy sessions
Medications - Drug prescription
Follow-Up - Future care
The correct section (Medications) is retrieved immediately.
Example Final Context
Retrieved section:
Medications:
Sertraline 50mg daily, no side effects reported
Generated response:
The patient is currently prescribed Sertraline 50mg daily, with no reported side effects.
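The routing step above can be sketched as follows. Word overlap with naive plural stripping stands in for embedding similarity, the summaries are the ones listed earlier, and only the Medications full text is included for brevity.

```python
import re

# Section summaries from the earlier example.
summaries = {
    "Diagnosis": "Major Depressive Disorder (recurrent, moderate).",
    "Treatment Summary": "12 CBT sessions with significant improvement "
                         "in PHQ-9 score.",
    "Medications": "Sertraline 50mg daily with no reported side effects.",
    "Follow-Up Plan": "Referral to psychiatrist and continued biweekly therapy.",
}
full_text = {
    "Medications": "Sertraline 50mg daily, no side effects reported",
}

def tokens(text: str) -> set[str]:
    # Crude normalization: lowercase and strip a trailing "s" so that
    # "medication" matches "Medications". A real system would embed instead.
    return {w.rstrip("s") for w in re.findall(r"\w+", text.lower())}

def route(query: str) -> str:
    """Pick the section whose name plus summary best overlaps the query."""
    q = tokens(query)
    return max(summaries,
               key=lambda name: len(q & tokens(name + " " + summaries[name])))

section = route("What medication is the patient currently taking?")
print(section)             # the Medications section is selected
print(full_text[section])  # the whole section becomes the LLM context
```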
High-level Architecture for Clinical RAG
A structure-aware system might follow this pipeline:
1. Parse the document into its labeled sections.
2. Build a hierarchical tree of section nodes.
3. Summarize each section into a compressed representation.
4. Index the summaries for retrieval.
5. Route each query to the best-matching section.
6. Generate the answer from the full section text.
This preserves meaning while reducing noise.
Why This Matters in Healthcare AI
Clinical AI systems must prioritize:
• Accuracy
• Traceability
• Context awareness
Chunk-based retrieval often struggles to meet these requirements.
Structure-aware approaches provide:
Higher precision
Relevant sections are retrieved instead of unrelated chunks.
Better explainability
The system can show exact sections used in reasoning.
Improved clinical safety
Maintaining document hierarchy reduces the risk of misinterpretation.
The Future of RAG in Healthcare
As AI becomes more integrated into healthcare systems, document understanding will play a critical role.
The next generation of RAG architectures will likely include:
• Hierarchical document indexing
• Section-level summarization
• Reasoning-based retrieval
• Agentic document exploration
These approaches allow AI systems to navigate clinical documents more like human experts.
Conclusion
Chunking assumes documents are bags of paragraphs, but documents are structured knowledge systems. Even when a document appears unstructured, its structure can often be inferred, and once that structure exists, retrieval becomes far more accurate.
For structured documents like clinical records, chunking often causes more problems than it solves.
If an AI system needs to truly understand documents, preserving their structure and letting models reason over meaningful sections is crucial.
Moving beyond chunking is a critical step toward building safer, more reliable document intelligence systems.
The next post will walk through a realistic example of handling unstructured data and its retrieval.
Attribution
The clinical document sample was adapted from https://www.supanote.ai/templates/clinical-summary-template
The contents of this post were formatted with ChatGPT to polish them for the target audience.
