If your text chunks are too small, the LLM misses the surrounding context. If they are too big, each embedding has to cover several ideas at once, so the search becomes "blurry" and inaccurate. A common fix is Small-to-Big Retrieval. Two popular flavors are Sentence Window Retrieval and Parent Document Retrieval.
Here is the breakdown of how they work and which one you should choose.
The Shared Secret: "Search Small, Read Big"
Both techniques follow one rule: Search using a tiny, precise snippet, but give the LLM a large, context-rich block of text to read. It's like searching a library index for a "keyword" but then pulling the whole "book" off the shelf to get the full story.
1. Sentence Window Retrieval: The "Magnifying Glass"
Imagine you are reading a novel. To understand a specific line of dialogue, you usually just need to know what happened a few seconds before and after.
How it works: You break your data into individual sentences and index each one. When the retriever finds a relevant sentence, it automatically grabs the 3–5 sentences immediately surrounding it and passes that window to the LLM.
The Vibe: Linear and local.
Best for: Narrative text, chat transcripts, or articles where ideas flow sentence-by-sentence.
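Here is a minimal sketch of the idea in plain Python. It is not any library's API: the function names are made up for illustration, the sentence splitter is a naive regex, and a simple word-overlap count stands in for real embedding similarity. The `window` parameter is the number of neighbors taken on each side of the matched sentence.

```python
import re

def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap_score(query: str, passage: str) -> int:
    # Toy stand-in for embedding similarity: count the shared words.
    return len(tokens(query) & tokens(passage))

def sentence_window_retrieve(text: str, query: str, window: int = 2) -> str:
    sentences = split_sentences(text)
    if not sentences:
        return ""
    # Search small: pick the single best-matching sentence.
    best = max(range(len(sentences)),
               key=lambda i: overlap_score(query, sentences[i]))
    # Read big: return that sentence plus `window` neighbors on each side.
    start = max(0, best - window)
    end = min(len(sentences), best + window + 1)
    return " ".join(sentences[start:end])

story = (
    "Mara opened the door. The hallway was dark. "
    "She heard a violin playing upstairs. The music stopped suddenly. "
    "Someone whispered her name. She ran back outside."
)
context = sentence_window_retrieve(story, "violin playing", window=1)
print(context)
# The hallway was dark. She heard a violin playing upstairs. The music stopped suddenly.
```

In a real pipeline you would usually compute the window once at indexing time and keep it alongside each sentence, so retrieval is a lookup rather than a re-split of the source text.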
2. Parent Document Retrieval: The "Map"
Imagine a Technical Manual or a Legal Contract. A single sentence like "Tighten the bolt" is useless if the safety warning is at the top of the page. You don't just need the "neighboring sentences"; you need the whole section.
How it works: You create a hierarchy. You have Parent chunks (like a full page) and Child chunks (small paragraphs inside that page). The AI searches the "Children" but returns the "Parent" to the LLM.
The Vibe: Structural and organized.
Best for: PDFs, manuals, financial reports, and legal docs where sections are logically grouped.
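The hierarchy can be sketched in a few lines of plain Python. Everything here is an assumption for illustration: the function names are hypothetical, a dict plays the role of the parent store, children are paragraphs split on blank lines, and word overlap stands in for vector search.

```python
import re

def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap_score(query: str, passage: str) -> int:
    # Toy stand-in for embedding similarity: count the shared words.
    return len(tokens(query) & tokens(passage))

def build_child_index(parents: dict[str, str]) -> list[tuple[str, str]]:
    # Split every parent into child chunks and remember which parent each came from.
    children = []
    for parent_id, body in parents.items():
        for paragraph in body.split("\n\n"):
            if paragraph.strip():
                children.append((parent_id, paragraph.strip()))
    return children

def parent_document_retrieve(parents: dict[str, str],
                             children: list[tuple[str, str]],
                             query: str) -> str:
    # Search the children, but hand back the full parent of the best match.
    best_parent_id, _ = max(children, key=lambda c: overlap_score(query, c[1]))
    return parents[best_parent_id]

# The parent store: one entry per section of a made-up manual.
manual = {
    "section-1": "WARNING: Disconnect power before servicing.\n\n"
                 "Tighten the bolt to 40 Nm.",
    "section-2": "Clean the filter monthly.\n\n"
                 "Replace the filter once a year.",
}
children = build_child_index(manual)
result = parent_document_retrieve(manual, children, "tighten bolt")
print(result)
```

The query matches only the small "Tighten the bolt" child, yet the LLM receives all of section-1, safety warning included.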
Comparison Table
| Feature | Sentence Window | Parent Document |
|---|---|---|
| Logic | "Show me what's around this." | "Show me the section this belongs to." |
| Structure | Flat/Linear | Hierarchical (Big & Small) |
| Storage | The surrounding window is often stored in each sentence's metadata. | Parents are stored in a separate document store, keyed by ID. |
| Best Use Case | Books, Emails, Conversations. | Technical Specs, Legal, Wiki pages. |
Summary
Choose Sentence Window if your data is "unstructured" and the context is always right next to the answer. It's easier to set up and works great for simple Q&A.
Choose Parent Document if your documents have a clear structure, as is typical for enterprise tools built on manuals, contracts, or reports. It is more "stable" because it respects document boundaries (like chapters or headers), so the LLM is far less likely to get a half-finished thought from a different page.