Stop Using Fixed-Size Chunking for Everything. Here Is What to Use Instead.

Fixed-size chunking is the default in almost every RAG tutorial. Split your documents into 512-token chunks with 50-token overlap, embed them, call it done. It works well enough that most people never question it, and then they wonder why retrieval quality plateaus.

The problem is that fixed-size chunking is a compromise that optimizes for simplicity, not for retrieval quality. It ignores document structure entirely. A 512-token chunk might cut a reasoning chain in half. It might merge two unrelated policy points that happen to appear near each other. It might split a table between chunks in a way that makes both chunks uninterpretable.
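To make the failure mode concrete, here is a minimal sketch of a fixed-size splitter run over a three-sentence policy snippet. It counts whitespace-separated words as a stand-in for tokens, and the function name, sample text, and sizes are all made up for illustration.

```python
def fixed_size_chunks(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into windows of `chunk_size` words, each sharing `overlap` words with the previous one."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


# Hypothetical policy text, not taken from any real document
policy = (
    "Category A data is retained for 30 days. "
    "Category B data is retained for 7 years because of audit requirements. "
    "Access requests must be answered within 14 days."
)

chunks = fixed_size_chunks(policy, chunk_size=12, overlap=2)
for i, chunk in enumerate(chunks):
    print(i, repr(chunk))
```

With these sizes, no single chunk contains both "Category B" and "7 years". The first chunk ends at "Category B data is" and the second starts at "data is retained for 7 years", so the one fact a user is likely to ask about is split across two embeddings.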

Here are the strategies I actually use depending on document type.

For structured documents with clear sections: hierarchical chunking

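Technical documentation, legal contracts, policy documents: anything with an explicit heading structure benefits from chunking that follows the document's own hierarchy. The splitter used here keys on Markdown heading markers, so convert other formats to Markdown first.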

from langchain.text_splitter import MarkdownHeaderTextSplitter

headers_to_split_on = [
    ("#", "section"),
    ("##", "subsection"),
    ("###", "subsubsection"),
]

splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=headers_to_split_on,
    strip_headers=False
)

chunks = splitter.split_text(document_text)
# split_text returns Document objects; each one carries its header path in .metadata,
# e.g. {"section": "Data Handling", "subsection": "Retention Policy"}

The metadata inheritance is the key value here. When you retrieve a chunk about retention policy, you know it came from the Data Handling section of the document, not just that it mentioned retention somewhere.
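One way to put that metadata to work, sketched with plain dicts rather than the splitter's own objects: filter candidates by section, and prefix each chunk with its header path before embedding it. The helper names `with_breadcrumb` and `in_section` are hypothetical, and the sample chunks are invented. The metadata keys match the `headers_to_split_on` names used above.

```python
def with_breadcrumb(text: str, metadata: dict,
                    levels: tuple = ("section", "subsection", "subsubsection")) -> str:
    """Prefix chunk text with its header path, e.g. '[Data Handling > Retention Policy]'."""
    path = [metadata[level] for level in levels if metadata.get(level)]
    if not path:
        return text
    return f"[{' > '.join(path)}]\n{text}"


def in_section(chunks: list[dict], section: str) -> list[dict]:
    """Keep only chunks whose metadata places them in the given top-level section."""
    return [c for c in chunks if c["metadata"].get("section") == section]


# Hypothetical chunks, shaped like splitter output reduced to plain dicts
example_chunks = [
    {"text": "Records are deleted after 7 years.",
     "metadata": {"section": "Data Handling", "subsection": "Retention Policy"}},
    {"text": "Retention bonuses are paid annually.",
     "metadata": {"section": "Compensation"}},
]

hits = in_section(example_chunks, "Data Handling")
embed_text = with_breadcrumb(hits[0]["text"], hits[0]["metadata"])
print(embed_text)
```

Both sample chunks mention retention, but only one survives the section filter. Whether prefixing the breadcrumb actually helps your embeddings is something to measure on your own data.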

For dense technical or scientific content: semantic chunking

When document sections do not map cleanly to headings, use sentence-level semantic similarity to find natural break points.

from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai import OpenAIEmbeddings

# Or replace with your self-hosted embedding model
semantic_splitter = SemanticChunker(
    OpenAIEmbeddings(),
    breakpoint_threshold_type="percentile",
    breakpoint_threshold_amount=85
)

chunks = semantic_splitter.split_text(dense_technical_doc)

This is slower than fixed-size splitting because it has to embed the sentences of the document before it can place any break points. For documents where semantic coherence matters significantly, like research summaries or detailed technical analyses, the retrieval quality improvement is usually worth the indexing cost.
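The percentile idea is easier to reason about with the library stripped away. The following is a simplified sketch, not the library's implementation: it takes one embedding per sentence, measures the cosine distance between neighbours, and breaks wherever that distance exceeds the chosen percentile of all distances. The two-dimensional vectors are hand-written stand-ins for real embeddings, and every function name here is an assumption.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def percentile(values: list[float], pct: float) -> float:
    """Percentile with linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    low, high = math.floor(rank), math.ceil(rank)
    return ordered[low] + (ordered[high] - ordered[low]) * (rank - low)


def semantic_chunks(sentences: list[str], vectors: list[list[float]],
                    pct: float = 85.0) -> list[str]:
    """Group consecutive sentences, breaking where neighbour distance exceeds the percentile threshold."""
    if len(sentences) < 2:
        return list(sentences)
    distances = [cosine_distance(vectors[i], vectors[i + 1])
                 for i in range(len(vectors) - 1)]
    threshold = percentile(distances, pct)
    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > threshold:  # topic shift: close the current chunk
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks


# Toy input: three sentences about caching, two about billing
sentences = [
    "The cache evicts entries with an LRU policy.",
    "Eviction runs when memory passes the limit.",
    "Evicted entries are written to disk.",
    "Billing is calculated per request.",
    "Invoices are sent monthly.",
]
# Hand-written 2-D vectors standing in for real sentence embeddings
vectors = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [0.1, 1.0], [0.0, 1.0]]

result = semantic_chunks(sentences, vectors, pct=85.0)
for chunk in result:
    print(repr(chunk))
```

Raising the percentile produces fewer, larger chunks, and lowering it produces more, smaller ones. That is the knob `breakpoint_threshold_amount` exposes.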

For tables and structured data: row-level chunking with header injection

Tables chunked mid-row are useless. Every row needs its column headers.

import pandas as pd

def chunk_table(df: pd.DataFrame, metadata: dict) -> list:
    chunks = []
    header = " | ".join(df.columns.tolist())

    for idx, row in df.iterrows():
        row_text = " | ".join([f"{col}: {val}" for col, val in row.items()])
        chunk_text = f"Columns: {header}\nRow {idx}: {row_text}"

        chunks.append({
            "text": chunk_text,
            "metadata": {**metadata, "row_index": idx, "chunk_type": "table_row"}
        })
    return chunks

Every row chunk contains the full column context. The LLM can answer "what is the retention period for category B data" because the column header "retention period" is in every chunk, not just in the header row.
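If the table arrives as CSV text rather than a DataFrame, the same idea works with only the standard library. This is a sketch under that assumption. The function name `chunk_csv_rows`, the sample table, and the `source` metadata value are invented for illustration.

```python
import csv
import io


def chunk_csv_rows(csv_text: str, metadata: dict) -> list[dict]:
    """Turn each CSV row into a chunk that repeats the column headers."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    header = " | ".join(columns)
    chunks = []
    for idx, row in enumerate(reader):
        row_text = " | ".join(f"{col}: {row[col]}" for col in columns)
        chunks.append({
            "text": f"Columns: {header}\nRow {idx}: {row_text}",
            "metadata": {**metadata, "row_index": idx, "chunk_type": "table_row"},
        })
    return chunks


# Hypothetical two-row retention table
sample = "category,retention period\nA,30 days\nB,7 years\n"
rows = chunk_csv_rows(sample, {"source": "retention_table.csv"})
print(rows[1]["text"])
```

The printed chunk for category B names both the column and the value, so it can be retrieved and read without the header row or any neighbouring rows.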

For long-form prose where nothing else fits: sliding window with parent retrieval

When your documents genuinely have no structure you can exploit, sliding window chunking with parent document retrieval gives you the best of both worlds: small chunks for precise retrieval, larger parent chunks sent to the LLM for actual generation.

from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain.text_splitter import RecursiveCharacterTextSplitter

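# chunk_size and chunk_overlap are measured in characters by default, not tokens
parent_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)
child_splitter = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=40)

store = InMemoryStore()  # use a persistent store in production

# Assumes `vectorstore` is an already-initialized LangChain vector store
# and `docs` is your list of loaded Document objects.
retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=store,
    child_splitter=child_splitter,
    parent_splitter=parent_splitter,
)
retriever.add_documents(docs)  # stores parents in the docstore, indexes children in the vector store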

The child chunks are what gets matched during similarity search. The parent chunk is what gets sent to the LLM. Precise matching, rich context for generation.
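The mechanism itself is small enough to sketch without any library. In the toy version below, word overlap stands in for vector similarity, sizes are counted in words, and all names (`window`, `build_index`, `retrieve_parents`) are hypothetical. It is meant to show the child-to-parent lookup, not to replace a real retriever.

```python
import re


def window(words: list[str], size: int, overlap: int) -> list[list[str]]:
    """Sliding windows of `size` words that share `overlap` words with the previous window."""
    step = size - overlap
    out = []
    for start in range(0, len(words), step):
        out.append(words[start:start + size])
        if start + size >= len(words):
            break
    return out


def build_index(document: str, parent_size: int, child_size: int, overlap: int):
    parents = {}   # parent_id -> parent text (plays the role of the docstore)
    children = []  # (child text, parent_id) pairs (plays the role of the vector store)
    for pid, parent_words in enumerate(window(document.split(), parent_size, overlap)):
        parents[pid] = " ".join(parent_words)
        for child_words in window(parent_words, child_size, overlap):
            children.append((" ".join(child_words), pid))
    return parents, children


def score(query: str, text: str) -> int:
    """Toy relevance score: number of distinct words shared by query and text."""
    tokens = lambda s: set(re.findall(r"\w+", s.lower()))
    return len(tokens(query) & tokens(text))


def retrieve_parents(query: str, parents: dict, children: list, k: int = 2) -> list[str]:
    """Match against children, then return their parents without duplicates."""
    ranked = sorted(children, key=lambda c: score(query, c[0]), reverse=True)
    seen, results = set(), []
    for _, pid in ranked[:k]:
        if pid not in seen:
            seen.add(pid)
            results.append(parents[pid])
    return results


# Hypothetical unstructured prose
document = (
    "Backups run nightly at two in the morning. "
    "Restores are tested every quarter by the platform team. "
    "Invoices are issued on the first business day. "
    "Late payments incur a two percent monthly fee."
)

parents, children = build_index(document, parent_size=16, child_size=6, overlap=2)
results = retrieve_parents("late payments fee", parents, children, k=2)
print(results)
```

Here the two best-matching children belong to the same parent, so the query returns one 16-word parent instead of two overlapping 6-word fragments. The deduplication step is what keeps the LLM context from filling up with repeated text.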

The right chunking strategy is the one that preserves the semantic units that actually matter for your document type. Fixed-size chunking ignores document structure because it does not know what that structure is. Using document structure when you have it is almost always worth the extra implementation effort.

DEV Community
