Vivek

Chunking Strategies for LLM Applications: A Practical Guide to Better RAG Systems

Learn how chunking impacts retrieval quality, embedding performance, and the overall effectiveness of Retrieval-Augmented Generation (RAG) systems.

Introduction

When building AI applications using Retrieval-Augmented Generation (RAG), developers often focus on selecting the best LLM or embedding model. But one foundational step is frequently underestimated: chunking.

Chunking

Chunking is the process of breaking large documents into smaller, manageable pieces before generating embeddings and storing them in a vector database.

Poor chunking can lead to:

  • Irrelevant retrieval results
  • Hallucinated answers
  • Missing context
  • Higher inference costs

Good chunking, on the other hand, can markedly improve retrieval precision and response quality.

In this article, we'll explore the most common chunking strategies, their trade-offs, and when to use each.


Why Chunking Matters

LLMs and embedding models accept a limited number of input tokens, so they cannot process arbitrarily large documents in one pass.

Consider a 200-page PDF.

Instead of embedding the entire file as one vector, we split it into smaller chunks:

Large Document
      ↓
 Chunking
      ↓
Embeddings
      ↓
Vector Database
      ↓
Semantic Retrieval
      ↓
LLM Response

Without Chunking

A single massive embedding:

  • loses semantic granularity
  • retrieves irrelevant sections
  • increases token cost

With Chunking

Relevant document sections become searchable and retrievable.


Understanding the Chunking Trade-Off

Chunk size affects retrieval quality.

Too small:

Missing context

Too large:

Noise + irrelevant information

The ideal chunk balances:

  • semantic meaning
  • retrieval precision
  • token efficiency

1. Fixed-Size Chunking

The simplest and most widely used approach.

Documents are split based on a fixed character or token limit.

Example:

  • 500 tokens
  • 1000 characters

How It Works

Document
──────────────────────────
Chunk 1 (500 tokens)
Chunk 2 (500 tokens)
Chunk 3 (500 tokens)

Python Example

Using LangChain:

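from langchain.text_splitter import CharacterTextSplitter

# chunk_size and chunk_overlap are measured in characters by default, not tokens.
# The splitter breaks on its separator ("\n\n" by default) and merges pieces up to chunk_size.
splitter = CharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50
)

# `document` is the full text to split, as a string
chunks = splitter.split_text(document)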
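The windowing itself needs no library. Below is a minimal sketch of fixed-size chunking, assuming sizes are counted in characters; the function name `fixed_size_chunks` and the placeholder document are illustrative and not part of LangChain.

```python
def fixed_size_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Cut text into windows of chunk_size characters.

    Each window starts (chunk_size - overlap) characters after the previous one,
    so neighbouring windows share `overlap` characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# Placeholder document: 1200 characters of repeating letters.
document = "".join(chr(97 + i % 26) for i in range(1200))
chunks = fixed_size_chunks(document, chunk_size=500, overlap=50)
print([len(c) for c in chunks])  # [500, 500, 300]
```

Note that the cut positions depend only on character offsets, which is exactly why this approach can split a sentence in the middle.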

Pros

  • Easy to implement
  • Fast processing
  • Predictable chunk sizes

Cons

  • Ignores document structure
  • May cut sentences mid-way
  • Can reduce semantic meaning

Best For

  • quick prototypes
  • small datasets
  • simple RAG systems

2. Recursive Chunking

A smarter version of fixed-size chunking.

Instead of splitting blindly, it attempts to preserve structure.

Typical hierarchy:

  1. Paragraph
  2. Sentence
  3. Word

Only if a larger section exceeds size limits does it split further.


Workflow

Paragraph too large?
        ↓
Split into sentences
        ↓
Sentence too large?
        ↓
Split into words

Example

LangChain Recursive Splitter:

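from langchain.text_splitter import RecursiveCharacterTextSplitter

# By default the splitter tries these separators in order: "\n\n", "\n", " ", "".
# Sizes are measured in characters unless a different length function is supplied.
splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50
)

# `document` is the full text to split, as a string
chunks = splitter.split_text(document)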
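To make the paragraph, sentence, word hierarchy concrete, here is a simplified sketch of the recursive idea in plain Python. It is not LangChain's implementation: the function name `recursive_split`, the separator list, and the character-based size are assumptions, and separators that fall on a chunk boundary are dropped.

```python
def recursive_split(text: str, chunk_size: int,
                    separators: tuple = ("\n\n", ". ", " ")) -> list[str]:
    """Split on the coarsest separator first; recurse to finer ones only for oversized pieces."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: fall back to a hard character cut.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small neighbouring pieces
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece is still too large: split it with the finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


text = (
    "Para one is short.\n\n"
    "This is sentence one. This is sentence two. This is sentence three."
)
for chunk in recursive_split(text, chunk_size=40):
    print(repr(chunk))
```

The first paragraph fits and is kept whole; only the second, oversized paragraph is broken down into sentences.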

Pros

  • Preserves meaning
  • Better retrieval quality
  • Handles mixed documents

Cons

  • Slightly slower
  • May still ignore domain-specific structure

Best For

Most RAG systems.

This is often the default recommendation.


3. Sentence-Based Chunking

This strategy keeps chunks aligned with sentence boundaries.

Instead of arbitrary token counts:

Chunk = Complete Sentences

Example

Document:

AI systems rely on retrieval.
Chunking improves retrieval quality.
Poor chunking hurts accuracy.

Possible chunks:

Chunk 1:
AI systems rely on retrieval.

Chunk 2:
Chunking improves retrieval quality.

Chunk 3:
Poor chunking hurts accuracy.

Python Example

Using NLTK:

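import nltk
from nltk.tokenize import sent_tokenize

# sent_tokenize needs the Punkt tokenizer data, downloaded once.
# Some newer NLTK releases look for "punkt_tab" instead of "punkt".
nltk.download("punkt")

sentences = sent_tokenize(document)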
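One sentence per chunk is often too small, so a common follow-up is to pack whole sentences into chunks up to a size limit. The sketch below assumes a character limit and uses a naive regex splitter as a stand-in for `sent_tokenize` so that it runs without extra downloads; `split_sentences` and `group_sentences` are hypothetical helper names.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive stand-in for a real sentence tokenizer: break after ., ! or ? plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def group_sentences(sentences: list[str], max_chars: int) -> list[str]:
    """Pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept as its own oversized chunk.
    """
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            chunks.append(current)  # close the current chunk at a sentence boundary
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


document = (
    "AI systems rely on retrieval. "
    "Chunking improves retrieval quality. "
    "Poor chunking hurts accuracy."
)
for chunk in group_sentences(split_sentences(document), max_chars=70):
    print(chunk)
```

With `max_chars=70` the first two sentences share a chunk and the third starts a new one; no sentence is ever cut in the middle.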

Pros

  • Natural language boundaries
  • Cleaner embeddings
  • Improved semantic integrity

Cons

  • Uneven chunk sizes
  • Large sentences may exceed limits

Best For

  • conversational data
  • articles
  • QA systems

4. Paragraph-Based Chunking

Paragraphs usually contain a coherent idea.

This makes them useful chunk boundaries.


Example

Paragraph 1 → Chunk 1
Paragraph 2 → Chunk 2
Paragraph 3 → Chunk 3
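A minimal sketch of the mapping above, assuming paragraphs are separated by one or more blank lines (the usual plain-text convention); `paragraph_chunks` is a hypothetical helper name.

```python
import re


def paragraph_chunks(text: str) -> list[str]:
    """Return one chunk per paragraph, where paragraphs are separated by blank lines."""
    paragraphs = re.split(r"\n\s*\n", text)
    return [p.strip() for p in paragraphs if p.strip()]


text = "Paragraph 1.\n\nParagraph 2 line a\nline b.\n\n\n  \nParagraph 3."
for i, chunk in enumerate(paragraph_chunks(text), start=1):
    print(f"Chunk {i}: {chunk!r}")
```

Single line breaks inside a paragraph are preserved; only blank lines act as boundaries.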

Pros

  • High semantic coherence
  • Human-readable chunks
  • Works well for blogs and docs

Cons

  • Paragraph length varies
  • Large paragraphs can overflow

Best For

  • blogs
  • documentation
  • research papers

5. Overlapping Chunking

One major issue with chunking is context loss at boundaries.

Example:

Chunk 1:

The API authentication uses JWT...

Chunk 2:

...tokens for secure communication.

Important meaning spans both chunks.

Overlap solves this.


How Overlap Works

Chunk 1
──────────────
AAAA BBBB CCCC

Chunk 2
          CCCC DDDD EEEE

Notice that CCCC appears in both chunks.


Code Example

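from langchain.text_splitter import RecursiveCharacterTextSplitter

# Consecutive chunks share up to 100 characters (20% of chunk_size).
splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=100
)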
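The AAAA to EEEE diagram can be reproduced with a small sliding window over tokens. This is a sketch under the assumption that tokens are whitespace-separated words; `sliding_window` is a hypothetical helper, not a LangChain function.

```python
def sliding_window(tokens: list[str], size: int, overlap: int) -> list[list[str]]:
    """Return windows of `size` tokens where consecutive windows share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    windows = []
    for start in range(0, len(tokens), step):
        windows.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already covers the final token
    return windows


tokens = "AAAA BBBB CCCC DDDD EEEE".split()
for window in sliding_window(tokens, size=3, overlap=1):
    print(" ".join(window))
# AAAA BBBB CCCC
# CCCC DDDD EEEE
```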

Pros

  • Better retrieval continuity
  • Reduces boundary problems
  • Higher answer accuracy

Cons

  • More embeddings
  • Larger vector storage
  • Increased retrieval cost

Best For

Nearly all production RAG systems.

Typical overlap:

  • 10–20% of the chunk size
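The storage cost of overlap is easy to estimate. The sketch below counts how many sliding-window chunks are needed to cover a document, assuming every chunk is full-size except possibly the last; `chunk_count` and the 10,000-token document are illustrative.

```python
import math


def chunk_count(total_tokens: int, chunk_size: int, overlap: int) -> int:
    """Number of sliding-window chunks needed to cover total_tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    if total_tokens <= 0:
        return 0
    if total_tokens <= chunk_size:
        return 1
    step = chunk_size - overlap  # how far each new chunk advances
    return math.ceil((total_tokens - chunk_size) / step) + 1


for overlap in (0, 50, 100):
    print(overlap, chunk_count(10_000, 500, overlap))
# 0 20
# 50 23
# 100 25
```

For this example, 10% overlap produces 15% more chunks to embed and store, and 20% overlap produces 25% more.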

6. Semantic Chunking

Semantic chunking uses meaning instead of size.

The document is split where topic changes occur.

This adapts chunk boundaries to the content instead of to an arbitrary length.


Concept

Instead of:

Every 500 tokens

we split by:

Meaning shift

Example

Document:

Section A → Databases
Section B → Kubernetes
Section C → Security

Semantic chunking creates:

Chunk 1 → Database topic
Chunk 2 → Kubernetes topic
Chunk 3 → Security topic

High-Level Pipeline

Text
 ↓
Sentence embeddings
 ↓
Similarity comparison
 ↓
Topic boundary detection
 ↓
Chunks

Python Example (Conceptual)

from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

Sentence similarity determines where to split.
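The boundary-detection step can be sketched without any model. In the example below, `toy_embed` is a bag-of-words stand-in for a real sentence embedding, and the `threshold` of 0.2 is an arbitrary illustrative value; in a real system the `embed` argument would wrap an actual embedding model and the threshold would need tuning on your data.

```python
import math
import re
from collections import Counter


def toy_embed(sentence: str) -> Counter:
    """Stand-in embedding: word counts. NOT a real sentence embedding."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def semantic_chunks(sentences: list[str], embed=toy_embed, threshold: float = 0.2) -> list[str]:
    """Start a new chunk whenever two adjacent sentences are less similar than threshold."""
    if not sentences:
        return []
    vectors = [embed(s) for s in sentences]
    chunks, current = [], [sentences[0]]
    for prev_vec, vec, sentence in zip(vectors, vectors[1:], sentences[1:]):
        if cosine(prev_vec, vec) < threshold:
            chunks.append(" ".join(current))  # similarity dropped: close the chunk
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks


sentences = [
    "Databases store rows in tables.",
    "Indexes make tables in databases faster.",
    "Kubernetes schedules pods onto nodes.",
    "Pods run containers on those nodes.",
]
for chunk in semantic_chunks(sentences):
    print(chunk)
```

Here the two database sentences end up in one chunk and the two Kubernetes sentences in another, because the similarity between the second and third sentence falls below the threshold.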


Pros

  • Excellent retrieval quality
  • Topic-aware
  • Strong contextual relevance

Cons

  • Computationally expensive
  • More implementation effort

Best For

  • enterprise search
  • legal documents
  • knowledge bases

7. Structure-Aware Chunking

Some documents already contain structure.

Examples:

  • HTML headings
  • Markdown sections
  • PDFs with titles
  • Code files

Instead of ignoring this, we use it.


Example

Markdown:

# Authentication
JWT details...

# Rate Limiting
API throttling...

Chunks:

Authentication section
Rate Limiting section

Code Example

Markdown Header Splitter:

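from langchain.text_splitter import MarkdownHeaderTextSplitter

# Each tuple maps a heading marker to the metadata key stored on the resulting chunks.
headers = [
    ("#", "Header1"),
    ("##", "Header2")
]

splitter = MarkdownHeaderTextSplitter(headers_to_split_on=headers)
# `markdown_document` is the Markdown source as a string
chunks = splitter.split_text(markdown_document)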
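For intuition, heading-based splitting can also be sketched in a few lines of plain Python. This is a simplified illustration, not how LangChain implements it: `split_by_headings` is a hypothetical helper, it only recognises `#`-style headings, and it would misread `#` lines inside fenced code blocks.

```python
import re


def split_by_headings(markdown: str) -> list[dict]:
    """Split Markdown into sections, one per '#'-style heading line."""
    sections, title, body = [], None, []

    def flush():
        # Emit the section collected so far, skipping empty preambles.
        if title is not None or any(line.strip() for line in body):
            sections.append({"title": title, "text": "\n".join(body).strip()})

    for line in markdown.splitlines():
        match = re.match(r"^#{1,6}\s+(.*)$", line)
        if match:
            flush()
            title, body = match.group(1).strip(), []
        else:
            body.append(line)
    flush()
    return sections


markdown = "# Authentication\nJWT details...\n\n# Rate Limiting\nAPI throttling...\n"
for section in split_by_headings(markdown):
    print(section["title"], "->", section["text"])
```

Keeping the heading as a separate `title` field means it can be stored as metadata alongside the embedding and used for filtering at query time.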

Pros

  • High semantic consistency
  • Uses author intent
  • Excellent for documentation

Cons

  • Depends on clean formatting
  • Less effective on raw text

Best For

  • developer docs
  • wikis
  • technical manuals

8. Code Chunking

Source code needs special handling.

Splitting every 500 characters can break logic.

Instead, split by:

  • function
  • class
  • module
  • AST nodes

Bad Chunk

def login():
    ...

Here the function is cut off halfway through its body.


Better Chunk

Entire login() function

Example Using Tree-sitter

import tree_sitter

AST-based parsing preserves syntax.
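Tree-sitter setup varies by language and library version, so the sketch below swaps in Python's standard-library `ast` module to show the same idea for Python source only. `python_code_chunks` is a hypothetical helper; it emits one chunk per top-level function or class and ignores decorators, imports, and module-level statements.

```python
import ast


def python_code_chunks(source: str) -> list[dict]:
    """Return one chunk per top-level function or class in a Python source string."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append({
                "name": node.name,
                "kind": type(node).__name__,
                # Exact source text of the node, from its first to its last line.
                "code": ast.get_source_segment(source, node),
            })
    return chunks


source = '''
import os

def login(user, password):
    token = "..."
    return token

class Session:
    def close(self):
        pass
'''
for chunk in python_code_chunks(source):
    print(chunk["kind"], chunk["name"])
# FunctionDef login
# ClassDef Session
```

Because boundaries come from the parsed syntax tree, the whole `login` function always lands in a single chunk regardless of its length.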


Pros

  • Maintains logical structure
  • Better code retrieval
  • Strong for AI coding assistants

Cons

  • Language-specific tooling

Best For

  • code copilots
  • repository search
  • software documentation

Comparing Chunking Strategies

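| Strategy        | Quality   | Complexity | Best Use       |
|-----------------|-----------|------------|----------------|
| Fixed Size      | Low       | Low        | Prototypes     |
| Recursive       | High      | Low        | General RAG    |
| Sentence        | Medium    | Low        | QA             |
| Paragraph       | Medium    | Low        | Articles       |
| Overlap         | High      | Low        | Production RAG |
| Semantic        | Very High | High       | Enterprise     |
| Structure-Aware | High      | Medium     | Docs           |
| Code Chunking   | Very High | High       | Code AI        |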

A Practical Chunking Strategy

Many successful RAG systems use a hybrid approach.

Example:

Structure-aware
        +
Recursive splitting
        +
10–20% overlap

Pipeline:

Document
   ↓
Heading Split
   ↓
Recursive Chunking
   ↓
Overlap
   ↓
Embeddings
   ↓
Vector DB

This usually offers the best balance between:

  • relevance
  • cost
  • simplicity
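The hybrid pipeline can be sketched end to end in plain Python. To keep it short, the recursive step is simplified here to fixed character windows inside each section, which is a deliberate substitution; `hybrid_chunks` is a hypothetical helper, and the defaults (500 characters with 75 characters of overlap, i.e. 15%) are illustrative.

```python
import re


def hybrid_chunks(markdown: str, chunk_size: int = 500, overlap: int = 75) -> list[dict]:
    """Split by headings first, then cut overlapping character windows inside each section."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    # Split just before every heading line so each heading stays with its section.
    for section in re.split(r"(?m)^(?=#{1,6}\s)", markdown):
        section = section.strip()
        if not section:
            continue
        heading = None
        if section.startswith("#"):
            heading = section.splitlines()[0].lstrip("#").strip()
        for start in range(0, len(section), step):
            chunks.append({"heading": heading, "text": section[start:start + chunk_size]})
            if start + chunk_size >= len(section):
                break  # this window reaches the end of the section
    return chunks


markdown = "# A\n" + "a" * 600 + "\n# B\nshort\n"
for chunk in hybrid_chunks(markdown):
    print(chunk["heading"], len(chunk["text"]))
```

Windows never cross a heading, so a chunk about one section cannot absorb text from the next, and each chunk carries its heading as metadata for the embedding and storage steps.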

Final Thoughts

Chunking is not just preprocessing.

It directly influences:

  • retrieval precision
  • embedding quality
  • hallucination rate
  • user experience

There is no universal best strategy.

A good rule:

  • Start with recursive + overlap
  • Move to semantic or structure-aware chunking as complexity grows
  • Use code-aware chunking for engineering systems

In many cases, improving chunking yields larger gains than switching to a bigger LLM.

