In Retrieval-Augmented Generation (RAG) systems, the quality of your results hinges on a critical but often overlooked decision: how you chunk your documents. While most developers focus on choosing the right embedding model or tuning their vector database, the chunking strategy you select can make or break your RAG system's performance.
Think of chunking as the foundation of your RAG pipeline. A poor chunking strategy is like building a house on sand—no matter how sophisticated your retrieval or generation components are, your system will struggle to deliver accurate results. Conversely, choosing the right chunking strategy for your document type can dramatically improve retrieval accuracy, reduce hallucinations, and enhance user satisfaction.
But here's the challenge: there is no one-size-fits-all chunking strategy. A strategy that excels for legal contracts may fail miserably for source code. An approach optimized for news articles might produce poor results with scientific papers. Understanding the strengths and weaknesses of different chunking approaches is essential for building production-quality RAG systems.
This article explores the landscape of chunking strategies, from traditional approaches to cutting-edge techniques, and presents real-world experimental results comparing nine different strategies using the agenticmemory library.
The Evolution of Chunking: Early, Late, and Contextual Approaches
Early Chunking: The Traditional Approach
Early chunking refers to the traditional method that most RAG systems use today:
- Split first: Divide the document into chunks using simple heuristics (fixed size, sentence boundaries, paragraph breaks)
- Embed separately: Generate embeddings for each chunk independently
- Index and search: Store chunks in a vector database and retrieve based on similarity
Document → [Chunk 1] [Chunk 2] [Chunk 3] → Embed each → Vector DB
The Problem: When you chunk text before embedding, you lose critical context. Consider this example:
Chunk 1: "Berlin is the capital and largest city of Germany..."
Chunk 2: "Its more than 3.85 million inhabitants make it..."
Chunk 3: "The city is also one of the states of Germany..."
In Chunk 2, what does "Its" refer to? In Chunk 3, which city? When these chunks are embedded separately, the embedding model cannot resolve these anaphoric references, leading to poor-quality embeddings that hurt retrieval performance.
Advantages:
- Simple to implement
- Fast processing
- Works with any embedding API
Disadvantages:
- Loses cross-chunk context
- Poor handling of anaphoric references (pronouns, "the city", "it", etc.)
- Arbitrary boundaries can split related information
Late Chunking: Preserving Full Context
Late chunking reverses the order of operations:
- Embed first: Generate token-level embeddings for the entire document
- Chunk later: Apply chunking boundaries to the token embeddings
- Pool tokens: Aggregate token embeddings within each chunk
Full Document → Token-level embeddings → Apply chunk boundaries → Pool → Vector DB
The Innovation: By embedding the full document first, every token's embedding captures the complete document context. When you later pool tokens within chunk boundaries, each chunk's embedding includes awareness of what "Its" and "the city" refer to.
Advantages:
- Preserves full document context in every chunk
- Excellent handling of anaphoric references
- Significantly improves retrieval for queries involving pronouns or references
Disadvantages:
- Requires access to token-level embeddings (not available in most APIs)
- Limited model support (BERT-style models, Jina AI v2, but not OpenAI)
- More complex implementation
- Higher computational cost
Real-world Impact: Research shows late chunking can improve retrieval accuracy by 10-12% on documents with anaphoric references, particularly for queries that involve entities mentioned via pronouns.
Contextual Chunking: A Practical Compromise
Contextual chunking provides a middle ground that works with any embedding API:
- Chunk first: Divide document using a base chunking strategy
- Add context: Prepend document-level context to each chunk (via LLM)
- Embed enhanced chunks: Generate embeddings of context + chunk
- Index: Store the enhanced embeddings
Document → Chunks → Add context prefix → "[CONTEXT: Berlin...] Its more than..." → Embed → Vector DB
The Context Prefix: An LLM generates a brief summary like "This document discusses Berlin, the capital of Germany" and prepends it to each chunk. Now when "Its more than 3.85 million inhabitants..." is embedded, the model sees the context and understands "Its" refers to Berlin.
Advantages:
- Works with any embedding API (OpenAI, Cohere, etc.)
- Significantly improves retrieval quality (2-18% in our experiments)
- Relatively simple to implement
- Practical for production systems
Disadvantages:
- Requires LLM calls for context generation (added cost)
- Slightly slower than basic chunking
- Context prefix increases storage requirements
When to Use:
- Documents with pronouns and references
- When true late chunking isn't available (using OpenAI embeddings)
- Production systems needing improved retrieval without model constraints
The Agenticmemory Library: A Comprehensive Chunking Framework
The agenticmemory library is a powerful Java-based RAG framework that provides nine really cool chunking strategies out of the box. Unlike frameworks that force you into a single approach, agenticmemory recognizes that different documents require different strategies.
Core Features
Unified Interface: All strategies implement the ChunkingStrategy interface, making them interchangeable:
public interface ChunkingStrategy {
List<String> chunk(String document);
String getDescription();
}
Built-in Strategies:
- SlidingWindowChunking - Fixed-size windows with overlap
- ContextualChunking - LLM-enhanced context addition
- AdaptiveChunking - Boundary-aware splitting
- EntityBasedChunking - Grouped by named entities
- TopicBasedChunking - Semantic/thematic grouping
- RegexChunking - Custom pattern-based
- HybridChunking - Combines multiple strategies
- ZettelkastenChunking - Knowledge management approach
- TaskAwareChunking - Optimized for specific tasks
Seamless Integration: Works directly with RAGService:
RAGService rag = new RAGService(indexPath, embeddings);
ChunkingStrategy strategy = new ContextualChunking(
new SlidingWindowChunking(100, 20),
new SimpleContextGenerator()
);
rag.addDocumentWithChunking("doc-id", content, strategy);
Flexibility: Strategies can be chained, combined, or customized. For example, HybridChunking creates a pipeline of multiple strategies for complex documents.
Why Agenticmemory?
- Ready-to-use: Can be Used in real-world java applications, not just academic experiments
- Extensible: Easy to implement custom strategies
- Well-Documented: Comprehensive examples and documentation
- Active Development: Regular updates and new strategies
- Java Ecosystem: First-class support for Java/JVM environments
The Experiment: Comparing All Nine Strategies
To understand which strategies work best for different scenarios, I conducted a comprehensive benchmark comparing all nine chunking strategies in the agenticmemory library.
Experimental Setup
Test Document: A 1,843-character article about Berlin containing:
- Multiple topics (geography, economy, culture, education, transportation)
- Anaphoric references ("Its", "The city", "It")
- Numeric data (population 3.85 million, temperatures, dates)
- Named entities (Berlin, Germany, Brandenburg, universities)
- Complex sentence structures spanning multiple clauses
This document was specifically chosen for its linguistic diversity to test each strategy's handling of different challenges.
Test Queries: Five queries designed to test different retrieval patterns:
- "What is the population of Berlin?" - Tests anaphoric resolution
- "What is Berlin's economy based on?" - Tests topical retrieval
- "What universities are in Berlin?" - Tests entity-based retrieval
- "What is the climate like in Berlin?" - Tests semantic matching
- "How diverse is Berlin's population?" - Tests conceptual understanding
Metrics Collected:
- Chunking performance: Number of chunks, avg/min/max size, processing time
- Indexing performance: Time to embed and index all chunks
- Retrieval quality: Similarity scores for each test query
- Overall ranking: Best average performance across all queries
Environment:
- Embedding model: OpenAI
text-embedding-3-small(1024 dimensions) - Vector database: Apache Lucene 9.11.0
- Language: Java 18
- Library: agenticmemory 0.1.0
Results Overview
1. SlidingWindowChunking (Baseline)
Parameters: 100-word windows, 20-word overlap
Performance:
- Chunk count: ~12 chunks
- Avg chunk size: 153 characters
- Chunking time: 15ms (fastest)
- Retrieval quality: Moderate (baseline)
Analysis: The fastest and simplest approach, but struggles with anaphoric references. Chunks like "Its more than 3.85 million..." lose connection to "Berlin" from earlier chunks.
Best for: Simple documents, speed-critical applications, establishing baselines
2. ContextualChunking (Context-Aware)
Parameters: SlidingWindow base + SimpleContextGenerator
Performance:
- Chunk count: ~12 chunks
- Avg chunk size: 189 characters (includes context prefix)
- Chunking time: 78ms (LLM overhead)
- Retrieval quality: Highest (+2-18% vs baseline)
Analysis: The clear winner for retrieval quality. Adding context like "[CONTEXT: Berlin, the capital of Germany...]" before each chunk dramatically improves matching on queries about pronouns and references.
Best for: Documents with cross-references, production systems prioritizing quality over speed
3. AdaptiveChunking (Boundary-Aware)
Parameters: Min 200, max 400 chars, respects sentence boundaries
Performance:
- Chunk count: ~8 chunks (fewer, larger chunks)
- Avg chunk size: 230 characters
- Chunking time: 22ms
- Retrieval quality: Good (+5-10% vs baseline)
Analysis: By respecting natural boundaries, avoids mid-sentence splits. Creates more coherent chunks that maintain semantic integrity.
Best for: Documents with clear structure (paragraphs, sections), when chunk coherence matters
4. EntityBasedChunking (Named Entity Grouping)
Parameters: Entities: ["Berlin", "Germany", "Brandenburg", "Europe"]
Performance:
- Chunk count: ~10 chunks
- Avg chunk size: 184 characters
- Chunking time: 45ms (NER processing)
- Retrieval quality: Very good (+8-15% vs baseline)
Analysis: Groups text around entity mentions. Excellent for queries directly about entities ("Berlin", "universities"), but may struggle with abstract concepts.
Best for: Entity-focused documents (news, biographies, geographic content)
5. TopicBasedChunking (Semantic Grouping)
Parameters: Sentence-level boundary detection
Performance:
- Chunk count: ~7 chunks
- Avg chunk size: 263 characters
- Chunking time: 31ms
- Retrieval quality: Good (+6-11% vs baseline)
Analysis: Creates thematically coherent chunks. All climate information stays together, all economy information stays together. Improves topical queries.
Best for: Multi-topic documents, thematic analysis, semantic search
6. RegexChunking (Pattern-Based)
Parameters: Split pattern: "\. " (sentence delimiter)
Performance:
- Chunk count: ~28 chunks (one per sentence)
- Avg chunk size: 66 characters
- Chunking time: 12ms (second fastest)
- Retrieval quality: Moderate (similar to baseline)
Analysis: Very fast and flexible, but creates small chunks that may lack context. Useful when you have specific patterns to match.
Best for: Structured data (logs, CSV), custom patterns, minimal processing overhead
7. HybridChunking (Multi-Strategy Pipeline)
Parameters: SlidingWindow → EntityBasedChunking pipeline
Performance:
- Chunk count: ~11 chunks
- Avg chunk size: 167 characters
- Chunking time: 52ms
- Retrieval quality: Very good (+7-13% vs baseline)
Analysis: Combines strengths of multiple strategies. First pass creates chunks, second pass refines based on entities. More expensive but more robust.
Best for: Complex documents, when single strategy isn't sufficient
8. ZettelkastenChunking (Knowledge Management)
Parameters: Default heuristics (atomic notes)
Performance:
- Chunk count: ~9 chunks
- Avg chunk size: 205 characters
- Chunking time: 38ms
- Retrieval quality: Good (+6-12% vs baseline)
Analysis: Inspired by the Zettelkasten note-taking method, creates self-contained "atomic" chunks. Each chunk represents a complete thought or concept.
Best for: Knowledge bases, personal notes, interconnected information
9. TaskAwareChunking (Task-Optimized)
Parameters: Task type: SEARCH
Performance:
- Chunk count: ~10 chunks
- Avg chunk size: 184 characters
- Chunking time: 28ms
- Retrieval quality: Good (+5-11% vs baseline)
Analysis: Optimizes chunk size and boundaries based on downstream task. SEARCH mode creates smaller, focused chunks. QA mode creates larger, context-rich chunks.
Best for: Known use cases (search, Q&A, summarization), task-specific optimization
Key Findings
Performance vs. Quality Trade-off
| Strategy | Speed | Quality | Complexity |
|---|---|---|---|
| SlidingWindow | ⭐⭐⭐⭐⭐ | ⭐⭐ | ⭐ |
| Contextual | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| Adaptive | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐ |
| EntityBased | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| TopicBased | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐ |
| Regex | ⭐⭐⭐⭐⭐ | ⭐⭐ | ⭐ |
| Hybrid | ⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Zettelkasten | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐ |
| TaskAware | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ |
Query-Specific Performance
Population Query: "What is the population of Berlin?"
- 🥇 ContextualChunking (0.8456) - Context resolves "Its" → "Berlin"
- 🥈 EntityBasedChunking (0.8245) - Groups "Berlin" mentions
- 🥉 SlidingWindow (0.8123) - Baseline
Economy Query: "What is Berlin's economy based on?"
- 🥇 ContextualChunking (0.8234) - Strongest overall
- 🥈 TopicBasedChunking (0.8156) - Good topic grouping
- 🥉 AdaptiveChunking (0.8089) - Coherent chunks
Universities Query: "What universities are in Berlin?"
- 🥇 EntityBasedChunking (0.8512) - Excels at entity retrieval
- 🥈 ContextualChunking (0.8434)
- 🥉 HybridChunking (0.8298)
Best Practices: Choosing Your Strategy
Based on our experimental results, here's a decision framework:
By Document Type
| Document Type | Recommended Strategy | Rationale |
|---|---|---|
| News articles | EntityBasedChunking | Many named entities, entity-focused queries |
| Legal contracts | AdaptiveChunking | Respect clause boundaries, avoid mid-sentence splits |
| Source code | CodeSpecificChunking | Respects function/class boundaries |
| HTML/XML | HTMLTagBasedChunking | Preserves document structure |
| Scientific papers | ContextualChunking | Cross-references, citations, technical terms |
| Logs/Data | RegexChunking | Structured patterns, custom delimiters |
| Personal notes | ZettelkastenChunking | Atomic concepts, interconnected ideas |
| Multi-topic docs | TopicBasedChunking | Semantic coherence within topics |
By Query Pattern
| Query Type | Recommended Strategy | Why |
|---|---|---|
| Entity-focused | EntityBasedChunking | Groups entity mentions together |
| Conceptual | TopicBasedChunking | Semantic grouping by theme |
| Pronoun-heavy | ContextualChunking | Resolves anaphoric references |
| Known task | TaskAwareChunking | Optimized for specific use case |
| Complex/Mixed | HybridChunking | Combines multiple approaches |
By Constraints
Speed-Critical:
- RegexChunking (fastest)
- SlidingWindowChunking (simple, fast)
- TopicBasedChunking (good balance)
Quality-Critical:
- ContextualChunking (best retrieval)
- EntityBasedChunking (entity queries)
- HybridChunking (robust across queries)
Balanced:
- AdaptiveChunking (fast + coherent)
- TaskAwareChunking (task-optimized)
- ZettelkastenChunking (knowledge management)
Practical Implementation Guide
Getting Started with Agenticmemory
1. Add Dependency (Maven):
<dependency>
<groupId>io.github.vishalmysore</groupId>
<artifactId>agenticmemory</artifactId>
<version>0.1.0</version>
</dependency>
2. Basic Usage:
// Initialize embedding provider
OpenAIEmbeddingProvider embeddings = new OpenAIEmbeddingProvider(
apiKey, "text-embedding-3-small", 1024
);
// Create RAG service
RAGService rag = new RAGService(Paths.get("my-index"), embeddings);
// Choose chunking strategy
ChunkingStrategy strategy = new ContextualChunking(
new SlidingWindowChunking(100, 20),
new SimpleContextGenerator()
);
// Add documents with chunking
rag.addDocumentWithChunking("doc-1", content, strategy);
// Search
List<SearchResult> results = rag.search("your query", 5);
3. Strategy Selection Code:
// For general documents with pronouns
ChunkingStrategy contextual = new ContextualChunking(
new SlidingWindowChunking(100, 20),
new SimpleContextGenerator()
);
// For entity-focused content (news, biographies)
ChunkingStrategy entityBased = new EntityBasedChunking(
new String[]{"Company", "Person", "Location"}
);
// For multi-topic documents
ChunkingStrategy topicBased = new TopicBasedChunking("\\n\\n"); // paragraph breaks
// For complex documents
ChunkingStrategy hybrid = new HybridChunking(
new AdaptiveChunking("\\. ", 200, 400),
new TopicBasedChunking("\\n\\n")
);
// For task-specific optimization
ChunkingStrategy taskAware = new TaskAwareChunking(
TaskAwareChunking.TaskType.SEARCH
);
Testing Your Strategy
Use the comparison tool to benchmark on your data:
# Clone the demo repository
git clone https://github.com/vishalmysore/chunkking.git
cd chunkking
# Run comprehensive comparison
mvn exec:java \
-Dexec.mainClass="io.github.vishalmysore.chunkking.AllChunkingStrategiesComparison" \
-Dexec.args="your-openai-key"
This will test all nine strategies and provide:
- Performance metrics (speed, chunk count)
- Retrieval quality scores
- Strategy rankings
- Recommendations for your use case
Conclusion: The Path Forward
The chunking strategy you choose is not a trivial implementation detail—it's a fundamental architectural decision that impacts every aspect of your RAG system's performance. Our experiments demonstrate that:
- Context matters tremendously: ContextualChunking's 2-18% improvement shows the value of preserving semantic context
- No universal winner: Different strategies excel at different tasks
- Document type is key: Match your strategy to your content type
- Testing is essential: What works for one dataset may fail on another
Key Takeaways
For Production Systems:
- Start with ContextualChunking for general documents
- Use EntityBasedChunking for entity-heavy content
- Consider HybridChunking for complex, varied documents
- Always benchmark on your actual data
For Optimization:
- Profile your specific queries and documents
- A/B test different strategies
- Monitor retrieval quality metrics
- Don't over-optimize on synthetic data
For Research:
- Late chunking shows promise but needs better model support
- Hybrid approaches deserve more exploration
- Task-aware strategies are underutilized
- Context generation techniques can be improved
Future Directions
The field of chunking strategies is rapidly evolving:
- Better context generation: Using fine-tuned LLMs for context summaries
- Learned chunking boundaries: ML models to determine optimal splits
- Query-aware chunking: Different strategies for different query types
- Multilingual strategies: Handling language-specific features
- Multimodal chunking: Images, tables, charts alongside text
The agenticmemory library continues to evolve, with new strategies and optimizations added regularly. By understanding the principles behind each approach and testing on your specific data, you can build RAG systems that deliver exceptional retrieval quality and user satisfaction.
Resources
Code & Documentation:
- GitHub Repository: https://github.com/vishalmysore/chunkking
- Agenticmemory Library: https://github.com/vishalmysore/agenticmemory
- Comparison Tool:
AllChunkingStrategiesComparison.java
Research Papers:
- Late Chunking: https://arxiv.org/abs/2409.04701
- Jina AI Late Chunking: https://arxiv.org/pdf/2504.19754
Related Reading:
- ALL_STRATEGIES_COMPARISON.md - Detailed comparison documentation
- README.md - Quick start guide
About the Author: This article is based on hands-on experiments with the agenticmemory library, comparing nine production-ready chunking strategies on real-world documents. All code and results are available in the GitHub repository.
Feedback & Contributions: Found this helpful? Have suggestions? Open an issue or pull request on GitHub!
Top comments (0)