Threshika Vijayakumar

Posted on Jun 10

I Thought My RAG Was Broken. The Real Problem Was Chunking.

#ai #rag #llm #machinelearning

When I started learning RAG, I assumed the difficult parts would be:

Embeddings
Vector databases
LLMs

I was wrong.

My embeddings were working.

My vector database was returning results.

The LLM was generating answers.

Yet the responses were often incomplete, irrelevant, or missing important context.

After hours of debugging, I discovered the problem wasn't the model.

It was how I was splitting my documents.

Why Chunking Matters More Than Most People Think

A RAG system can only retrieve what it can find.

And what it can find depends heavily on how your documents are chunked.

Bad chunking leads to:

Missing context
Poor retrieval
Irrelevant answers
Hallucinations

Even when everything else is configured correctly.

Figure 1: Good chunking improves retrieval quality, while bad chunking fragments context and hurts answer quality.

In many cases, the quality of your answers is decided before the LLM generates a single token.

Mistake #1: Chunks That Are Too Large

Imagine storing an entire chapter as a single chunk.

20-page chapter
        ↓
      1 chunk

Now a user asks a question about one paragraph.

The retrieval system has to bring back the entire chapter.

This pulls in pages of irrelevant context, and because one embedding has to represent the whole chapter, retrieval becomes less precise.

Bigger chunks don't always mean better answers.

Mistake #2: Chunks That Are Too Small

I then tried the opposite approach.

Tiny chunks.

Something like:

Chunk 1:
The capital of France is

Chunk 2:
Paris

The problem?

Context gets destroyed.

The retrieval system may find only part of the answer.

The information exists, but the meaning is fragmented.

Figure 2: Effective chunking is a balance. Chunks that are too large introduce noise, while chunks that are too small lose context.

This was the first time I realized that chunk size isn't just a preprocessing setting—it directly impacts retrieval quality.
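Here is a rough way to see the trade-off for yourself. This is a toy sketch, not a real pipeline: `split_by_words` is a hypothetical helper I made up for illustration, and the sample text is invented. It splits the same text at three sizes and checks whether one fact stays whole, and how much of its chunk that fact takes up.

```python
def split_by_words(text: str, words_per_chunk: int) -> list[str]:
    """Naive fixed-size chunking: every chunk holds at most N words."""
    words = text.split()
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]


# Invented sample text, used only for this illustration.
text = (
    "RAG systems retrieve chunks before generating. "
    "The capital of France is Paris. "
    "Chunk size controls how much context each result carries."
)
fact = "The capital of France is Paris."

for size in (3, 6, 50):
    chunks = split_by_words(text, size)
    # Find the chunk (if any) that contains the whole fact.
    holder = next((c for c in chunks if fact in c), None)
    # Share of that chunk taken up by the fact itself; the rest is noise.
    share = len(fact) / len(holder) if holder else 0.0
    print(f"{size:>2} words/chunk: {len(chunks)} chunks, "
          f"fact intact: {holder is not None}, fact share: {share:.0%}")
```

The middle size only keeps the fact whole because the boundaries happen to line up with the sentence, which is exactly why size alone is not enough.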

Mistake #3: No Chunk Overlap

This was one of the most surprising lessons.

Without overlap:

Chunk 1
--------
Embeddings
Vector Search

Chunk 2
--------
Retrieval
Generation

What happens if an important concept sits on the boundary between two chunks?

You lose context.

Adding overlap helps preserve information that naturally spans multiple chunks.
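A minimal sketch of overlap as a sliding window, assuming word-level chunks. The function name `chunk_with_overlap` and the `size` and `overlap` values are my own choices for illustration, not settings from any particular library.

```python
def chunk_with_overlap(words: list[str], size: int, overlap: int) -> list[list[str]]:
    """Slide a window of `size` words, stepping forward by `size - overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        # Stop once a window has reached the end of the text.
        if start + size >= len(words):
            break
    return chunks


words = "embeddings feed vector search and retrieval feeds generation".split()

for chunk in chunk_with_overlap(words, size=4, overlap=2):
    print(" ".join(chunk))
```

With `overlap=0` the phrase "search and retrieval" is cut in half at the boundary. With `overlap=2` the middle window contains it whole, at the cost of storing some words twice.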

Mistake #4: Splitting by Character Count Alone

A lot of tutorials do something like:

chunk_size = 500

and stop there.

The problem is that text doesn't naturally organize itself into 500-character blocks.

You might accidentally split:

The vector database stores embeddings used for...

and

...semantic search across documents.

The words survive.

The meaning doesn't.
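One way around this is to keep the size limit but only break between sentences. The sketch below is an assumption-heavy illustration: `chunk_by_sentences` is a hypothetical helper, and the regex sentence splitter is deliberately naive (it will trip on abbreviations like "e.g."), so treat it as a starting point rather than a finished splitter.

```python
import re


def chunk_by_sentences(text: str, max_chars: int = 500) -> list[str]:
    """Pack whole sentences into chunks of at most `max_chars` characters."""
    # Naive sentence split: break on whitespace that follows ., ! or ?
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


text = (
    "The vector database stores embeddings used for semantic search "
    "across documents. Chunk boundaries should respect sentences. "
    "Otherwise meaning is lost."
)

for chunk in chunk_by_sentences(text, max_chars=90):
    print(repr(chunk))
```

Note that a single sentence longer than `max_chars` is kept as one oversized chunk instead of being cut, which is a design choice you may want to change.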

Mistake #5: Using the Same Strategy Everywhere

Not every document should be chunked the same way.

Documentation, codebases, contracts, and research papers all have different structures.

For example:

Documentation → section-based chunks
Code → function or class-based chunks
Research papers → section-based chunks
Contracts → clause-based chunks

The document structure often provides better chunk boundaries than arbitrary character or token counts.
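To make two of those rows concrete, here is a small sketch using only the Python standard library. It assumes the documentation is Markdown with `#` headings and the code is Python; both function names are hypothetical, and other formats would need their own splitters.

```python
import ast
import re


def chunk_markdown_by_heading(doc: str) -> list[str]:
    """Start a new chunk at every Markdown heading line (# through ######)."""
    # Zero-width split just before each heading, so the heading stays
    # attached to its section. Caveat: '# ' lines inside code fences
    # would also be treated as headings.
    parts = re.split(r"(?m)^(?=#{1,6} )", doc)
    return [p.strip() for p in parts if p.strip()]


def chunk_python_by_definition(source: str) -> list[str]:
    """Return the source text of each top-level function or class."""
    tree = ast.parse(source)
    # Module-level statements such as imports are skipped here, and
    # decorator lines may not be part of the returned segment.
    return [
        ast.get_source_segment(source, node)
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]


doc = "# Intro\nHello.\n\n## Setup\nInstall it.\n"
source = (
    "import os\n\n"
    "def load(path):\n    return open(path).read()\n\n"
    "class Index:\n    pass\n"
)

print(chunk_markdown_by_heading(doc))
print(chunk_python_by_definition(source))
```

Each chunk now maps to one section, one function, or one class, so a retrieved result is a unit a reader would recognize.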

The Lesson That Changed My Thinking

When I started learning RAG, I viewed chunking as a preprocessing step.

Now I see it differently.

Chunking is retrieval engineering.

Because retrieval quality directly affects answer quality.

Better chunks lead to:

Better retrieval
Better context
Better answers

Without changing the LLM at all.

Final Thoughts

The biggest surprise in my RAG journey wasn't embeddings or vector databases.

It was discovering how much impact document splitting has on retrieval.

If your RAG system isn't performing well, don't immediately blame the model.

Look at your chunks first.

The problem might already exist before the LLM ever sees the question.
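If you want a quick first look, something like the sketch below can help. It is a rough heuristic I am assuming for illustration, not a standard metric: `summarize_chunks` is a hypothetical helper that reports chunk lengths and how many chunks end without sentence-ending punctuation.

```python
from statistics import mean


def summarize_chunks(chunks: list[str]) -> dict[str, float]:
    """Basic health check: size spread and share of chunks cut mid-sentence."""
    if not chunks:
        raise ValueError("no chunks to summarize")
    lengths = [len(c) for c in chunks]
    # Heuristic: a chunk that does not end in ., ! or ? was probably cut off.
    cut_off = sum(1 for c in chunks if not c.rstrip().endswith((".", "!", "?")))
    return {
        "count": len(chunks),
        "min_chars": min(lengths),
        "mean_chars": mean(lengths),
        "max_chars": max(lengths),
        "share_ending_mid_sentence": cut_off / len(chunks),
    }


sample_chunks = [
    "The vector database stores embeddings used for",
    "semantic search across documents.",
]
print(summarize_chunks(sample_chunks))
```

A very wide gap between the smallest and largest chunk, or a high share of chunks ending mid-sentence, is a hint that the splitter deserves attention before the model does.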

💡 What's your preferred chunking strategy when building RAG systems?
