Manas Mishra

Top 5 Mistakes Engineers Make While Building RAG Systems

Story Time 😁

I was working on a project that is highly dependent on RAG. Users can upload any document (PDF, Word, PPT, etc.), and the AI should be able to answer any type of question about it.

When I started building the project, the first choice was always RAG, but the way I handled it broke a lot of things.

I uploaded a document about a school's security protocol, a 100-page PDF, and asked, "Tell me which protocols to follow." It answered beautifully.

But the second question was, "Summarize the document," and it couldn't take the whole document into account. The third question was, "How many pages does this document have?" It couldn't answer that either.


Retrieval-Augmented Generation (RAG) looks simple on a whiteboard:

Embed β†’ Store β†’ Retrieve β†’ Prompt β†’ Generate.

In production, it’s rarely that clean.

After building and testing multiple RAG pipelines, I’ve noticed the same engineering mistakes repeated.

Here are the top 5.


1. Using Default Chunk Size Without Thinking

Most of you will pick:

  • 500 tokens
  • 1000 tokens
  • Or whatever the framework defaults to

Chunking is not about token count - it’s about semantic boundaries.

Bad chunking causes:

  • Context fragmentation
  • Low retrieval precision
  • Hallucinated stitching between unrelated paragraphs

The better approach, which I use in my production RAG pipelines, is:

  • Chunk by section headers
  • Preserve semantic completeness
  • Use overlap strategically (not blindly)
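The approach above can be sketched in a few lines. This is a minimal illustration, not a production splitter: it assumes markdown-style `#` headers mark section boundaries, keeps each section whole when it fits, and falls back to an overlapping sliding window only for oversized sections.

```python
import re

def chunk_by_headers(text, max_chars=1500, overlap=200):
    """Split text at markdown section headers; use an overlapping
    window only when a single section exceeds max_chars."""
    # Zero-width split just before lines starting with 1-3 '#' characters,
    # so each chunk keeps its own header.
    sections = re.split(r"(?m)^(?=#{1,3} )", text)
    chunks = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        if len(section) <= max_chars:
            # Keep the section whole: semantic completeness beats uniform size.
            chunks.append(section)
        else:
            # Oversized section: slide a window with overlap so text cut at
            # a boundary also appears intact at the start of the next chunk.
            step = max_chars - overlap
            for start in range(0, len(section), step):
                chunks.append(section[start:start + max_chars])
    return chunks
```

Real documents need more than `#` detection (PDF headings, numbered sections), but the principle is the same: split where the meaning splits.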

Chunking quality directly impacts retrieval quality, so you can't ignore it.


2. Ignoring Metadata Filtering

Many systems store embeddings like this:

```
{ vector, text }
```

That’s not enough.

In production, you need structured metadata:

```
{
  vector,
  text,
  source,
  document_type,
  timestamp,
  user_id,
  version
}
```

If you don't use metadata filtering, retrieval becomes noisy, results mix unrelated documents, and multi-tenant systems break.

Metadata filtering dramatically improves precision before semantic ranking even starts.
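Here's a toy sketch of filter-then-rank, assuming the record schema above. Real vector databases do the filtering natively; this brute-force pure-Python version just makes the order of operations visible: exact-match metadata filters shrink the pool before any similarity ranking happens.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(records, query_vector, k=3, **filters):
    """Apply exact-match metadata filters first, then rank the
    surviving candidates by cosine similarity."""
    pool = [
        r for r in records
        if all(r.get(field) == value for field, value in filters.items())
    ]
    pool.sort(key=lambda r: cosine(r["vector"], query_vector), reverse=True)
    return pool[:k]

# Usage: a multi-tenant query never sees another tenant's documents.
records = [
    {"vector": [1.0, 0.0], "text": "tenant A doc", "user_id": "a"},
    {"vector": [0.9, 0.1], "text": "tenant B doc", "user_id": "b"},
]
hits = search(records, [1.0, 0.0], k=1, user_id="b")
```

Even though tenant A's document is more similar to the query, the `user_id` filter removes it before ranking.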


3. Not Evaluating Retrieval Quality

Most teams evaluate only the final LLM output.

You must evaluate retrieval on its own, with metrics like:

  • Precision@k
  • Recall@k
  • MRR (Mean Reciprocal Rank)
  • Context relevance score

If retrieval is wrong, generation doesn’t matter.

RAG is a retrieval problem first, a generation problem second.
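The first three metrics are simple enough to compute by hand over a labeled query set. A minimal sketch, assuming `retrieved` is a ranked list of document ids and `relevant` is the set of ids a human judged relevant for that query:

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved ids that are actually relevant."""
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / k

def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant ids that appear in the top-k."""
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)

def mrr(ranked_lists, relevant_sets):
    """Mean Reciprocal Rank: average of 1/rank of the first
    relevant hit, over all queries."""
    total = 0.0
    for retrieved, relevant in zip(ranked_lists, relevant_sets):
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)
```

Context relevance is harder to score mechanically; teams typically use an LLM-as-judge or human review for that one.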


4. No Reranking Layer

A typical pipeline:

Vector Search β†’ Top 5 β†’ Send to LLM

That pipeline is incomplete.

Dense retrieval optimizes for similarity - not relevance.

You should add a cross-encoder reranker.

Yes, it adds latency.
But in high-value systems, accuracy > milliseconds.
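The two-stage shape looks like this. To keep the sketch self-contained and runnable, a simple token-overlap function stands in for the cross-encoder; in a real system you would replace `overlap_score` with a model that scores each (query, document) pair jointly, such as a sentence-transformers `CrossEncoder`.

```python
def overlap_score(query, doc):
    """Stand-in for a cross-encoder: scores the (query, document)
    pair jointly. A real system calls a reranking model here."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def rerank(query, candidates, top_n=3, score_fn=overlap_score):
    """Second stage: re-score the vector-search shortlist with a
    pairwise scorer and keep only the best top_n."""
    scored = sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)
    return scored[:top_n]
```

The key design point is that the reranker only sees the shortlist (top 20-50 from vector search), so its extra latency is bounded regardless of corpus size.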


5. Storing Raw PDFs Blindly

PDFs are messy:

  • Headers
  • Footers
  • Page numbers
  • Tables split across pages

You should preprocess documents before embedding them:

  • Remove boilerplate
  • Normalize structure
  • Clean tables
  • Remove repeated artifacts

Garbage in β†’ garbage retrieval β†’ hallucinated output.
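One cheap, library-free trick for the boilerplate part: lines that repeat on most pages (running headers, footers) are almost never content. A minimal sketch, assuming `pages` is a list of per-page text strings (page numbers vary per page, so they need a separate regex pass):

```python
from collections import Counter

def strip_repeated_lines(pages, threshold=0.6):
    """Drop lines that appear on at least `threshold` of all pages,
    which catches running headers, footers, and watermark text."""
    counts = Counter()
    for page in pages:
        # Count each distinct line at most once per page.
        for line in set(ln.strip() for ln in page.splitlines()):
            if line:
                counts[line] += 1
    boilerplate = {
        line for line, c in counts.items() if c / len(pages) >= threshold
    }
    cleaned = []
    for page in pages:
        kept = [ln for ln in page.splitlines() if ln.strip() not in boilerplate]
        cleaned.append("\n".join(kept))
    return cleaned
```

Tables split across pages and structural normalization need real layout-aware parsing, but this one pass already removes a surprising amount of retrieval noise.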


RAG is not just:
"Add a vector database and call it a day."

It’s an information retrieval system wrapped around a generative model.

If retrieval is weak, the entire architecture collapses.

