Eduardo Borges
Why RAG Breaks in Real-World Systems (and How I’m Trying to Fix It)

Most Retrieval-Augmented Generation (RAG) setups work well for simple questions.

But once you move to real-world systems, things start to break.

I kept running into the same issue over and over, and at first it wasn't obvious why.


The Problem Isn’t Retrieval — It’s Context

Let’s take a simple example:

"How does the production deploy process work?"

A typical RAG system will retrieve documents like:

  • CI/CD pipeline
  • Kubernetes deployment
  • Monitoring setup

Individually, these are relevant.

But they’re treated as isolated chunks of information.
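To make that concrete, here is a minimal sketch of the classic retrieval step: rank documents by vector similarity and return the top k. The document names and four-dimensional "embeddings" are toy values I made up to stand in for a real embedding model.

```python
import math

# Toy 4-d "embeddings" standing in for a real embedding model.
# Dimensions could be read as (deploy-ness, infra-ness, observability-ness, office-IT-ness).
DOCS = {
    "ci_cd_pipeline":        [0.9, 0.4, 0.1, 0.0],
    "kubernetes_deployment": [0.7, 0.9, 0.2, 0.0],
    "monitoring_setup":      [0.2, 0.3, 0.9, 0.0],
    "office_wifi_guide":     [0.0, 0.1, 0.0, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, k=3):
    """Classic RAG retrieval: top-k by similarity, no relationships."""
    ranked = sorted(DOCS, key=lambda d: cosine(query_vec, DOCS[d]), reverse=True)
    return ranked[:k]

# "How does the production deploy process work?" as a toy query vector
query = [0.8, 0.6, 0.3, 0.0]
print(retrieve(query))  # three relevant but disconnected chunks
```

Each result is individually relevant, but nothing in the output says how the three documents relate to each other.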


Where It Breaks

In reality, the answer depends on how these systems connect:

  • CI/CD triggers Kubernetes
  • The deploy emits metrics
  • Monitoring consumes those metrics
  • Alerts trigger incident response
  • Incident response may trigger rollback

This is not a list of documents.

This is a chain of relationships.
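That chain can be written down directly as a small directed graph. Here is a plain-Python sketch (node names are illustrative, not from any real system), with a traversal showing the downstream context that a flat top-k lookup never surfaces:

```python
# The deploy chain as a directed graph (plain dict adjacency list).
EDGES = {
    "ci_cd": ["kubernetes"],              # CI/CD triggers Kubernetes
    "kubernetes": ["metrics"],            # the deploy emits metrics
    "metrics": ["monitoring"],            # monitoring consumes those metrics
    "monitoring": ["incident_response"],  # alerts trigger incident response
    "incident_response": ["rollback"],    # incidents may trigger rollback
}

def reachable(start):
    """Everything downstream of a node, found by depth-first traversal."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in EDGES.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

print(reachable("ci_cd"))  # the full downstream chain, not isolated documents
```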

And this is exactly where traditional RAG struggles.

Even when retrieval is technically "correct", the model lacks the structure to connect these pieces.


Why Better Embeddings Don’t Solve It

A common reaction is:

"We just need better embeddings."

I tried that.

It improves ranking — but it doesn’t solve the core issue.

You still get:

  • relevant documents
  • but no understanding of how they relate

The model gets fragments, not structure.


What I Started Experimenting With

To address this, I started exploring a different approach:

  • Use embeddings for semantic search (same as RAG)
  • Build a knowledge graph connecting documents
  • Retrieve not just matches, but connected context
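The three steps above can be sketched as a hybrid retriever: vector search finds the entry points, then one hop of graph expansion pulls in connected context. All names, vectors, and relationships below are toy values for illustration, not the actual implementation:

```python
import math

# Toy embeddings (semantic search) plus hand-written edges (knowledge graph).
DOCS = {
    "ci_cd":      [0.9, 0.4, 0.1],
    "kubernetes": [0.7, 0.9, 0.2],
    "monitoring": [0.2, 0.3, 0.9],
}
EDGES = {  # directed relationships between documents
    "ci_cd": ["kubernetes"],
    "kubernetes": ["monitoring"],
    "monitoring": ["incident_response"],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve_connected(query_vec, k=2):
    """Top-k semantic matches plus their graph neighbors, with the edges kept."""
    matches = sorted(DOCS, key=lambda d: cosine(query_vec, DOCS[d]), reverse=True)[:k]
    edges = [(d, nxt) for d in matches for nxt in EDGES.get(d, [])]
    docs = set(matches) | {nxt for _, nxt in edges}
    return docs, edges

docs, edges = retrieve_connected([0.8, 0.6, 0.3])
print(docs)   # the matched documents plus their connected neighbors
print(edges)  # and *how* they connect
```

The important difference is the second return value: the edges travel with the documents, so the model is told that CI/CD triggers Kubernetes rather than being left to guess.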

So instead of returning:

"Here are 3 similar documents"

You get:

"Here are the relevant documents AND how they connect"


What Changed

In scenarios where the answer spans multiple systems, the difference is noticeable.

Instead of partial answers, the model can follow the chain:

CI/CD → Kubernetes → Monitoring → Incident Response

This leads to:

  • more complete answers
  • fewer hallucinations
  • better reasoning across systems
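Following the chain can be as simple as walking successor edges until they run out, so the retrieved documents reach the model in causal order rather than as an unordered list. A toy linearization, assuming one successor per node (names are illustrative):

```python
# Hypothetical chain-following: linearize the retrieved subgraph so the
# model sees documents in causal order, not as an unordered bag of chunks.
NEXT = {
    "ci_cd": "kubernetes",
    "kubernetes": "monitoring",
    "monitoring": "incident_response",
}

def follow_chain(start):
    """Walk successor edges from a starting node until the chain ends."""
    chain = [start]
    while chain[-1] in NEXT:
        chain.append(NEXT[chain[-1]])
    return chain

print(" -> ".join(follow_chain("ci_cd")))
# ci_cd -> kubernetes -> monitoring -> incident_response
```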

The Tradeoff

This approach is not free.

It adds:

  • complexity
  • processing overhead
  • graph construction challenges

And I’m still figuring out:

When is this actually worth it vs just overengineering RAG?


What I Built

I ended up building a small tool to explore this idea in practice.

It ingests documents, builds relationships between them, and retrieves connected context instead of isolated chunks.

If you're curious, you can check it out here:
👉 https://usemindex.dev/


Open Question

I’m still early in this exploration, and I’m not convinced this is always the right approach.

Curious to hear from others:

  • Have you hit similar limitations with RAG?
  • How are you handling cross-document context today?
  • Are you solving this at retrieval time, or leaving it to the model?

Would love to learn how others are approaching this.
