DEV Community

Eduardo Borges

RAG Is Failing in Production — Here’s Why (and What I’m Testing Instead)

RAG (Retrieval-Augmented Generation) looks great in demos.

But in real-world systems, it often fails in subtle ways.

Not because retrieval is bad.

But because the retrieved context is missing something more fundamental.


The Problem I Kept Seeing

Everything worked fine… until it didn’t.

Simple questions? Great.

But anything that depended on multiple systems?

That’s where things started to break.

Example:

"How does the production deploy process work?"

A typical RAG system retrieves documents like:

  • CI/CD pipeline
  • Kubernetes deployment
  • Monitoring setup

All relevant.

All correct.

And still… incomplete.


Why the Answer Is Still Wrong

Because the real answer is not inside a single document.

It’s in how they connect:

  • CI/CD triggers the Kubernetes deployment
  • The deployment emits metrics
  • Monitoring consumes those metrics
  • Alerts trigger incident response
  • Incident response triggers rollback

This is not a list.

This is a system.

And similarity-based retrieval has no notion of how a system’s parts connect.
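
To make "a system, not a list" concrete, here is a minimal sketch that writes the chain above as a directed graph and walks it. The node names and the `downstream` helper are hypothetical, and the `monitoring -> alerts` edge is an assumption that the list only implies.

```python
from collections import defaultdict, deque

# Hypothetical node names; each edge mirrors one bullet from the list above.
EDGES = [
    ("ci_cd", "triggers", "kubernetes_deploy"),
    ("kubernetes_deploy", "emits metrics to", "monitoring"),
    ("monitoring", "raises", "alerts"),  # assumption: implied, not stated
    ("alerts", "trigger", "incident_response"),
    ("incident_response", "triggers", "rollback"),
]


def downstream(start, edges):
    """Breadth-first walk: everything reachable from `start`, in visit order."""
    graph = defaultdict(list)
    for src, _relation, dst in edges:
        graph[src].append(dst)

    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


print(downstream("ci_cd", EDGES))
```

A similarity search over the same documents can return several of these nodes, but not the order they fire in or the edges between them.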


The Core Issue

RAG retrieves by similarity.

But real-world knowledge is structured by relationships.

So even when retrieval is "correct", the model gets:

  • fragments of truth
  • without the structure to connect them

That’s why answers feel incomplete.
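
A toy version of similarity-only retrieval shows the shape of the problem. This sketch uses word counts as a stand-in for a real embedding model, and the document ids and texts are made up for illustration.

```python
import math
from collections import Counter

# Hypothetical corpus; ids and texts are illustrative only.
DOCS = {
    "ci_cd": "the ci pipeline builds and tests every commit before a production deploy",
    "kubernetes": "kubernetes rolls out the production deploy across the cluster",
    "monitoring": "monitoring dashboards track metrics after each deploy",
    "onboarding": "new hires request laptop access through an internal portal",
}


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, docs, k=3):
    """Score every document against the query independently, keep the best k."""
    q = embed(query)
    scored = [(cosine(q, embed(text)), doc_id) for doc_id, text in docs.items()]
    scored.sort(reverse=True)
    return [doc_id for score, doc_id in scored[:k] if score > 0]


print(top_k("how does the production deploy process work", DOCS))
```

The result is a ranked list of ids. Each document was scored on its own, so nothing in the output says that one document's process triggers another's.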


“Just Use Better Embeddings” Doesn’t Fix It

I tried that.

Better embeddings:

  • improve ranking
  • reduce noise

But they don’t fix the core problem.

You still get isolated chunks.


What I Started Testing

Instead of treating documents as independent pieces, I tried combining three steps:

  • semantic search (same as RAG)
  • building a graph of relationships between documents
  • retrieving connected context from that graph

So instead of:

"Here are 3 relevant documents"

You get:

"Here’s how these documents connect"

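As a rough sketch of that idea (not the implementation of any particular tool): take the ids returned by semantic search as seeds, pull in graph neighbors within a few hops, and hand the model the connecting edges alongside the documents. The function names `connected_context` and `render`, the edge list, and the `max_hops` parameter are all hypothetical.

```python
from collections import deque

# Hypothetical document graph: (source, relation, target).
EDGES = [
    ("ci_cd", "triggers", "kubernetes"),
    ("kubernetes", "emits metrics to", "monitoring"),
    ("monitoring", "raises", "alerts"),
    ("alerts", "trigger", "incident_response"),
    ("incident_response", "triggers", "rollback"),
]


def connected_context(seeds, edges, max_hops=1):
    """Expand seed documents to neighbors within max_hops (either direction).

    Returns the expanded document ids and the edges that connect them.
    """
    neighbors = {}
    for src, _relation, dst in edges:
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)

    hops = {doc: 0 for doc in seeds}  # insertion order doubles as visit order
    queue = deque(seeds)
    while queue:
        doc = queue.popleft()
        if hops[doc] == max_hops:
            continue
        for nxt in sorted(neighbors.get(doc, ())):
            if nxt not in hops:
                hops[nxt] = hops[doc] + 1
                queue.append(nxt)

    kept = [e for e in edges if e[0] in hops and e[2] in hops]
    return list(hops), kept


def render(edges):
    """Format edges as plain text that can be added to the prompt."""
    return "\n".join(f"{src} --{rel}--> {dst}" for src, rel, dst in edges)


# Seeds stand in for whatever semantic search returned.
docs, links = connected_context(["ci_cd", "kubernetes", "monitoring"], EDGES)
print(docs)
print(render(links))
```

With `max_hops=1` the three seeds pull in `alerts`; raising it to 3 reaches `rollback` as well. How far to expand is one of the knobs that makes this approach harder to tune than plain top-k retrieval.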

What Changed

In scenarios where context spans multiple domains:

  • answers became more complete
  • reasoning had fewer gaps
  • the model did less "guessing"

It’s not perfect, but the difference is noticeable.


The Tradeoff Nobody Talks About

This approach adds:

  • complexity
  • processing overhead
  • graph construction challenges

And I’m still figuring out:

When is this actually worth it?


What I Built Around This

I ended up building a small tool to explore this idea in practice.

It ingests documents, maps relationships, and retrieves connected context instead of isolated chunks.

If you want to see it:
👉 https://usemindex.dev/


Open Question

I’m not convinced this is always the right direction.

Curious to hear from others:

  • Have you seen RAG fail like this in production?
  • Are you solving this at retrieval time?
  • Or relying on the model to stitch context together?

Would love to compare notes.
