Eduardo Borges

Posted on Apr 21

RAG Is Failing in Production — Here’s Why (and What I’m Testing Instead)

#ai #machinelearning #llm #programming

RAG (Retrieval-Augmented Generation) looks great in demos.

But in real-world systems, it often fails in subtle ways.

Not because retrieval is bad.

But because it lacks something more fundamental.

The Problem I Kept Seeing

Everything worked fine… until it didn’t.

Simple questions? Great.

But anything that depended on multiple systems?

That’s where things started to break.

Example:

"How does the production deploy process work?"

A typical RAG system retrieves documents like:

CI/CD pipeline
Kubernetes deployment
Monitoring setup

All relevant.

All correct.

And still… incomplete.

Why the Answer Is Still Wrong

Because the real answer is not inside a single document.

It’s in how they connect:

CI/CD triggers Kubernetes
Deploy emits metrics
Monitoring consumes those metrics
Alerts trigger incident response
Incident response triggers rollback

This is not a list.

This is a system.

And RAG doesn’t understand systems.
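One way to see the difference: written as data, the chain above is a path. Answering the deploy question means walking that path end to end, and no single chunk holds it. This is a minimal sketch. The node names come from the list, and the relation labels are added for illustration. The "Monitoring raises alerts" edge is an assumption, because the list implies it but never states it.

```python
# The chain above as (source, relation, target) triples.
# The "Monitoring raises alerts" edge is an assumption: the text implies it
# but does not spell it out.
EDGES = [
    ("CI/CD", "triggers", "Kubernetes deploy"),
    ("Kubernetes deploy", "emits", "metrics"),
    ("metrics", "are consumed by", "Monitoring"),
    ("Monitoring", "raises", "alerts"),
    ("alerts", "trigger", "incident response"),
    ("incident response", "triggers", "rollback"),
]

def walk(start):
    """Follow the outgoing edge from each node until the chain ends.

    Assumes each node has at most one outgoing edge, which holds for this chain.
    """
    nxt = {src: (rel, dst) for src, rel, dst in EDGES}
    steps, node, seen = [], start, {start}
    while node in nxt:
        rel, dst = nxt[node]
        steps.append(f"{node} {rel} {dst}")
        if dst in seen:  # stop if the graph ever loops back on itself
            break
        seen.add(dst)
        node = dst
    return steps

for step in walk("CI/CD"):
    print(step)
```

A retriever that returns any three of these nodes gives the model three stops on the path. It does not give the edges between them.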

The Core Issue

RAG retrieves by similarity.

But real-world knowledge is structured by relationships.

So even when retrieval is "correct", the model gets:

fragments of truth
without the structure to connect them

That’s why answers feel incomplete.
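Mechanically, "retrieves by similarity" means each chunk is scored against the query on its own. Here is a toy sketch. It assumes bag-of-words counts in place of a real embedding model, and it uses a made-up four-file corpus. The file names and texts are hypothetical.

```python
import math
from collections import Counter

# Hypothetical four-chunk corpus standing in for real docs.
DOCS = {
    "ci_cd.md": "ci cd pipeline builds the image and triggers the kubernetes deploy",
    "k8s.md": "kubernetes deployment rolls out pods and emits deploy metrics",
    "monitoring.md": "monitoring consumes metrics and fires alerts",
    "incident.md": "incident response runbook decides on rollback",
}

def embed(text):
    """Stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, k=3):
    """Score every chunk independently against the query; keep the top-k non-zero hits."""
    q = embed(query)
    scored = sorted(((cosine(q, embed(t)), d) for d, t in DOCS.items()), reverse=True)
    return [d for s, d in scored[:k] if s > 0]

print(retrieve("how does the production deploy process work"))
```

For this query, only the CI/CD and Kubernetes chunks score above zero. The monitoring and incident-response chunks never surface, even though the full answer depends on them. Nothing in the output says how the two hits relate either.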

“Just Use Better Embeddings” Doesn’t Fix It

I tried that.

Better embeddings:

improve ranking
reduce noise

But they don’t fix the core problem.

You still get isolated chunks.

What I Started Testing

Instead of treating documents as independent pieces, I tried:

semantic search (same as RAG)
+ building a graph of relationships between documents
+ retrieving connected context

So instead of:

"Here are 3 relevant documents"

You get:

"Here’s how these documents connect"
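Here is a minimal sketch of those three steps, under loud assumptions. Keyword overlap stands in for embedding search, and the relationship graph is written by hand rather than extracted. All document IDs, texts and relation labels are hypothetical. This illustrates the idea; it is not the implementation behind any particular tool. The sketch shows the shape of the approach: seed with the top hit, expand along relationships, then state the connections explicitly in the context.

```python
from collections import deque

# Hypothetical toy corpus.
DOCS = {
    "ci_cd": "ci cd pipeline runs the deploy process and triggers kubernetes",
    "k8s": "kubernetes deploy emits metrics",
    "monitoring": "monitoring consumes metrics and fires alerts",
    "incident": "incident response triggers rollback",
}

# Hypothetical, hand-written relationship graph: (source, relation, target).
# Building this automatically is the hard part and is not shown here.
EDGES = [
    ("ci_cd", "triggers", "k8s"),
    ("k8s", "emits metrics to", "monitoring"),
    ("monitoring", "raises alerts for", "incident"),
]

def seed_search(query, k=1):
    """Stand-in for semantic search: rank docs by words shared with the query."""
    q = set(query.lower().split())
    return sorted(DOCS, key=lambda d: len(q & set(DOCS[d].split())), reverse=True)[:k]

def expand(seeds, max_hops=3):
    """Breadth-first walk along EDGES from the seed docs, up to max_hops away."""
    neighbors = {}
    for src, _, dst in EDGES:
        neighbors.setdefault(src, []).append(dst)
    hops = {d: 0 for d in seeds}
    queue = deque(seeds)
    while queue:
        doc = queue.popleft()
        if hops[doc] == max_hops:
            continue
        for nxt in neighbors.get(doc, []):
            if nxt not in hops:
                hops[nxt] = hops[doc] + 1
                queue.append(nxt)
    return list(hops)  # dicts keep insertion order, so this is discovery order

def build_context(doc_ids):
    """Chunks plus the relationships among them, stated explicitly for the model."""
    chunks = [f"[{d}] {DOCS[d]}" for d in doc_ids]
    links = [f"- {s} {r} {t}" for s, r, t in EDGES if s in doc_ids and t in doc_ids]
    return "\n".join(chunks + ["", "Relationships:"] + links)

docs = expand(seed_search("how does the production deploy process work"))
print(build_context(docs))
```

With `max_hops=1`, the same seed only pulls in `k8s`. The hop limit is the knob that trades completeness against context size.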

What Changed

In scenarios where context spans multiple domains:

more complete answers
fewer gaps in reasoning
less "guessing" by the model

It’s not perfect, but the difference is noticeable.

The Tradeoff Nobody Talks About

This approach adds:

complexity
processing overhead
graph construction challenges
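To make the overhead and construction points concrete, consider the naive way to build edges: compare every pair of documents. That work grows quadratically with corpus size. This sketch assumes edges come from shared keywords. That is a crude heuristic chosen for illustration; other systems might use entity extraction or an LLM pass, each with its own cost. The corpus and stopword list are hypothetical.

```python
from itertools import combinations

# Hypothetical stopword list and corpus.
STOPWORDS = {"the", "and", "a", "of", "to"}

def keywords(text):
    """Lowercased words minus stopwords."""
    return {w for w in text.lower().split() if w not in STOPWORDS}

def build_graph(docs, min_shared=1):
    """Link two docs if they share at least `min_shared` keywords.

    Checks n * (n - 1) / 2 pairs, so cost grows quadratically with corpus size.
    """
    kw = {doc_id: keywords(text) for doc_id, text in docs.items()}
    edges, comparisons = [], 0
    for a, b in combinations(docs, 2):
        comparisons += 1
        shared = kw[a] & kw[b]
        if len(shared) >= min_shared:
            edges.append((a, b, sorted(shared)))
    return edges, comparisons

docs = {
    "ci_cd": "pipeline triggers kubernetes deploy",
    "k8s": "kubernetes deploy emits metrics",
    "monitoring": "monitoring consumes metrics and fires alerts",
    "incident": "alerts start incident response and rollback",
}
edges, comparisons = build_graph(docs)
for a, b, shared in edges:
    print(a, "<->", b, shared)
print(comparisons, "comparisons")
```

Four documents need 6 comparisons; 10,000 would need about 50 million. Shared-keyword edges are also noisy in both directions. A common term links unrelated docs, and a real dependency described in different words is missed.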

And I’m still figuring out:

When is this actually worth it?

What I Built Around This

I ended up building a small tool to explore this idea in practice.

It ingests documents, maps relationships, and retrieves connected context instead of isolated chunks.

If you want to see it:
👉 https://usemindex.dev/

Open Question

I’m not convinced this is always the right direction.

Curious to hear from others:

Have you seen RAG fail like this in production?
Are you solving this at retrieval time?
Or relying on the model to stitch context together?

Would love to compare notes.

DEV Community