
Ken Imoto

Originally published at zenn.dev

Your RAG can't answer 'why' -- GraphRAG finds what vector search misses

The question that broke my RAG pipeline

I had a solid RAG setup. Embeddings, vector store, top-k retrieval, the whole thing. It handled factual lookups just fine: "What's the API rate limit?" "Which config file controls logging?" Quick, accurate, done.

Then a teammate asked: "What technical challenges do Project A and Project B have in common?"

The system returned chunks about Project A. Chunks about Project B. Individually relevant. But it never connected the dots between them. It couldn't, because vector search finds similar documents -- not related ones. Those are fundamentally different operations. I spent a solid week rewriting prompts and adjusting chunk overlap before admitting the architecture itself was the bottleneck. A week I'd like back.

This is the structural ceiling of conventional RAG.

What vector search actually can't do

Standard RAG works by converting text into embeddings, then finding the chunks closest to the query in vector space. If your question maps neatly to a single chunk, it works. "What does function X do?" -- vector search nails it.

But try asking:

  • "What are the overarching themes across this dataset?"
  • "Why did the team change direction in Q3?"
  • "Which departments share overlapping risk factors?"

These require reasoning across documents. Connecting point A to point F through B, C, D, and E. Vector similarity can't do this -- it retrieves chunks in isolation, with no awareness of how they relate to each other.

Think of it this way: vector search finds books on the same shelf. GraphRAG finds the footnotes that connect books across different floors of the library.

GraphRAG: the 4-stage pipeline

Microsoft Research introduced GraphRAG in February 2024. The core idea: use an LLM to automatically build a knowledge graph from your documents, then use that graph structure to answer questions that require cross-document reasoning.

Here's how the pipeline works:

Stage 1 -- Entity and relationship extraction. The LLM reads your text and pulls out entities (people, organizations, concepts, technologies) and the relationships between them.

Input: "Microsoft's GraphRAG team developed an LLM-based
knowledge graph construction method, referencing Neo4j's
property graph model."

Output:
(Microsoft) --[has_team]--> (GraphRAG Team)
(GraphRAG Team) --[developed]--> (LLM-based KG Method)
(LLM-based KG Method) --[references]--> (Property Graph Model)
(Property Graph Model) --[originated_from]--> (Neo4j)
Enter fullscreen mode Exit fullscreen mode
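
Here's a minimal sketch of what this stage can look like in code. The prompt, the triple format, and the `complete()` helper (a stand-in for whatever LLM client you use) are all assumptions of mine, not the actual prompts the GraphRAG library ships with:

```python
import re

EXTRACTION_PROMPT = """Extract entities and relationships from the text below.
Return one triple per line, formatted as: (subject) --[relation]--> (object)

Text:
{text}
"""

def extract_triples(text: str, complete) -> list[tuple[str, str, str]]:
    """Ask the LLM for triples, then parse them into (subject, relation, object) tuples."""
    raw = complete(EXTRACTION_PROMPT.format(text=text))  # complete() = your LLM call
    pattern = re.compile(r"\((.+?)\)\s*--\[(.+?)\]-->\s*\((.+?)\)")
    return [match.groups() for match in pattern.finditer(raw)]
```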

Stage 2 -- Leiden clustering. The extracted graph gets clustered using the Leiden algorithm, which groups densely connected nodes into communities. Imagine the first day at a new school: by the end of lunch, there's a gaming group, a soccer group, and the quiet readers. Leiden detects that same kind of natural grouping, automatically, across your entire document set.
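
A minimal sketch of that clustering step, using python-igraph and the leidenalg package (Microsoft's pipeline uses a hierarchical Leiden variant, but the idea is the same). The triples are assumed to come out of the extraction stage above:

```python
import igraph as ig
import leidenalg

# Triples from Stage 1: (subject, relation, object)
triples = [
    ("Microsoft", "has_team", "GraphRAG Team"),
    ("GraphRAG Team", "developed", "LLM-based KG Method"),
    ("LLM-based KG Method", "references", "Property Graph Model"),
    ("Property Graph Model", "originated_from", "Neo4j"),
]

# Build an undirected entity graph from the triples.
nodes = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
index = {name: i for i, name in enumerate(nodes)}
graph = ig.Graph(n=len(nodes), edges=[(index[s], index[o]) for s, _, o in triples])
graph.vs["name"] = nodes

# Leiden groups densely connected entities into communities.
partition = leidenalg.find_partition(graph, leidenalg.ModularityVertexPartition)
communities = [[nodes[i] for i in community] for community in partition]
print(communities)  # each inner list is one community of related entities
```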

Stage 3 -- Community summary generation. An LLM generates a summary for each community, capturing what that cluster of entities and relationships is about. These summaries become the search index.
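
Continuing the sketch, each community's entities and internal relationships get handed to the LLM for a write-up -- `complete()` is again a placeholder for your own LLM call:

```python
SUMMARY_PROMPT = """These entities and relationships form one community in a knowledge graph.
Write a short summary of what this community is about.

Entities: {entities}
Relationships: {relationships}
"""

def summarize_communities(communities, triples, complete) -> list[str]:
    """Generate one natural-language summary per community -- these become the search index."""
    summaries = []
    for members in communities:
        member_set = set(members)
        # Keep only relationships whose endpoints both sit inside this community.
        local_edges = [f"{s} {r} {o}" for s, r, o in triples
                       if s in member_set and o in member_set]
        summaries.append(complete(SUMMARY_PROMPT.format(
            entities=", ".join(members),
            relationships="; ".join(local_edges),
        )))
    return summaries
```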

Stage 4 -- Graph-augmented retrieval. When a user asks a question, the system retrieves relevant community summaries and feeds them to the LLM for answer generation.
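
And a sketch of the final step: embed the community summaries once, pick the ones closest to the query, and pass them to the LLM as context. `embed()` and `complete()` are placeholders for your embedding and chat-completion calls:

```python
import numpy as np

def answer(query: str, summaries: list[str], embed, complete, top_k: int = 3) -> str:
    """Retrieve the most relevant community summaries and generate a grounded answer."""
    summary_vecs = np.array([embed(s) for s in summaries])
    query_vec = np.array(embed(query))
    # Cosine similarity between the query and every community summary.
    sims = summary_vecs @ query_vec / (
        np.linalg.norm(summary_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9
    )
    context = "\n\n".join(summaries[i] for i in np.argsort(sims)[::-1][:top_k])
    return complete(f"Answer using only this context:\n\n{context}\n\nQuestion: {query}")
```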

Head-to-head: when each approach wins

| Dimension | Standard RAG | GraphRAG |
| --- | --- | --- |
| Search unit | Document chunks | Community summaries + entities |
| Best for | Specific fact lookup | Cross-document "why" questions |
| Reasoning | Within a single chunk | Across document boundaries |
| Index cost | Low (embedding generation) | High (LLM builds the graph) |
| Answer grounding | Retrieved chunk citation | Graph-based reasoning paths |

The Microsoft Research benchmark on the VIINA dataset (a corpus of Ukraine conflict reports) showed GraphRAG outperformed baseline RAG on comprehensiveness and diversity of answers. NTT Data's independent evaluation confirmed the same pattern for cross-document questions.

Standard RAG isn't obsolete. For "what is X?" queries, it's faster, cheaper, and works fine. The issue is that production workloads rarely consist of only "what is X?" questions.

The cost story: from $33,000 to $0.50

The elephant in the room has always been cost. The original GraphRAG implementation required massive LLM usage during indexing -- extracting entities, generating summaries, running the full pipeline. Early production deployments reported indexing costs north of $33,000 for large datasets.

That number scared people off. Including me -- I bookmarked the paper under "revisit when LLM costs drop" and moved on with my life.

But 2026 changed the math. Three developments collapsed the cost curve:

LazyGraphRAG (Microsoft Research): Instead of expensive upfront summarization, LazyGraphRAG builds a lightweight graph during indexing and defers the heavy work to query time. The result: indexing cost drops to 0.1% of full GraphRAG -- a 1,000x reduction -- while maintaining comparable answer quality for global queries.

LightRAG: Strips GraphRAG down to the essentials with a simpler extraction pipeline and a flat graph structure. A 500-page corpus indexes in about 3 minutes at roughly $0.50. For teams that need "good enough" graph reasoning without the full Microsoft stack, this is a practical starting point.

Token cost optimization in production: Alexander Shereshevsky documented a 90% token cost reduction in production GraphRAG deployments through selective extraction, batched processing, and smarter chunking strategies.

The cost objection is no longer what it was. The question has shifted from "can we afford GraphRAG?" to "which variant fits our query patterns?"

The emerging pattern: Adaptive RAG

The practitioners I've been watching aren't choosing between vector RAG and GraphRAG. They're building query classifiers that route each incoming question to the right pipeline:

  • Simple factual lookup → standard vector RAG (fast, cheap)
  • Cross-document reasoning → GraphRAG (comprehensive, more expensive)
  • Exploratory / "summarize everything" → LazyGraphRAG (cost-efficient global search)

This Adaptive RAG approach treats retrieval strategy as a runtime decision, not an architecture decision. You don't commit to one pipeline at build time. You let the question itself determine which retrieval path runs.
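
Here's a minimal sketch of such a router, using the LLM itself as the classifier. The labels and the downstream pipeline functions are hypothetical, and `complete()` is the same LLM-call placeholder as in the earlier sketches:

```python
ROUTER_PROMPT = """Classify the question into exactly one label:
- FACT: asks for a specific fact or definition
- CROSS: needs reasoning across documents (comparisons, "why" questions, shared themes)
- GLOBAL: asks for dataset-wide summaries or overarching trends

Question: {question}
Label:"""

def route(question: str, complete) -> str:
    label = complete(ROUTER_PROMPT.format(question=question)).strip().upper()
    return label if label in {"FACT", "CROSS", "GLOBAL"} else "FACT"  # default to the cheap path

def adaptive_answer(question: str, complete, pipelines: dict) -> str:
    # pipelines maps labels to callables, e.g. {"FACT": vector_rag, "CROSS": graph_rag, "GLOBAL": lazy_graph_rag}
    return pipelines[route(question, complete)](question)
```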

What to do this week

1. Audit your failure cases. Look at the questions your current RAG system handles poorly. If most failures involve cross-document reasoning, multi-hop questions, or "why" queries, you have a GraphRAG-shaped problem.

2. Start small. Don't index your entire corpus on day one. Pick a 100-page subset where you know cross-document questions matter. Run GraphRAG on that. Compare answer quality against your current pipeline. The cost for a small test is negligible.

3. Consider the hybrid path. Tools like Neo4j's graph store, LangChain's GraphRAG integrations, and Microsoft's own GraphRAG library all support running vector and graph retrieval in parallel. You don't have to rip out your existing pipeline. (A small graph-query sketch follows after this list.)

4. Watch the cost-per-query ratio. For high-volume scenarios (customer support, internal knowledge bases), even a modest accuracy improvement compounds fast. For research scenarios (legal discovery, medical literature review), the accuracy gain can justify significantly higher per-query costs.
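
To make the graph side of step 3 concrete, here's what the opening question -- what do Project A and Project B have in common? -- looks like as a Cypher query through the official neo4j Python driver. The Project/Challenge schema, the connection details, and the credentials are hypothetical; your extracted graph will have its own labels:

```python
from neo4j import GraphDatabase

# Hypothetical schema: (:Project)-[:HAS_CHALLENGE]->(:Challenge)
CYPHER = """
MATCH (a:Project {name: $a})-[:HAS_CHALLENGE]->(c:Challenge)
      <-[:HAS_CHALLENGE]-(b:Project {name: $b})
RETURN c.name AS shared_challenge
"""

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    rows = session.run(CYPHER, a="Project A", b="Project B")
    shared = [row["shared_challenge"] for row in rows]
print(shared)  # one hop from each project to the same challenge node -- no similarity search involved
```

The same query shape -- two anchors meeting at a shared node -- is what backs most "in common" and multi-hop "why" questions once the graph exists.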

The question isn't whether GraphRAG is "better" than vector RAG. It's whether your users are asking questions that vector search structurally cannot answer. If they are, no amount of prompt tuning or chunk-size optimization will fix it.


This article is adapted from Knowledge Graphs in Practice: From Fundamentals to GraphRAG, covering the full pipeline from knowledge graph construction to production GraphRAG deployment -- including cost analysis, enterprise patterns, and code-as-graph applications.
