DEV Community

Sandeep Pamarthi
Graph RAG: Architecture and Implementation of Knowledge-Graph-Augmented Generation

How replacing flat vector retrieval with structured graph traversal unlocks multi-hop reasoning in LLM applications

You’ve probably heard the promise: combine a large language model with your private documents, and it answers questions accurately, without hallucinating. This is Retrieval-Augmented Generation — RAG — and in its standard form it works well enough to ship to production.

But then someone asks: “Which of our vendors supply both Component A and Component B, and have had a compliance incident in the last 18 months?” The system struggles. The answer requires connecting information across multiple documents, through multiple relationship hops. Standard RAG — which retrieves semantically similar text chunks — has no mechanism for this.

This is the problem Graph RAG solves. Instead of a flat index of text embeddings, it builds a knowledge graph of entities and relationships extracted from your documents. Retrieval becomes graph traversal. The LLM receives structured, relationship-aware context rather than a pile of similar-looking sentences.

This article walks through how Graph RAG works from the ground up: knowledge graph construction, community detection, hybrid retrieval, and the generation layer — with concrete implementation detail at each stage.

Graph RAG is not simply RAG with a graph database bolted on. It is a fundamentally different theory of what “retrieval” means: finding connected knowledge rather than similar text.

1. Why Standard RAG Breaks on Relational Queries

To understand what Graph RAG fixes, it helps to be precise about how standard RAG fails. A vector RAG pipeline does three things: it splits documents into chunks, encodes each chunk as a dense embedding, and at query time retrieves the top-k chunks whose embeddings are closest to the query embedding.

The retrieval step is essentially asking: “which pieces of text look most like this question?” That works well for factual lookups — “what is the return policy?” — because the answer probably lives in a chunk that talks about returns. It breaks down in three distinct failure modes:

1. The relevant information is split across multiple chunks with no shared vocabulary to make them look similar to the same query.
2. The question requires chaining facts: A relates to B, and B relates to C, therefore A relates to C. No single chunk contains the full chain.
3. Multiple chunks contain contradictory information and the retriever has no mechanism to reconcile them — it just injects all of them into the prompt.

All three are fundamentally about the absence of relationship modeling. Text similarity is not the same as logical or factual connection. Graph RAG addresses this at the representation layer, before any retrieval logic runs.

2. The Knowledge Graph as a First-Class Citizen

The foundation of Graph RAG is a property graph — a data structure of typed nodes (entities) connected by typed, directed edges (relationships), each carrying arbitrary key-value properties.

For a document corpus about a pharmaceutical supply chain, the graph might contain nodes like:

```
Supplier(name, country, tier, certification_status)
Component(name, category, spec_version)
Incident(date, severity, type, resolved)
Contract(start_date, end_date, value)
```

And edges like:

```
(Supplier)-[:SUPPLIES]->(Component)
(Supplier)-[:SUBJECT_OF]->(Incident)
(Contract)-[:COVERS]->(Supplier)
```

The multi-hop query from the introduction — vendors supplying both Component A and B with recent incidents — now becomes a straightforward graph traversal: find nodes that have SUPPLIES edges to both targets AND SUBJECT_OF edges to Incidents within a date range. No amount of semantic similarity matching replicates this.
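To make that traversal concrete, here is a minimal pure-Python sketch of exactly this query, using a dict-based stand-in for a real graph store (the data and node ids are hypothetical, invented for illustration):

```python
from datetime import date

# Toy in-memory property graph. Nodes map id -> properties;
# edges are (source, relationship_type, target) triples.
nodes = {
    "supplier_acme": {"type": "Supplier", "name": "Acme Corp"},
    "supplier_bolt": {"type": "Supplier", "name": "BoltWorks"},
    "component_a":   {"type": "Component", "name": "Component A"},
    "component_b":   {"type": "Component", "name": "Component B"},
    "incident_17":   {"type": "Incident", "date": date(2024, 3, 1)},
}
edges = [
    ("supplier_acme", "SUPPLIES", "component_a"),
    ("supplier_acme", "SUPPLIES", "component_b"),
    ("supplier_bolt", "SUPPLIES", "component_a"),
    ("supplier_acme", "SUBJECT_OF", "incident_17"),
]

def neighbours(src: str, rel: str) -> set:
    """Targets of edges of the given type leaving `src`."""
    return {d for s, r, d in edges if s == src and r == rel}

def vendors_with_incident(comp_a: str, comp_b: str, since: date) -> set:
    """Suppliers with SUPPLIES edges to both components AND a recent incident."""
    hits = set()
    for node_id, props in nodes.items():
        if props.get("type") != "Supplier":
            continue
        supplied = neighbours(node_id, "SUPPLIES")
        incidents = neighbours(node_id, "SUBJECT_OF")
        recent = any(nodes[i]["date"] >= since for i in incidents)
        if comp_a in supplied and comp_b in supplied and recent:
            hits.add(props["name"])
    return hits

print(vendors_with_incident("component_a", "component_b", date(2023, 1, 1)))
# Only Acme qualifies: BoltWorks lacks Component B and has no incident
```

The real system runs the same logic as a Cypher query against the graph store; the point is that each filter is an edge-type constraint, not a similarity score.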

2.1 Entity Extraction

Building the graph from raw documents requires two extraction steps. First, Named Entity Recognition (NER) identifies entity mentions — spans of text that refer to a real-world object. Second, Relation Extraction (RE) identifies predicate relationships between entity pairs.

Modern approaches use LLMs for both steps, prompted with domain-specific schemas. A prompt for entity extraction might look like this:

```
SYSTEM: You are an information extraction system. Extract all entities
from the following text. For each entity, return:
- entity_text: the exact mention in the text
- entity_type: one of [SUPPLIER, COMPONENT, INCIDENT, CONTRACT, PERSON, DATE]
- canonical_id: a normalized identifier (e.g. supplier_acme_corp)
Return a JSON array. Extract only entities explicitly mentioned.

TEXT: {document_chunk}
```

Relation extraction uses a similar pattern, taking pairs of extracted entities and asking the model to identify the predicate relationship, if any, between them. The key design choice is whether to use a closed schema (a fixed set of allowed relationship types) or an open schema. Closed schemas produce cleaner, more consistent graphs; open schemas capture richer semantics at the cost of inconsistency.
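One practical benefit of a closed schema is that it can be enforced mechanically: a validation pass drops any extracted triple whose (subject type, predicate, object type) combination is not in the allowed set. A sketch, with illustrative relation types:

```python
# Hypothetical closed schema: allowed (subject_type, predicate, object_type) triples.
SCHEMA = {
    ("SUPPLIER", "SUPPLIES", "COMPONENT"),
    ("SUPPLIER", "SUBJECT_OF", "INCIDENT"),
    ("CONTRACT", "COVERS", "SUPPLIER"),
}

def validate_relations(candidates: list) -> list:
    """Keep only extracted relations that conform to the closed schema."""
    return [
        c for c in candidates
        if (c["subject_type"], c["predicate"], c["object_type"]) in SCHEMA
    ]

raw = [
    {"subject": "acme", "subject_type": "SUPPLIER",
     "predicate": "SUPPLIES", "object": "comp_a", "object_type": "COMPONENT"},
    # A predicate outside the schema, as an open-schema extractor might invent:
    {"subject": "acme", "subject_type": "SUPPLIER",
     "predicate": "PARTNERS_WITH", "object": "bolt", "object_type": "SUPPLIER"},
]
print(len(validate_relations(raw)))  # 1
```

With an open schema this filter disappears, and downstream traversal logic must cope with whatever predicate vocabulary the model produced.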

2.2 Coreference Resolution

The same entity is often referred to in multiple ways across a document corpus: “Acme Corp”, “Acme”, “the supplier”, “they”. Without resolving these to a canonical identifier, the graph will fragment a single real-world entity into many disconnected nodes.

For domain-specific corpora, a practical approach combines fuzzy string matching (for company names with minor spelling variations) with an LLM-based alias resolution pass that groups mentions into canonical clusters. At scale, embedding-based entity linking against a reference vocabulary is more tractable.
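A minimal sketch of the fuzzy-matching half, using only difflib from the standard library (the mentions and the 0.8 threshold are illustrative; pronouns like "they" still require the LLM alias-resolution pass):

```python
from difflib import SequenceMatcher

def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Case-insensitive string similarity above a fixed threshold."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def cluster_mentions(mentions: list) -> dict:
    """Greedy single-pass clustering: map each mention to the first
    sufficiently similar canonical form seen so far."""
    canonical = []
    mapping = {}
    for m in mentions:
        for c in canonical:
            if similar(m, c):
                mapping[m] = c
                break
        else:
            canonical.append(m)
            mapping[m] = m
    return mapping

print(cluster_mentions(["Acme Corp", "Acme Corp.", "ACME Corp", "BoltWorks"]))
# "Acme Corp.", "ACME Corp" collapse onto "Acme Corp"; "BoltWorks" stays separate
```

Greedy clustering is order-sensitive, which is acceptable at small scale; the embedding-based linking mentioned above replaces it when the mention vocabulary grows.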

3. Community Detection and Hierarchical Summarization

A large knowledge graph is not directly usable for retrieval. A graph with one million nodes and five million edges cannot be injected wholesale into a prompt. Microsoft Research’s GraphRAG paper introduced a critical structural innovation: community detection with LLM-generated summaries.

The idea is to cluster the graph into semantically coherent communities — groups of nodes that are densely connected to each other and more sparsely connected to the rest of the graph. Each community is then summarized by an LLM, producing a natural-language description of what that cluster represents.

3.1 The Leiden Algorithm

The Leiden algorithm is the standard choice for community detection in Graph RAG systems. It optimizes modularity — a measure of how much denser a community’s internal edges are compared to what you’d expect by chance — through iterative refinement. It improves on the older Louvain algorithm by guaranteeing that communities are internally connected.
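Leiden itself is best left to a library, but the objective it optimizes is easy to state in code. Here is a pure-Python sketch of Newman modularity over an undirected edge list, evaluated on a toy two-community graph (invented for illustration, not part of the pipeline):

```python
def modularity(edges: list, communities: dict) -> float:
    """Newman modularity: Q = (1/2m) * sum_ij [A_ij - k_i*k_j/(2m)] * delta(c_i, c_j),
    computed over an undirected edge list with no self-loops."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Observed fraction of edges falling inside a community
    internal = sum(1 for u, v in edges if communities[u] == communities[v]) / m
    # Expected fraction under the degree-preserving null model
    expected = sum(
        degree[u] * degree[v]
        for u in degree for v in degree
        if communities[u] == communities[v]
    ) / (2 * m) ** 2
    return internal - expected

# Two triangles joined by a single bridge edge: a natural two-community split
edges = [("a", "b"), ("b", "c"), ("c", "a"),
         ("d", "e"), ("e", "f"), ("f", "d"),
         ("c", "d")]
communities = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
print(round(modularity(edges, communities), 3))  # 0.357
```

Leiden searches over community assignments to maximize this quantity, with a refinement phase that (unlike Louvain) guarantees each community is internally connected.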

Running Leiden on a Neo4j graph using the Graph Data Science library looks like this:

```cypher
// Project a subgraph for community detection
CALL gds.graph.project(
  'kg-graph',
  ['Entity'],
  { RELATES_TO: { orientation: 'UNDIRECTED' } }
);

// Run Leiden
CALL gds.leiden.write('kg-graph', {
  writeProperty: 'communityId',
  maxLevels: 10,
  gamma: 1.0,
  theta: 0.01
})
YIELD communityCount, modularity;
```

3.2 Multi-Level Summarization

Community detection produces a hierarchy: small tight-knit clusters at the leaf level, progressively larger and more abstract communities at higher levels. Each community at each level gets summarized by an LLM.

A leaf-level community covering a single supplier and its associated components and incidents might produce a summary like: “Acme Corp is a Tier 1 aluminium supplier with three compliance incidents between 2022 and 2024, all related to environmental reporting requirements.”

A higher-level community covering all Southeast Asian suppliers might summarize: “The Southeast Asian supplier cluster accounts for 34% of component volume and has a higher-than-average incident rate concentrated in Q1 2023, primarily linked to post-pandemic audit resumption.”

These summaries become the units of retrieval for global queries — questions that require synthesizing information across the whole corpus rather than from specific entities.

4. Hybrid Retrieval: Local and Global Query Modes

Graph RAG supports two fundamentally different retrieval strategies, and choosing between them is the first decision the query router must make.

4.1 Local Search

Local search is appropriate for queries that reference specific entities. The pipeline has three stages: entity linking, subgraph extraction, and context assembly.

Entity linking maps query mentions to canonical graph nodes. For the query “What compliance issues has Acme Corp had?”, the linker must identify that “Acme Corp” maps to the node with id supplier_acme_corp. This is done through a combination of exact match, fuzzy string similarity, and embedding-based nearest-neighbour search over entity name embeddings.
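A sketch of the first two linking tiers — exact match, then fuzzy fallback — against a hypothetical name-to-node index (the embedding nearest-neighbour tier would sit behind these as a final fallback):

```python
from difflib import get_close_matches

# Hypothetical index mapping normalized entity names to graph node ids.
ENTITY_INDEX = {
    "acme corp": "supplier_acme_corp",
    "boltworks": "supplier_boltworks",
    "component a": "component_a",
}

def link_entity(mention: str):
    """Tiered entity linking: exact match first, then fuzzy fallback.
    Returns (node_id, tier) or None if no tier matches."""
    key = mention.lower().strip()
    if key in ENTITY_INDEX:
        return ENTITY_INDEX[key], "exact"
    close = get_close_matches(key, list(ENTITY_INDEX), n=1, cutoff=0.8)
    if close:
        return ENTITY_INDEX[close[0]], "fuzzy"
    return None

print(link_entity("Acme Corp"))   # exact-tier hit
print(link_entity("Acme Corp."))  # fuzzy-tier hit
```

Returning the tier alongside the node id lets the router downstream treat exact hits as high-confidence anchors and fuzzy hits as provisional ones.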

Once anchor entities are identified, subgraph extraction runs a traversal from those anchors. The traversal policy — depth limit, edge type filters, neighbour scoring — is configurable. A common implementation uses Personalized PageRank (PPR) with the anchor nodes as seeds, which naturally surfaces the most strongly connected neighbours without requiring a hard depth cutoff:

```cypher
// Personalized PageRank: seeding sourceNodes at the anchors biases
// the random walk toward the anchors' neighbourhood
CALL gds.pageRank.stream('kg-graph', {
  maxIterations: 20,
  dampingFactor: 0.85,
  sourceNodes: [anchorNodeId]
})
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS entity, score
ORDER BY score DESC
LIMIT 50;
```

The retrieved subgraph — nodes, edges, and their properties — is then linearized into a structured text representation and assembled with any relevant community summaries into the final context string.
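The linearization step can be as simple as one line per entity and one per relationship. A minimal sketch (the output format and property names are illustrative, not a standard):

```python
def linearize(nodes: dict, triples: list) -> str:
    """Render a retrieved subgraph as structured text for the prompt:
    one line per entity with its properties, one line per relationship."""
    lines = ["ENTITIES:"]
    for node_id, props in nodes.items():
        attrs = ", ".join(f"{k}={v}" for k, v in props.items())
        lines.append(f"- {node_id} ({attrs})")
    lines.append("RELATIONSHIPS:")
    for src, rel, dst in triples:
        lines.append(f"- {src} -[{rel}]-> {dst}")
    return "\n".join(lines)

context = linearize(
    {"supplier_acme_corp": {"type": "Supplier", "tier": 1}},
    [("supplier_acme_corp", "SUPPLIES", "component_a")],
)
print(context)
```

Keeping the edge syntax close to Cypher's `(a)-[:REL]->(b)` notation makes the structure unambiguous to the generation model and easy for humans to audit.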

4.2 Global Search

Global search handles queries that cannot be answered from a local neighbourhood — questions like “What are the systemic risks in our supply chain?” or “Summarize the major themes in Q3 incident reports.” These require synthesizing information across the entire corpus.

The global search strategy uses the community summary hierarchy directly. The query is matched against community summaries at an appropriate level of abstraction, and the top-ranked summaries are assembled as context. Because each summary was generated by an LLM from actual graph data, they are already in a format the generation model can reason over.
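As an illustration of the ranking step only, here is a deliberately crude token-overlap scorer standing in for the embedding similarity a real system would use (the summaries are invented):

```python
def score(query: str, summary: str) -> float:
    """Jaccard token overlap: a crude stand-in for embedding similarity."""
    q, s = set(query.lower().split()), set(summary.lower().split())
    return len(q & s) / len(q | s) if q | s else 0.0

def top_summaries(query: str, summaries: list, k: int = 2) -> list:
    """Return the k community summaries best matching the query."""
    return sorted(summaries, key=lambda s: score(query, s), reverse=True)[:k]

summaries = [
    "Southeast Asian supplier cluster shows elevated incident rates in Q1 2023",
    "Contract renewals are concentrated in Q4 across the corpus",
]
print(top_summaries("supplier incident rates", summaries, k=1))
```

The interesting design question is which hierarchy level to query: leaf summaries for narrower themes, higher levels for corpus-wide questions.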

4.3 Query Routing

A simple but effective router uses a lightweight classifier (or a prompted LLM) to decide which strategy to apply. The heuristic: if the query contains explicit entity mentions that can be linked to graph nodes with high confidence, use local search. If the query uses aggregate or summary language (“overall”, “trends”, “across all”, “compare”), use global search. Ambiguous queries can run both in parallel and merge the results.
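That heuristic fits in a few lines. A sketch with an illustrative marker list (a prompted LLM classifier would replace the keyword check in production):

```python
# Aggregate/summary vocabulary suggesting a corpus-wide (global) question
GLOBAL_MARKERS = ("overall", "trends", "across all", "compare", "summarize", "themes")

def route(query: str, linked_entities: list) -> str:
    """Heuristic router: confident entity links -> local search;
    aggregate vocabulary -> global search; ambiguous -> run both."""
    q = query.lower()
    is_global = any(marker in q for marker in GLOBAL_MARKERS)
    if linked_entities and not is_global:
        return "local"
    if is_global and not linked_entities:
        return "global"
    return "both"

print(route("What compliance issues has Acme Corp had?", ["supplier_acme_corp"]))  # local
print(route("Summarize the major themes in Q3 incident reports", []))              # global
```

The "both" branch is the safety valve: merging local and global results costs latency but avoids committing to the wrong strategy on ambiguous queries.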

5. The Generation Layer

With context assembled from graph traversal, the generation step is structurally similar to standard RAG but with richer, more coherent input. The key differences in prompting are worth detailing.

Because graph context includes explicit relationships and entity properties rather than raw prose, the system prompt should instruct the model to reason over the provided relationships rather than just summarize text:

```
SYSTEM:
You are an analyst with access to a structured knowledge graph.
The context below contains entities (nodes) and their relationships (edges)
extracted from source documents.

When answering:
- Reason explicitly over the relationships provided
- Cite the specific entity names and relationship types you used
- If the graph context is insufficient, say so - do not speculate
- Distinguish between what is directly stated and what is inferred

GRAPH CONTEXT:
{linearized_subgraph}

COMMUNITY SUMMARY (if applicable):
{relevant_community_summaries}

QUESTION: {user_query}
```

The instruction to cite specific entity names and relationship types is critical for auditability. Every factual claim in the response can be traced back to a specific node or edge in the graph, which was in turn sourced from a specific document. This chain of provenance is a significant advantage of Graph RAG in regulated industries.

6. Implementation Stack and Trade-offs

A production Graph RAG system requires decisions across four infrastructure layers. Here is a concrete technology mapping for a mid-scale deployment:

Graph storage: Neo4j (property graph) or Amazon Neptune (managed). For RDF-native workloads, Stardog. For experimental or research scale, NetworkX in-memory.

Vector index (for entity linking): Pinecone, Weaviate, or pgvector. Entity name embeddings are typically small enough that a local FAISS index suffices.

LLM backbone: GPT-4o or Claude for extraction quality; a smaller model (GPT-4o-mini, Gemini Flash) can handle high-volume summarization steps at lower cost.

Orchestration: LangChain’s GraphCypherQAChain as a starting point; LlamaIndex’s KnowledgeGraphIndex for a more managed abstraction; custom pipelines using the Neo4j Python driver for production control.
The most consequential trade-off is between extraction quality and coverage. High-quality relation extraction requires careful prompting and often human validation — but it produces a graph that retrieval can actually trust. Aggressive automated extraction at scale produces graphs with noise that degrades retrieval precision. Start narrow and high quality, then expand coverage incrementally.

7. Benchmarks and When Graph RAG Wins

Microsoft Research’s original GraphRAG evaluation showed significant improvements over naive RAG on “sensemaking” queries — questions that require integrating information across a large corpus. On benchmarks that include multi-hop reasoning tasks, such as HotpotQA, Graph RAG systems have been reported to outperform flat vector retrieval by 15–30% on faithfulness and completeness metrics.

However, Graph RAG is not uniformly better. On single-hop factual lookups from well-structured documents, a well-tuned vector RAG pipeline with good chunking often matches or beats Graph RAG, with far lower infrastructure complexity. The break-even point depends on your query distribution.

The practical heuristic: if more than 20% of your real user queries require reasoning across multiple entities or documents, the investment in a knowledge graph is likely justified. If your queries are mostly lookup-style (“what does policy X say about Y?”), standard RAG with good chunking and reranking will serve you better.

Conclusion

Graph RAG changes what retrieval means. Instead of asking “which chunks look like this question?”, it asks “which connected pieces of knowledge are relevant to this question?” — and provides the relational structure needed to answer that second question correctly.

The architecture has real costs: knowledge graph construction requires investment in extraction pipelines, schema design, and ongoing maintenance. Community detection and summarization add infrastructure complexity. These are engineering problems with known solutions, not fundamental blockers.

For applications where multi-hop reasoning, entity-level context, and answer provenance matter — supply chain intelligence, biomedical research, legal analysis, financial due diligence — Graph RAG is not just a marginal improvement. It enables query types that are simply out of reach for flat vector retrieval.

The field is moving quickly. Temporal knowledge graphs, multimodal graph nodes, and agentic graph exploration are all active research directions. The core architectural pattern described here — extract, graph, traverse, generate — is stable and production-ready today.
